Opened 2 years ago

Last modified 4 weeks ago

#22125 new Cleanup/optimization

Unnecessary creation of index for ManyToManyField

Reported by: tbhtan3@… Owned by: bwreilly
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no


Suppose I have the following model:
class Food(models.Model):

restaurants = models.ManyToManyField(Restaurant)

The following table is created:

CREATE TABLE "main_food_restaurants" (

"id" integer NOT NULL PRIMARY KEY,
"food_id" integer NOT NULL,
"restaurant_id" integer NOT NULL,
UNIQUE ("food_id", "restaurant_id")


and the indexes:

CREATE INDEX "main_food_restaurants_0899c464" ON "main_food_restaurants" ("food_id");
CREATE INDEX "main_food_restaurants_be4c8f84" ON "main_food_restaurants" ("restaurant_id");

Notice that the single index on food_id is not needed due to the unique index (food_id, restaurant_id)

Change History (18)

comment:1 Changed 2 years ago by ceaess

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Is there some of the code missing? I'm not seeing a unique index created in the SQL code you have here, just a unique constraint on food_id and restaurant_id.

comment:2 Changed 2 years ago by anonymous

Let me clarify.
Suppose you have the following code:

class B(models.Model):

class A(models.Model):
    b = models.ManyToManyField(B)

The following SQL is produced:

CREATE TABLE "main_b" (
    "id" integer NOT NULL PRIMARY KEY
CREATE TABLE "main_a_b" (
    "id" integer NOT NULL PRIMARY KEY,
    "a_id" integer NOT NULL,
    "b_id" integer NOT NULL REFERENCES "main_b" ("id"),
    UNIQUE ("a_id", "b_id")
CREATE TABLE "main_a" (
    "id" integer NOT NULL PRIMARY KEY

CREATE INDEX "main_a_b_4d6cc2fb" ON "main_a_b" ("a_id");
CREATE INDEX "main_a_b_2c14050b" ON "main_a_b" ("b_id");

Note that the index on a_id is redundant due to the unique constraint on a_id, b_id

Last edited 4 weeks ago by timgraham (previous) (diff)

comment:3 Changed 2 years ago by bwreilly

  • Owner changed from nobody to bwreilly
  • Status changed from new to assigned

comment:4 Changed 2 years ago by bwreilly

I think this fixes it:

But I am not sure I am entirely happy with it, and perhaps it needs a test. Either way, I'll need to get a CLA and company CLA signed before I can make a pull request.

comment:5 Changed 2 years ago by anonymous

looks good, but I can't comment much because I don't know Django ORM internals.
Thanks for the work!

comment:6 Changed 2 years ago by shai

  • Resolution set to needsinfo
  • Status changed from assigned to closed

The question of whether an index for the first column of a unique constraint is redundant, depends on how the database implements the unique constraint. I am quite certain that the SQL standard does not specify, that unique constraints must be implemented by indexes which can be used partially.

So, for accepting this change, I'd need some indications that the index is actually redundant -- at least in all core backends; hence, needsinfo. Even if we accept it, I'd require the code to allow a (3rd-party) backend to opt-out of it easily -- that is, with no need to override the entire create_model method.

Your code is not a PR, so I can't comment there; I think it will break on unique_together=[('a','b','c')]. It certainly does need tests (does it pass the Django test-suite?) and also documentation changes.

comment:7 Changed 2 years ago by sfrost@…

The SQL standard *doesn't talk about indexes*, so claiming that it doesn't happen to talk about how unique constraints are implemented is a bit like trying to rearrange deck chairs on the Titanic.

The index is redundant in every database that I've ever come across- and any that are sensible will do the same. How can a unique constraint be at all performant if there isn't a fast way to find an existing value? Allowing some backend to "opt-out" is fine, but it certainly shouldn't be the default.

Note that only the index on the *first* column would be redundant; an independent index on the second or subsequent columns which are included in a unique constraint could be quite useful.

comment:8 Changed 2 years ago by bwreilly

I've confirmed PostgreSQL and SQLite both exhibit this behavior (Gist), I'll check MySQL when I get a chance. I don't have access to an Oracle instance, but I'll check their documentation.

Additionally, I've fixed the issue with unique_together=[('a','b','c')], added tests, and moved the functionality to an overridable method (compare). Sorry it isn't a pull request at the moment, I'm going to fix that Monday.

Besides Model Meta options unique_together and Fields db_index, should I update the documentation anywhere else?

comment:9 Changed 2 years ago by Joshua Yanovski

SQL Fiddle may help (I can't link it since it would be marked as spam, but the address is pretty easy to determine). Oracle and MySQL both do this as well.

comment:10 Changed 2 years ago by alucard

For the core DBs, they usually use B-Tree indexes so I think if [a,b] is indexed, then [a] would be a redundant index. To generalize this, any leftmost prefix of an indexed tuple would be redundant. This is definitely true for MySQL.

However, for hash indexes (and possibly other types), both [a] and [a,b] would be perfectly valid and non-redundant.

To quote: "[Hash index] Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)"

I think we need to have an opt-out on a per table basis (based on Meta options), or even better, per ManyToManyField basis (based on a setting to init)

comment:11 Changed 2 years ago by anonymous

By the way, a very quick way to verify is to actually create a compound index, and then use "EXPLAIN" to check whether a SELECT statement based on the leftmost prefix can make use of the compound index. Of course this is assuming you have access to those core DBs, of which Oracle could be a bit of a problem.

comment:12 Changed 2 years ago by Joshua Yanovski

Again, please look at SQL Fiddle. It allows you to write queries against a database of every type mentioned here, including Oracle (and also Microsoft SQL Server). And every single one of them uses the composite index for a query that uses only the first indexed column.

comment:13 Changed 23 months ago by anonymous

  • Resolution needsinfo deleted
  • Status changed from closed to new

comment:14 Changed 23 months ago by anonymous

  • Triage Stage changed from Unreviewed to Accepted

This seems like a reasonable optimization to me -- at a minimum we can do it by adding a new flag to backends and then only performing it on databases we've validated have this behavior (you'd have to have a REALLY terrible database for index looks not to support a subset of fields with a left-side prefix though).

comment:15 Changed 22 months ago by bwreilly

  • Has patch set

comment:16 Changed 22 months ago by timo

  • Easy pickings unset
  • Version changed from 1.6 to master

comment:17 Changed 18 months ago by akaariai

  • Patch needs improvement set

I don't think the solution in PR2578 is the right solution at all.

If I read the patch correctly it removes all indexes for fields that are already first element in some unique_together index. There are valid cases for asking a separate index for a field even if it is part of an unique_together index. The most important reason for this is that a single-field index is faster than multicolumn index. In addition we shouldn't override what users have asked explicitly.

We should just skip index creation for m2m tables unless the user asks for single column index. I am not sure if we should add flags to m2m field for creation or skipping of these indexes.

comment:18 Changed 18 months ago by timgraham

#23367 was a duplicate.

Note: See TracTickets for help on using tickets.
Back to Top