Opened 10 years ago

Last modified 8 days ago

#22125 new Cleanup/optimization

Unnecessary creation of index for ManyToManyField

Reported by: tbhtan3@… Owned by: Akash Kumar Sen
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: db-indexes
Cc: emorley@…, Phil Krylov, Dan LaManna Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Tim Graham)

Suppose I have the following model:

class Food(models.Model):
  restaurants = models.ManyToManyField(Restaurant)

The following table is created:

CREATE TABLE "main_food_restaurants" (
    "id" integer NOT NULL PRIMARY KEY,
    "food_id" integer NOT NULL,
    "restaurant_id" integer NOT NULL,
    UNIQUE ("food_id", "restaurant_id")
)

and the indexes:

CREATE INDEX "main_food_restaurants_0899c464" ON "main_food_restaurants" ("food_id");
CREATE INDEX "main_food_restaurants_be4c8f84" ON "main_food_restaurants" ("restaurant_id");

Notice that the single index on food_id is not needed due to the unique index (food_id, restaurant_id).

Change History (33)

comment:1 by Cea Stapleton, 10 years ago

Is there some of the code missing? I'm not seeing a unique index created in the SQL code you have here, just a unique constraint on food_id and restaurant_id.

comment:2 by anonymous, 10 years ago

Let me clarify.
Suppose you have the following code:

class B(models.Model):
    pass

class A(models.Model):
    b = models.ManyToManyField(B)

The following SQL is produced:

CREATE TABLE "main_b" (
    "id" integer NOT NULL PRIMARY KEY
)
;
CREATE TABLE "main_a_b" (
    "id" integer NOT NULL PRIMARY KEY,
    "a_id" integer NOT NULL,
    "b_id" integer NOT NULL REFERENCES "main_b" ("id"),
    UNIQUE ("a_id", "b_id")
)
;
CREATE TABLE "main_a" (
    "id" integer NOT NULL PRIMARY KEY
)

CREATE INDEX "main_a_b_4d6cc2fb" ON "main_a_b" ("a_id");
CREATE INDEX "main_a_b_2c14050b" ON "main_a_b" ("b_id");

Note that the index on a_id is redundant due to the unique constraint on a_id, b_id

Last edited 8 years ago by Tim Graham (previous) (diff)

comment:3 by bwreilly, 10 years ago

Owner: changed from nobody to bwreilly
Status: newassigned

comment:4 by bwreilly, 10 years ago

I think this fixes it: https://github.com/OneOcean/django/compare/django:master...master

But I am not sure I am entirely happy with it, and perhaps it needs a test. Either way, I'll need to get a CLA and company CLA signed before I can make a pull request.

comment:5 by anonymous, 10 years ago

looks good, but I can't comment much because I don't know Django ORM internals.
Thanks for the work!

comment:6 by Shai Berger, 10 years ago

Resolution: needsinfo
Status: assignedclosed

The question of whether an index for the first column of a unique constraint is redundant, depends on how the database implements the unique constraint. I am quite certain that the SQL standard does not specify, that unique constraints must be implemented by indexes which can be used partially.

So, for accepting this change, I'd need some indications that the index is actually redundant -- at least in all core backends; hence, needsinfo. Even if we accept it, I'd require the code to allow a (3rd-party) backend to opt-out of it easily -- that is, with no need to override the entire create_model method.

Your code is not a PR, so I can't comment there; I think it will break on unique_together=[('a','b','c')]. It certainly does need tests (does it pass the Django test-suite?) and also documentation changes.

comment:7 by sfrost@…, 10 years ago

The SQL standard *doesn't talk about indexes*, so claiming that it doesn't happen to talk about how unique constraints are implemented is a bit like trying to rearrange deck chairs on the Titanic.

The index is redundant in every database that I've ever come across- and any that are sensible will do the same. How can a unique constraint be at all performant if there isn't a fast way to find an existing value? Allowing some backend to "opt-out" is fine, but it certainly shouldn't be the default.

Note that only the index on the *first* column would be redundant; an independent index on the second or subsequent columns which are included in a unique constraint could be quite useful.

comment:8 by bwreilly, 10 years ago

I've confirmed PostgreSQL and SQLite both exhibit this behavior (Gist), I'll check MySQL when I get a chance. I don't have access to an Oracle instance, but I'll check their documentation.

Additionally, I've fixed the issue with unique_together=[('a','b','c')], added tests, and moved the functionality to an overridable method (compare). Sorry it isn't a pull request at the moment, I'm going to fix that Monday.

Besides Model Meta options unique_together and Fields db_index, should I update the documentation anywhere else?

comment:9 by Joshua Yanovski, 10 years ago

SQL Fiddle may help (I can't link it since it would be marked as spam, but the address is pretty easy to determine). Oracle and MySQL both do this as well.

comment:10 by alucard, 10 years ago

For the core DBs, they usually use B-Tree indexes so I think if [a,b] is indexed, then [a] would be a redundant index. To generalize this, any leftmost prefix of an indexed tuple would be redundant. This is definitely true for MySQL.


However, for hash indexes (and possibly other types), both [a] and [a,b] would be perfectly valid and non-redundant.

Source: https://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
To quote: "[Hash index] Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)"

I think we need to have an opt-out on a per table basis (based on Meta options), or even better, per ManyToManyField basis (based on a setting to init)

comment:11 by anonymous, 10 years ago

By the way, a very quick way to verify is to actually create a compound index, and then use "EXPLAIN" to check whether a SELECT statement based on the leftmost prefix can make use of the compound index. Of course this is assuming you have access to those core DBs, of which Oracle could be a bit of a problem.

comment:12 by Joshua Yanovski, 10 years ago

Again, please look at SQL Fiddle. It allows you to write queries against a database of every type mentioned here, including Oracle (and also Microsoft SQL Server). And every single one of them uses the composite index for a query that uses only the first indexed column.

comment:13 by anonymous, 10 years ago

Resolution: needsinfo
Status: closednew

comment:14 by anonymous, 10 years ago

Triage Stage: UnreviewedAccepted

This seems like a reasonable optimization to me -- at a minimum we can do it by adding a new flag to backends and then only performing it on databases we've validated have this behavior (you'd have to have a REALLY terrible database for index looks not to support a subset of fields with a left-side prefix though).

comment:15 by bwreilly, 10 years ago

Has patch: set

comment:16 by Tim Graham, 10 years ago

Easy pickings: unset
Version: 1.6master

comment:17 by Anssi Kääriäinen, 10 years ago

Patch needs improvement: set

I don't think the solution in PR2578 is the right solution at all.

If I read the patch correctly it removes all indexes for fields that are already first element in some unique_together index. There are valid cases for asking a separate index for a field even if it is part of an unique_together index. The most important reason for this is that a single-field index is faster than multicolumn index. In addition we shouldn't override what users have asked explicitly.

We should just skip index creation for m2m tables unless the user asks for single column index. I am not sure if we should add flags to m2m field for creation or skipping of these indexes.

comment:18 by Tim Graham, 10 years ago

#23367 was a duplicate.

comment:19 by Tim Graham, 8 years ago

Keywords: db-indexes added

comment:20 by Ed Morley, 7 years ago

Cc: emorley@… added

in reply to:  17 comment:21 by Phil Krylov, 5 years ago

Cc: Phil Krylov added

Replying to Anssi Kääriäinen:

We should just skip index creation for m2m tables unless the user asks for single column index. I am not sure if we should add flags to m2m field for creation or skipping of these indexes.

I would say that if someone needs a single column index in their through table, they can use an explicit through table without db_index=False on that (ForeignKey) column.

comment:23 by Tim Graham, 5 years ago

Patch needs improvement: unset

comment:24 by Tim Graham, 5 years ago

Description: modified (diff)
Patch needs improvement: set

in reply to:  24 comment:25 by László Károlyi, 5 years ago

Replying to Tim Graham:

You might want to reconsider having the extraneously-seeming indexes removed on the ManyToMany relationships. I just did a case study where it turned out that MariaDB uses them.

comment:26 by Mariusz Felisiak, 3 years ago

#32288 was a duplicate.

comment:27 by Mariusz Felisiak, 12 months ago

Owner: bwreilly removed
Status: newassigned

comment:28 by Mariusz Felisiak, 12 months ago

Status: assignednew

comment:29 by Akash Kumar Sen, 9 months ago

Owner: set to Akash Kumar Sen
Patch needs improvement: unset
Status: newassigned

comment:30 by Akash Kumar Sen, 9 months ago

Patch needs improvement: set

comment:31 by Dan LaManna, 10 days ago

Cc: Dan LaManna added

comment:32 by django tester, 8 days ago

Resolution: wontfix
Status: assignedclosed

in reply to:  32 comment:33 by Mariusz Felisiak, 8 days ago

Resolution: wontfix
Status: closednew

Replying to django tester:

Please don't close ticket without any feedback.

Note: See TracTickets for help on using tickets.
Back to Top