Opened 10 years ago

Last modified 2 years ago

#23435 new Cleanup/optimization

GenericForeignKey should be indexed

Reported by: Aymeric Augustin Owned by: nobody
Component: contrib.contenttypes Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

(Yes, I know GFKs are evil and should be avoided at all costs. Nonetheless, as long as Django supports them, people will use them.)

Django automatically adds an index on foreign keys, for fast reverse lookups, unless it's explicitly disabled with db_index = False.

I believe it should also add an index on (content_type, object_id) on GFK. If we make this the default in a future version of Django, users will just have to run makemigrations on apps that have a GFK and they'll get the index for free.

The alternative would be to update the documentation to recommend adding the index explicitly.

    class Meta:
        index_together = [('content_type', 'object_id')]

Change History (12)

comment:1 by Loic Bistuer, 10 years ago

Triage Stage: UnreviewedAccepted

It's indeed a big performance footgun, accepting on this basis.

I vote for adding the index automatically but how would that work in practice? The field would manipulate the model's Meta from contribute_to_class?

comment:2 by Aymeric Augustin, 10 years ago

I haven't thought about the implementation yet.

Also we should make sure not to create a duplicate index for the people who added the right one.

comment:3 by Alexander Shchapov, 10 years ago

I've started looking at this ticket, and first tricky thing I caught (using contrubute_to_class) is following scenario (taken from model_meta tests):
(in short: is there a simple way to distinguish abstract field from multi-table inherited?)

class AbstractPerson(models.Model):
    ...
    # GFK fields
    content_type_abstract = models.ForeignKey(ContentType, related_name='+')
    object_id_abstract = models.PositiveIntegerField()
    content_object_abstract = GenericForeignKey('content_type_abstract', 'object_id_abstract')

    class Meta:
        abstract = True

This one is easy, here we need:

index_together = (('object_id_abstract', 'content_object_abstract'),)

class BasePerson(AbstractPerson):
    ...
    # GFK fields
    content_type_base = models.ForeignKey(ContentType, related_name='+')
    object_id_base = models.PositiveIntegerField()
    content_object_base = GenericForeignKey('content_type_base', 'object_id_base')

This one is easy too, at this point we need:

index_together = (
    ('object_id_abstract', 'content_object_abstract'), 
    ('content_type_base', 'object_id_base')
)

But then I stuck:

class Person(BasePerson):
    ...
    # GFK fields
    content_type_concrete = models.ForeignKey(ContentType, related_name='+')
    object_id_concrete = models.PositiveIntegerField()
    content_object_concrete = GenericForeignKey('content_type_concrete', 'object_id_concrete')

At this point in contribute_to_class we are going to add two more fields to index from Person, this is OK. But we have also inherited from BasePerson + abstract from AbstractPerson. All those fields belong to BasePerson, but we need to take from there only abstract ones, so that for Person we have:

index_together = (
    ('object_id_abstract', 'content_object_abstract'), 
    ('content_type_concrete', 'object_id_concrete')
)

We can't add to index BasePerson fields since those live in different table. But we still need ones from AbstractPerson.
Probably I'm terribly wrong, but for this case I can't see obvious way to tell which inherited fields came from Abstract model and which from Base one.

ipdb> cls
<class 'model_meta.models.Person'>
ipdb> [x.name for x in cls._meta.local_fields]
['baseperson_ptr', 'data_inherited', 'fk_inherited', 'data_not_concrete_inherited', 'content_type_concrete', 'object_id_concrete']

ipdb> [x.name for x in cls._meta.fields]
['id', 'data_abstract', 'fk_abstract', 'data_not_concrete_abstract', 'content_type_abstract', 'object_id_abstract', 'data_base', 'fk_base', 'data_not_concrete_base', 'content_type_base', 'object_id_base', 'baseperson_ptr', 'data_inherited', 'fk_inherited', 'data_not_concrete_inherited', 'content_type_concrete', 'object_id_concrete']

ipdb> cls._meta.get_field('object_id_base').model # << this one is from base, we need to omit it
<class 'model_meta.models.BasePerson'>
ipdb> cls._meta.get_field('object_id_abstract').model # <<< this one is from abstract, we need it
<class 'model_meta.models.BasePerson'>


comment:4 by Simon Charette, 10 years ago

The ('object_id_abstract', 'content_object_abstract') index should only be added to BasePerson indexes and not Person ones.

In this case you can solely rely on opts.local_fields to make sure both ct_field and fk_field are local in order to add the index.

If one of the fields is not local (it belongs to a concrete parent) then the index shouldn't be created.

comment:5 by Alexander Shchapov, 10 years ago

Oh, so, this means that Person in fact has no abstract model columns in its table. In this case that makes sense.

One more thing: it seems that cls._meta.local_fields is not guaranteed to be fully populated during contribute_to_class call, is it? From run to run I'm getting different set of fields in local_fields during contribute_to_class calls.

However, the order of calls are always consistent, for example, Person gets contributions from itself (Person.content_object_concrete), then from BasePerson and finally from AbstractPerson.

I need to go deeper.

comment:6 by Alexander Shchapov, 10 years ago

What also interests me, what kind of index do we want? Composite one, which has two columns, or two separate? We automatically get fk_field indexed, do we need to add, in this case, separate index on ct_field?

comment:8 by Tim Graham, 9 years ago

Component: Database layer (models, ORM)contrib.contenttypes
Summary: GFK should be indexedGenericForeignKey should be indexed

comment:9 by Tim Graham, 8 years ago

I closed #26964 as a duplicate which suggests to document the lack of this index. Until the implementation of that ticket is completed, if someone wants to submit a PR with that documentation it could be accepted.

comment:10 by Asif Saifuddin Auvi, 5 years ago

considering the model meta refactor and class-based Index and many ORM improvement in recent years, what is the best possible solution for this? I would to know the recent consensus on this :)

comment:11 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 562e3bc:

Refs #23435 -- Added note about GenericForeignKey indexes to docs.

comment:12 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 2308fb58:

[4.0.x] Refs #23435 -- Added note about GenericForeignKey indexes to docs.

Backport of 562e3bc09aa094a2ebbd3890fa233d04daafa8c9 from main

Note: See TracTickets for help on using tickets.
Back to Top