Opened 3 years ago

Last modified 17 months ago

#23435 new Cleanup/optimization

GenericForeignKey should be indexed

Reported by: Aymeric Augustin Owned by: nobody
Component: contrib.contenttypes Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


(Yes, I know GFKs are evil and should be avoided at all costs. Nonetheless, as long as Django supports them, people will use them.)

Django automatically adds an index on foreign keys, for fast reverse lookups, unless it's explicitly disabled with db_index = False.

I believe it should also add an index on (content_type, object_id) on GFK. If we make this the default in a future version of Django, users will just have to run makemigrations on apps that have a GFK and they'll get the index for free.

The alternative would be to update the documentation to recommend adding the index explicitly.

    class Meta:
        index_together = [('content_type', 'object_id')]

Change History (9)

comment:1 Changed 3 years ago by Loic Bistuer

Triage Stage: UnreviewedAccepted

It's indeed a big performance footgun, accepting on this basis.

I vote for adding the index automatically but how would that work in practice? The field would manipulate the model's Meta from contribute_to_class?

comment:2 Changed 3 years ago by Aymeric Augustin

I haven't thought about the implementation yet.

Also we should make sure not to create a duplicate index for the people who added the right one.

comment:3 Changed 3 years ago by Alexander Shchapov

I've started looking at this ticket, and first tricky thing I caught (using contrubute_to_class) is following scenario (taken from model_meta tests):
(in short: is there a simple way to distinguish abstract field from multi-table inherited?)

class AbstractPerson(models.Model):
    # GFK fields
    content_type_abstract = models.ForeignKey(ContentType, related_name='+')
    object_id_abstract = models.PositiveIntegerField()
    content_object_abstract = GenericForeignKey('content_type_abstract', 'object_id_abstract')

    class Meta:
        abstract = True

This one is easy, here we need:

index_together = (('object_id_abstract', 'content_object_abstract'),)

class BasePerson(AbstractPerson):
    # GFK fields
    content_type_base = models.ForeignKey(ContentType, related_name='+')
    object_id_base = models.PositiveIntegerField()
    content_object_base = GenericForeignKey('content_type_base', 'object_id_base')

This one is easy too, at this point we need:

index_together = (
    ('object_id_abstract', 'content_object_abstract'), 
    ('content_type_base', 'object_id_base')

But then I stuck:

class Person(BasePerson):
    # GFK fields
    content_type_concrete = models.ForeignKey(ContentType, related_name='+')
    object_id_concrete = models.PositiveIntegerField()
    content_object_concrete = GenericForeignKey('content_type_concrete', 'object_id_concrete')

At this point in contribute_to_class we are going to add two more fields to index from Person, this is OK. But we have also inherited from BasePerson + abstract from AbstractPerson. All those fields belong to BasePerson, but we need to take from there only abstract ones, so that for Person we have:

index_together = (
    ('object_id_abstract', 'content_object_abstract'), 
    ('content_type_concrete', 'object_id_concrete')

We can't add to index BasePerson fields since those live in different table. But we still need ones from AbstractPerson.
Probably I'm terribly wrong, but for this case I can't see obvious way to tell which inherited fields came from Abstract model and which from Base one.

ipdb> cls
<class 'model_meta.models.Person'>
ipdb> [ for x in cls._meta.local_fields]
['baseperson_ptr', 'data_inherited', 'fk_inherited', 'data_not_concrete_inherited', 'content_type_concrete', 'object_id_concrete']

ipdb> [ for x in cls._meta.fields]
['id', 'data_abstract', 'fk_abstract', 'data_not_concrete_abstract', 'content_type_abstract', 'object_id_abstract', 'data_base', 'fk_base', 'data_not_concrete_base', 'content_type_base', 'object_id_base', 'baseperson_ptr', 'data_inherited', 'fk_inherited', 'data_not_concrete_inherited', 'content_type_concrete', 'object_id_concrete']

ipdb> cls._meta.get_field('object_id_base').model # << this one is from base, we need to omit it
<class 'model_meta.models.BasePerson'>
ipdb> cls._meta.get_field('object_id_abstract').model # <<< this one is from abstract, we need it
<class 'model_meta.models.BasePerson'>

comment:4 Changed 3 years ago by Simon Charette

The ('object_id_abstract', 'content_object_abstract') index should only be added to BasePerson indexes and not Person ones.

In this case you can solely rely on opts.local_fields to make sure both ct_field and fk_field are local in order to add the index.

If one of the fields is not local (it belongs to a concrete parent) then the index shouldn't be created.

comment:5 Changed 3 years ago by Alexander Shchapov

Oh, so, this means that Person in fact has no abstract model columns in its table. In this case that makes sense.

One more thing: it seems that cls._meta.local_fields is not guaranteed to be fully populated during contribute_to_class call, is it? From run to run I'm getting different set of fields in local_fields during contribute_to_class calls.

However, the order of calls are always consistent, for example, Person gets contributions from itself (Person.content_object_concrete), then from BasePerson and finally from AbstractPerson.

I need to go deeper.

comment:6 Changed 3 years ago by Alexander Shchapov

What also interests me, what kind of index do we want? Composite one, which has two columns, or two separate? We automatically get fk_field indexed, do we need to add, in this case, separate index on ct_field?

comment:7 Changed 3 years ago by Alexander Shchapov

comment:8 Changed 2 years ago by Tim Graham

Component: Database layer (models, ORM)contrib.contenttypes
Summary: GFK should be indexedGenericForeignKey should be indexed

comment:9 Changed 17 months ago by Tim Graham

I closed #26964 as a duplicate which suggests to document the lack of this index. Until the implementation of that ticket is completed, if someone wants to submit a PR with that documentation it could be accepted.

Note: See TracTickets for help on using tickets.
Back to Top