Opened 10 years ago
Last modified 2 years ago
#23435 new Cleanup/optimization
GenericForeignKey should be indexed
Reported by: | Aymeric Augustin | Owned by: | nobody |
---|---|---|---|
Component: | contrib.contenttypes | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
(Yes, I know GFKs are evil and should be avoided at all costs. Nonetheless, as long as Django supports them, people will use them.)
Django automatically adds an index on foreign keys, for fast reverse lookups, unless it's explicitly disabled with db_index = False
.
I believe it should also add an index on (content_type, object_id)
on GFK. If we make this the default in a future version of Django, users will just have to run makemigrations
on apps that have a GFK and they'll get the index for free.
The alternative would be to update the documentation to recommend adding the index explicitly.
class Meta: index_together = [('content_type', 'object_id')]
Change History (12)
comment:1 by , 10 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 10 years ago
I haven't thought about the implementation yet.
Also we should make sure not to create a duplicate index for the people who added the right one.
comment:3 by , 10 years ago
I've started looking at this ticket, and first tricky thing I caught (using contrubute_to_class
) is following scenario (taken from model_meta tests):
(in short: is there a simple way to distinguish abstract field from multi-table inherited?)
class AbstractPerson(models.Model): ... # GFK fields content_type_abstract = models.ForeignKey(ContentType, related_name='+') object_id_abstract = models.PositiveIntegerField() content_object_abstract = GenericForeignKey('content_type_abstract', 'object_id_abstract') class Meta: abstract = True
This one is easy, here we need:
index_together = (('object_id_abstract', 'content_object_abstract'),)
class BasePerson(AbstractPerson): ... # GFK fields content_type_base = models.ForeignKey(ContentType, related_name='+') object_id_base = models.PositiveIntegerField() content_object_base = GenericForeignKey('content_type_base', 'object_id_base')
This one is easy too, at this point we need:
index_together = ( ('object_id_abstract', 'content_object_abstract'), ('content_type_base', 'object_id_base') )
But then I stuck:
class Person(BasePerson): ... # GFK fields content_type_concrete = models.ForeignKey(ContentType, related_name='+') object_id_concrete = models.PositiveIntegerField() content_object_concrete = GenericForeignKey('content_type_concrete', 'object_id_concrete')
At this point in contribute_to_class
we are going to add two more fields to index from Person
, this is OK. But we have also inherited from BasePerson
+ abstract from AbstractPerson
. All those fields belong to BasePerson
, but we need to take from there only abstract ones, so that for Person
we have:
index_together = ( ('object_id_abstract', 'content_object_abstract'), ('content_type_concrete', 'object_id_concrete') )
We can't add to index BasePerson fields since those live in different table. But we still need ones from AbstractPerson.
Probably I'm terribly wrong, but for this case I can't see obvious way to tell which inherited fields came from Abstract model and which from Base one.
ipdb> cls <class 'model_meta.models.Person'> ipdb> [x.name for x in cls._meta.local_fields] ['baseperson_ptr', 'data_inherited', 'fk_inherited', 'data_not_concrete_inherited', 'content_type_concrete', 'object_id_concrete'] ipdb> [x.name for x in cls._meta.fields] ['id', 'data_abstract', 'fk_abstract', 'data_not_concrete_abstract', 'content_type_abstract', 'object_id_abstract', 'data_base', 'fk_base', 'data_not_concrete_base', 'content_type_base', 'object_id_base', 'baseperson_ptr', 'data_inherited', 'fk_inherited', 'data_not_concrete_inherited', 'content_type_concrete', 'object_id_concrete'] ipdb> cls._meta.get_field('object_id_base').model # << this one is from base, we need to omit it <class 'model_meta.models.BasePerson'> ipdb> cls._meta.get_field('object_id_abstract').model # <<< this one is from abstract, we need it <class 'model_meta.models.BasePerson'>
comment:4 by , 10 years ago
The ('object_id_abstract', 'content_object_abstract')
index should only be added to BasePerson
indexes and not Person
ones.
In this case you can solely rely on opts.local_fields
to make sure both ct_field
and fk_field
are local in order to add the index.
If one of the fields is not local (it belongs to a concrete parent) then the index shouldn't be created.
comment:5 by , 10 years ago
Oh, so, this means that Person
in fact has no abstract model columns in its table. In this case that makes sense.
One more thing: it seems that cls._meta.local_fields
is not guaranteed to be fully populated during contribute_to_class
call, is it? From run to run I'm getting different set of fields in local_fields
during contribute_to_class
calls.
However, the order of calls are always consistent, for example, Person
gets contributions from itself (Person.content_object_concrete
), then from BasePerson
and finally from AbstractPerson
.
I need to go deeper.
comment:6 by , 10 years ago
What also interests me, what kind of index do we want? Composite one, which has two columns, or two separate? We automatically get fk_field
indexed, do we need to add, in this case, separate index on ct_field
?
comment:7 by , 10 years ago
Here is my naive implementation for this ticket: https://github.com/alexanderad/django/compare/alexanderad:master...ticket_23435_indexed_gfk
comment:8 by , 9 years ago
Component: | Database layer (models, ORM) → contrib.contenttypes |
---|---|
Summary: | GFK should be indexed → GenericForeignKey should be indexed |
comment:9 by , 8 years ago
I closed #26964 as a duplicate which suggests to document the lack of this index. Until the implementation of that ticket is completed, if someone wants to submit a PR with that documentation it could be accepted.
comment:10 by , 5 years ago
considering the model meta refactor and class-based Index and many ORM improvement in recent years, what is the best possible solution for this? I would to know the recent consensus on this :)
It's indeed a big performance footgun, accepting on this basis.
I vote for adding the index automatically but how would that work in practice? The field would manipulate the model's
Meta
fromcontribute_to_class
?