#17003 closed New feature (fixed)
prefetch_related should support foreign keys/one-to-one
Reported by: | Luke Plant | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | django@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Although it is currently billed as providing support for many-to-many or the 'many' side of foreign key relationships, there is nothing in the name or API which stops prefetch_related providing support for foreign key relationships or one-to-one.
Often, select_related will be a better alternative for these relationships, since they can be done with a database join. However:
- It is possible that in practice it is better to do a separate query and join in Python.
- There are cases where it may be difficult to arrange for the relationships to be included via
select_related()
. For example if you doprefetch_related('many1__single__many2')
, then, although prefetch_related will traverse the 'single' relationship to get to 'many2', it could execute many DB queries when it does so.
An ideal solution to this would allow prefetching of things like GenericForeignKey
, and potentially other similar 3rd party constructs. For example, someone might have a CrossDatabaseForeignKey
, which would not have referential integrity since the related data is in a different database, but provides convenient access on objects, and would benefit from being able to do a prefetch for efficient access. (select_related, being JOIN based, is clearly out of the picture for this use case). This suggests that some API similar to the one created for many-related-managers (i.e. existence of a get_prefetch_query_set()
method) should be used.
The primary difference for singly related objects is that we have to avoid simply fetching the attribute from the instance, as this will cause the descriptor __get__()
to be called, and the query to be done immediately the inefficient way. We can do this by calling getattr
on the class, which will retrieve the descriptor object itself.
For reference, it appears that Ruby on Rails uses something like prefetch_related()
for all eager loading of relationships, and doesn't have an equivalent to select_related()
.
Attachments (4)
Change History (13)
comment:1 by , 13 years ago
Description: | modified (diff) |
---|---|
Triage Stage: | Unreviewed → Accepted |
by , 13 years ago
Attachment: | prefetch_singly_related_objs.diff added |
---|
comment:2 by , 13 years ago
I've added an implementation that appears to work, and addresses all the requirements set out above, and has support for GenericForeignKey
.
The 'API' for finding objects that support prefetching has now increased a bit:
- we first look on descriptor objects themselves for the
get_prefetch_query_set
method. - the return value has expanded a bit:
- to indicate singly related objects, which are a bit treated differently
- to indicate the cache name, so that singly related objects can be set up correctly
- to allow a more powerful method of finding the 'join values', the 'attributes' that are returned can be callables - this is needed by
GenericForeignKey
.
- also, for the singly related case, if the descriptor has the
get_prefetch_query_set()
method, it also must have anis_cached()
method so that we can find out if the objects have already been fetched (e.g. by select_related).
Docs have also been significantly re-written, since prefetch_related would now be an alternative to select_related for FK and one-to-one relationships.
by , 13 years ago
Attachment: | prefetch_singly_related_objs.2.diff added |
---|
Added support for reverse one-to-one, which had been forgotten, other cleanups
comment:3 by , 13 years ago
Cc: | added |
---|
comment:4 by , 13 years ago
Perfect. Thanks! I've tested the patch with some actions from django-activity-stream (see the Action model class: https://github.com/justquick/django-activity-stream/blob/master/actstream/models.py). The Action model defines three GenericForeignKeys
and allows two of them to be NULL. So do some of the instances currently in my database and so I managed to run into the glorious "This should never happen" case ;)
The problem is in line 92 of django.contrib.contenttypes.generic, where getattr(obj, ct_attname)
returns Null
. I haven't made a patch because I wasn't sure whether it's better to check for a valid ContentType ID in generic.py or in django.db.models.query, where the callable is evaluated.
For quick debugging, I gave get_content_type
an id of 1, which already reduced the number of queries from 76 to 53. So, aside from the Null check, the patch seems to work fine for GenericForeignKeys
.
by , 13 years ago
Attachment: | prefetch_singly_related_objs.3.diff added |
---|
Fixed the nullable GFK problem
by , 13 years ago
Attachment: | prefetch_singly_related_objs.4.diff added |
---|
cleaned up, fixed some docs
comment:6 by , 13 years ago
Has patch: | set |
---|
There seems to be too much overlap with this and #17000 to do work in parallel. I will try to rebase my work on this patch, so that my extensions are only applicable if this patch is applied too. It would make my life easier If you (lukeplant) have some public distributed version control branch I could fork my work from. Preferably GitHub
... If you don't have one available, it is not a big problem, though.
I would wait for this work to land in core, then do my extensions stuff, but I am afraid of 1.4 feature freeze. When is that? I really would like to get custom queryset handling in, as that is what I need from this feature, and I believe it will prove to be really useful for other users, too.
For the record I have a working but somewhat ugly patch for #17000. It clashes badly with this patch, though. I haven't attached the patch into #17000, as I don't want to give the impression that I am competing with this patch.
I will try to review the patch this weekend. Although it might be I need to take some time off from programming, been doing nothing but programming for the last few days...
comment:7 by , 13 years ago
@akaariai: I don't have a publicly accessible fork I'm afraid - I use Mercurial because I can't stand git's insane UI, but the 'hgsubversion' extension doesn't yet support selectively pushing different things to Subversion, which is a pain.
FWIW, I don't think there is much left to do on this patch.
comment:8 by , 13 years ago
Yes, the patch works well for my test case as mentioned above. For listing 10 Action instances, the query count was reduced from 69 to 37, for 100 instances it dropped from 181 to 98.
Let me know if there are more test cases that I can help with.
Implementation