Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#17003 closed New feature (fixed)

prefetch_related should support foreign keys/one-to-one

Reported by: lukeplant Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: django@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by lukeplant)

Although it is currently billed as providing support for many-to-many or the 'many' side of foreign key relationships, there is nothing in the name or API which stops prefetch_related providing support for foreign key relationships or one-to-one.

Often, select_related will be a better alternative for these relationships, since they can be done with a database join. However:

  1. It is possible that in practice it is better to do a separate query and join in Python.
  2. There are cases where it may be difficult to arrange for the relationships to be included via select_related(). For example if you do prefetch_related('many1__single__many2'), then, although prefetch_related will traverse the 'single' relationship to get to 'many2', it could execute many DB queries when it does so.

An ideal solution to this would allow prefetching of things like GenericForeignKey, and potentially other similar 3rd party constructs. For example, someone might have a CrossDatabaseForeignKey, which would not have referential integrity since the related data is in a different database, but provides convenient access on objects, and would benefit from being able to do a prefetch for efficient access. (select_related, being JOIN based, is clearly out of the picture for this use case). This suggests that some API similar to the one created for many-related-managers (i.e. existence of a get_prefetch_query_set() method) should be used.

The primary difference for singly related objects is that we have to avoid simply fetching the attribute from the instance, as this will cause the descriptor __get__() to be called, and the query to be done immediately the inefficient way. We can do this by calling getattr on the class, which will retrieve the descriptor object itself.

For reference, it appears that Ruby on Rails uses something like prefetch_related() for all eager loading of relationships, and doesn't have an equivalent to select_related().

Attachments (4)

prefetch_singly_related_objs.diff (29.7 KB) - added by lukeplant 4 years ago.
Implementation
prefetch_singly_related_objs.2.diff (31.7 KB) - added by lukeplant 4 years ago.
Added support for reverse one-to-one, which had been forgotten, other cleanups
prefetch_singly_related_objs.3.diff (33.3 KB) - added by lukeplant 4 years ago.
Fixed the nullable GFK problem
prefetch_singly_related_objs.4.diff (35.3 KB) - added by lukeplant 4 years ago.
cleaned up, fixed some docs

Download all attachments as: .zip

Change History (13)

comment:1 Changed 4 years ago by lukeplant

  • Description modified (diff)
  • Triage Stage changed from Unreviewed to Accepted

Changed 4 years ago by lukeplant

Implementation

comment:2 Changed 4 years ago by lukeplant

I've added an implementation that appears to work, and addresses all the requirements set out above, and has support for GenericForeignKey.

The 'API' for finding objects that support prefetching has now increased a bit:

  • we first look on descriptor objects themselves for the get_prefetch_query_set method.
  • the return value has expanded a bit:
    • to indicate singly related objects, which are a bit treated differently
    • to indicate the cache name, so that singly related objects can be set up correctly
    • to allow a more powerful method of finding the 'join values', the 'attributes' that are returned can be callables - this is needed by GenericForeignKey.
  • also, for the singly related case, if the descriptor has the get_prefetch_query_set() method, it also must have an is_cached() method so that we can find out if the objects have already been fetched (e.g. by select_related).

Docs have also been significantly re-written, since prefetch_related would now be an alternative to select_related for FK and one-to-one relationships.

Changed 4 years ago by lukeplant

Added support for reverse one-to-one, which had been forgotten, other cleanups

comment:3 Changed 4 years ago by mkai

  • Cc django@… added

comment:4 Changed 4 years ago by mkai

Perfect. Thanks! I've tested the patch with some actions from django-activity-stream (see the Action model class: https://github.com/justquick/django-activity-stream/blob/master/actstream/models.py). The Action model defines three GenericForeignKeys and allows two of them to be NULL. So do some of the instances currently in my database and so I managed to run into the glorious "This should never happen" case ;)

The problem is in line 92 of django.contrib.contenttypes.generic, where getattr(obj, ct_attname) returns Null. I haven't made a patch because I wasn't sure whether it's better to check for a valid ContentType ID in generic.py or in django.db.models.query, where the callable is evaluated.

For quick debugging, I gave get_content_type an id of 1, which already reduced the number of queries from 76 to 53. So, aside from the Null check, the patch seems to work fine for GenericForeignKeys.

Changed 4 years ago by lukeplant

Fixed the nullable GFK problem

comment:5 Changed 4 years ago by lukeplant

OK, should have fixed that now, please try again.

Changed 4 years ago by lukeplant

cleaned up, fixed some docs

comment:6 Changed 4 years ago by akaariai

  • Has patch set

There seems to be too much overlap with this and #17000 to do work in parallel. I will try to rebase my work on this patch, so that my extensions are only applicable if this patch is applied too. It would make my life easier If you (lukeplant) have some public distributed version control branch I could fork my work from. Preferably GitHub... If you don't have one available, it is not a big problem, though.

I would wait for this work to land in core, then do my extensions stuff, but I am afraid of 1.4 feature freeze. When is that? I really would like to get custom queryset handling in, as that is what I need from this feature, and I believe it will prove to be really useful for other users, too.

For the record I have a working but somewhat ugly patch for #17000. It clashes badly with this patch, though. I haven't attached the patch into #17000, as I don't want to give the impression that I am competing with this patch.

I will try to review the patch this weekend. Although it might be I need to take some time off from programming, been doing nothing but programming for the last few days...

comment:7 Changed 4 years ago by lukeplant

@akaariai: I don't have a publicly accessible fork I'm afraid - I use Mercurial because I can't stand git's insane UI, but the 'hgsubversion' extension doesn't yet support selectively pushing different things to Subversion, which is a pain.

FWIW, I don't think there is much left to do on this patch.

comment:8 Changed 4 years ago by mkai

Yes, the patch works well for my test case as mentioned above. For listing 10 Action instances, the query count was reduced from 69 to 37, for 100 instances it dropped from 181 to 98.

Let me know if there are more test cases that I can help with.

comment:9 Changed 4 years ago by lukeplant

  • Resolution set to fixed
  • Status changed from new to closed

In [16939]:

Fixed #17003 - prefetch_related should support foreign keys/one-to-one

Support for GenericForeignKey is also included.

Note: See TracTickets for help on using tickets.
Back to Top