#34413 closed New feature (duplicate)
Variant of Prefetch but for the earliest/latest related object
Reported by: | Willem Van Onsem | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
A frequently asked feature that seems to be missing is fetch the earlest/latest related object for each item. Indeed, we can for example work with a subquery to fetch *the primary key* of the earliest/latest related object, but not that object itself.
It turns out however that automating this is not that complicated. What we need to do is (automatically) construct a reverse filter, so if we want the latest Comment
of each Post
, we make a queryset that makes a Comment.objects.filter(post_id=OuterRef('pk'))
, then we convert that into a subquery that will, for each Post
fetch the primary key of the latest comment, and slightly alter the logic that is already used to for prefetching to then prefetch all the Comment
s for these primary keys, and add attributes to the corresponding Post
s.
I made a small proof-of-concept that should normally work for (most) cases: it allows one to use an arbitrary queryset and specify an ordering (or use the ordering already in the queryset or by the model). It can also span over multiple layers. It requires to *always* specify a to_attr
, since, especially when spanning over multiple relations, the default to_attr
would require double underscores.
The most ugly part is that we work with an annotation that is then accessible by the user, so it is not somehow hidden. Technically we could remove the attribute, or do something extra in the ORM to prevent exposing this attribute.
I did not yet check what querysets will be problematic. For example a sliced queryset would (very likely) not work, and likely most/all restrictions already in place for a the queryset of a Prefetch
object are still applicable.
A simple demonstration of the PrefetchEarliest
could be:
User.objects.prefetch_related(PrefetchEarliest('groups', to_attr='first_group', Group.objects.order_by('name'))
Attachments (1)
Change History (11)
by , 21 months ago
Attachment: | prefetching_the_earliest_latest_related_object_.patch added |
---|
comment:1 by , 21 months ago
Needs tests: | set |
---|
comment:2 by , 21 months ago
Description: | modified (diff) |
---|
follow-up: 5 comment:3 by , 21 months ago
Resolution: | → duplicate |
---|---|
Status: | assigned → closed |
comment:4 by , 21 months ago
Description: | modified (diff) |
---|
comment:5 by , 21 months ago
Replying to Simon Charette
What would happen if we span over multiple relations, like:
User.objects.prefetch_related( Prefetch("groups__permissions", queryset=Permission.objects.order_by('codename')[:1], to_attr="first_permission") )
comment:6 by , 21 months ago
Description: | modified (diff) |
---|
follow-up: 8 comment:7 by , 21 months ago
Nothing special really, there's one query per depth of prefetching and Prefetch.queryset
is always for the trailing relation of the lookup so something along the lines of
SELECT auth_group.* FROM auth_group JOIN user_groups ON (user_groups.group_id = auth_group.id) WHERE user_groups.user_id IN ($user_ids)
then
SELECT * FROM ( SELECT auth_permission.*, RANK() OVER ( PARTITION BY auth_group_permissions.group_id ORDER BY auth_permission.name ) _prefetch_rank FROM auth_permission JOIN auth_group_permissions ON (auth_permission.id = auth_group_permissions.permission_id) WHERE auth_group_permissions.group_id IN ($groups_ids) ) qualify WHERE _prefetch_rank <= 1
You can confirm it's working as expected by checking out the latest 4.2 beta and reporting any issues you might encounter.
comment:8 by , 21 months ago
Replying to Simon Charette
Well that was what I thought, the semantics differ: the Prefetch
will in this case fetch the earliest Permission
on each Group
prefetched for that user, whereas the PrefetchEarliest
would annotate the earliest Permission
to each User
. So it limits "bandwidth", and annotates at the same layer. The two therefore don't seem duplicates if I understand correctly?
comment:9 by , 21 months ago
Not sure I understand the rationale here, what does it even mean to annotate the earliest Permission to each User since User
relates to Permission
through groups and that's how prefetch_related
operates.
Why not use a Subquery
annotation instead of a prefetch for this particular case?
User.objects.annotate( first_permission_id=Permission.objects.filter( groups__users=OuterRef('pk') ).order_by('name')[:1] )
SELECT user.*, ( SELECT permission.id FROM permission JOIN group_permissions ON (group_permissions.permission_id = permission.id) JOIN user_groups ON (user_groups.group_id = group_permissions.group_id) WHERE user_groups.user_id = user.id ORDER BY permission.name LIMIT 1 ) first_permission FROM user
Isn't this report more about the ability of subquery to return more than one column so the following is possible
SELECT user.*, first_permission* FROM user, ( SELECT permission.* FROM permission JOIN group_permissions ON (group_permissions.permission_id = permission.id) JOIN user_groups ON (user_groups.group_id = group_permissions.group_id) WHERE user_groups.user_id = user.id ORDER BY permission.name LIMIT 1 ) first_permission
comment:10 by , 17 months ago
I would like to do singular prefetches, which seems kinda like what this is asking for.
With regard to:
User.objects.prefetch_related( Prefetch("groups", queryset=Group.objects.order_by('name')[:1], to_attr="first_group") )
The downside of this is "first_group" is actually a 1 element list. I was imagining maybe it could be just the item when there is a single item slice, or failing that a "flat=True" option on Prefetch.
Pretty sure this is a duplicate of #26780 fixed in the soon to be released 4.2.
Since 242499f2dc2bf24a9a5c855690a2e13d3303581a
Prefetch(queryset)
supports slicing through the use of filtering over partitioned rank (RANK() OVER (PARTITION BY <join_field> ORDER BY <order_by>
) so earliest and latest related objects (or any slice of related objects for that matter) can be prefetched.