Opened 9 years ago

Closed 9 years ago

#25403 closed New feature (duplicate)

Subqueries: a common case where people get tempted to use .extra()

Reported by: Paolo Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: QuerySet.extra
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

As written in the documentation I submit a use case where the .extra() method seems the only way to do what I need. Given this model:

	from django.db import models
	
	
	class Post(models.Model):
	    text = models.CharField(max_length=255)

	
	class Comment(models.Model):
	    text = models.CharField(max_length=255)
	    needs_attention = models.BooleanField(default=False)
	    created_at = models.DateTimeField(auto_now_add=True)
	    post = models.ForeignKey(Post)
	


I can annotate the list of posts with:

Post.objects.annotate(last_comment_date=Max('comment__created_at'))


to know the last_comment_date, how to annotate to set a value for last_comment_needs_attention?

I mean how can I obtain more informations from the comments while listing the posts?

I see that this is a quite common request as a feature of the Django ORM, the most detailed description of this problem can be found in this nice blog post .

Is there any plan to allow such a way to annotate? Thx

Change History (9)

comment:1 by Tim Graham, 9 years ago

Could you please try to edit the ticket description to describe the query/behavior you are trying to achieve? Otherwise, it seems we have to translate the models you've provided into the the details of the blog post. Thanks.

comment:2 by Simon Charette, 9 years ago

Version: 1.8master

I'm not sure if there's already a ticket tracking this feature request but Anssi would know.

I guess we could provide an API similar to the prefetch_related()/Prefetch() one.

from django.db import models
from django.utils import timezone

class Post(models.Model):
    text = models.CharField(max_length=255)

class Comment(models.Model):
    post = models.ForeignKey(Post, related_name='comments')
    text = models.CharField(max_length=255)
    created_at = models.DateTimeField(default=timezone.now)

Post.objects.select_related(
    latest_comment=Select(
        'comments', queryset=Comments.objects.order_by('-created_at')
    )
)
Last edited 9 years ago by Simon Charette (previous) (diff)

comment:3 by Josh Smeaton, 9 years ago

The prefetch api already lets you get the correct result at the expense of an extra query. Unfortunately, Django doesn't have a good story when it comes to supporting joins to subqueries just yet. Note that the linked blog post also causes an extra query, so the below is comparable (and supported!)

>>> p1 = Post.objects.create(text='Post 1')
>>> p2 = Post.objects.create(text='Post 2')
>>> c1 = Comment.objects.create(text='Comment 1', post=p1)
>>> c2 = Comment.objects.create(text='Comment 2', post=p1)
>>> c3 = Comment.objects.create(text='Comment 3', post=p2)
>>> latest_comment_ids = Comment.objects.values('post_id').annotate(max_id=Max('id')).values('max_id')
>>> latest_comments = Comment.objects.filter(id__in=latest_comment_ids)
>>> posts = Post.objects.prefetch_related(Prefetch('comment_set', queryset=latest_comments, to_attr='latest_comment'))
>>> for post in posts:
...     print(post.text)
...     for comment in post.latest_comment:
...         print(comment.text)
...
Post 1
Comment 2
Post 2
Comment 3

The queries executed will be close to:

SELECT * FROM "scratch_post"

-- and --

SELECT 
"scratch_comment"."id",
"scratch_comment"."text",
"scratch_comment"."needs_attention",
"scratch_comment"."created_at",
"scratch_comment"."post_id" 
FROM "scratch_comment" 
WHERE "scratch_comment"."id" IN (
    SELECT MAX(U0."id") AS "max_id" 
    FROM "scratch_comment" U0 
    GROUP BY U0."post_id"
)

comment:4 by Anssi Kääriäinen, 9 years ago

I am convinced having something like

Post.objects.select_related(
    latest_comment=Select(
        'comments', queryset=Comments.objects.order_by('-created_at')
    )
)

in Django is a great idea.

If we got .filter() to accept expressions, then you could do exactly the same raw SQL query with expressions as the blog mentions. That is, extra(where=['(blog_id, added) IN %s' % (values,)]) == .filter(RawSQL('(blog_id, added) IN %s', (values,)).

Note that the .extra() version used in the blog post uses string formatting. While as used in the blog post the query is safe, you might be vulnerable to SQL injection attacks if you use the same pattern with user editable char fields for example.

comment:5 by Paolo, 9 years ago

Summary: A quite common use of .extra()Subqueries: a common case where people get tempted to use .extra()

in reply to:  1 comment:6 by Paolo, 9 years ago

Replying to timgraham:
Thanks Tim for noting this, does the title make more sense now?

comment:7 by Tim Graham, 9 years ago

#23854 (Add custom SelectRelated classes) seems to be the ticket for the first idea and #25367 (Allow expressions in .filter() calls) for the second idea.

Does that leave anything else as a reason to keep this ticket open?

in reply to:  7 comment:8 by Paolo, 9 years ago

Replying to timgraham:
Tim don't you think that the akaariai example resembles in a cleaner way the two proposals that you mentioned?
I mean, you are right, in someway overlaps but the way he exposed seems making more sense an aligned to whole system.

comment:9 by Tim Graham, 9 years ago

Resolution: duplicate
Status: newclosed

I don't know, I guess it depends what's actually implemented. I'll add a comment on #25403 asking to reopen this ticket if it doesn't address this issue if/when it's completed.

Note: See TracTickets for help on using tickets.
Back to Top