Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#26194 closed Uncategorized (duplicate)

WHERE EXISTS / WHERE NOT EXISTS clause without using QuerySet.extra

Reported by: john-parton Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords: QuerySet.extra
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I initially posted this to the Django users group here: https://groups.google.com/forum/#!topic/django-users/nBySOT3C27E

Greetings!

I'm trying to refactor a query to avoid using QuerySet.extra (as per the recommendation here: https://docs.djangoproject.com/en/1.9/ref/models/querysets/#extra)

Here's a simplified version of my code:

# testapp.models

class Parent(models.Model):
    pass

class Child(models.Model):
    parent = models.ForeignKey('testapp.Parent')

This is the query I am attempting to replace:

Parent.objects.extra(where=["""
    NOT EXISTS (SELECT 1 FROM testapp_child WHERE testapp_child.parent_id = testapp_parent.id)
"""])

This is extremely similar to, but not the same as

Parent.objects.exclude(id__in=Child.objects.all().values('parent_id'))

Both queries should return the same rows, but the biggest difference is that PostgreSQL's query planner produces completely different plans. The performance difference can be huge, even for a relatively modest data set. (I believe my data set has something like 270k "parent" instances and 80k "child" instances.)

I tried searching and couldn't find a solution to this exact problem.

Thanks for taking the time to read through all of this!

Change History (4)

comment:1 by Shai Berger, 8 years ago

Resolution: invalid
Status: newclosed

Consider using the reverse relation:

Parent.objects.filter(child=None)

It should create a left join, which should perform comparably with the EXISTS query.

Also, please consider using our support channels next time.

comment:2 by Shai Berger, 8 years ago

Oh, sorry, I missed the mention that you did use our support channels; still, the reason for closing this ticket is exactly described in that link.

comment:3 by Simon Charette, 8 years ago

Resolution: invalidduplicate

I'm not sure the query can be considered equivalent given the LEFT JOIN will return duplicate rows for parent with multiple children.

Parent.objects.filter(child=None).distinct() should return the same results but I wouldn't be surprised if it performed worse than the original EXISTS query.

I'll re-close as duplicate of #25789 instead.

comment:4 by john-parton, 8 years ago

Thanks for looking into this.

Implementing a feature like .filter(models.Exists(somerel__col=val)) as mentioned by akaariai in reference to #25789 would fix my issue.

I tried Parent.objects.filter(child=None).distinct() and it resulted in a query that was about 3 times as slow as the NOT EXISTS query. This is definitely better than my attempt with the subquery (which was about 2000 times as slow), but it's still irksome that I can't easily compose the query using the ORM.

I couldn't find #25789 because I was looking for tickets with the "QuerySet.extra" keyword. Can we add that keyword to #25789?

Note: See TracTickets for help on using tickets.
Back to Top