Opened 15 years ago

Closed 15 years ago

Last modified 13 years ago

#11402 closed Uncategorized (fixed)

exists() method on QuerySets

Reported by: Alex Gaynor Owned by: nobody
Component: Database layer (models, ORM) Version: 1.0
Severity: Normal Keywords:
Cc: t.django@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There is a nice little idiom in Django for testing whether any result set corresponds to a QuerySet: qs.extra(select={'a': 1}).values('a').order_by() . This is used in at least 2 places within Django. It's a very nice idiom and it should be a method on a QuerySet because otherwise every time you use it you have to write a comment explaining what the hell you're doing.

Attachments (4)

exists.diff (4.2 KB ) - added by Alex Gaynor 15 years ago.
exists.2.diff (5.4 KB ) - added by Alex Gaynor 15 years ago.
Fixed docs and made has_results() work more like get_count(). This doesn't need tests because it is trivially shown to work because forms and save() continue to work (which are both tested extensively).
exists.3.diff (4.5 KB ) - added by Alex Gaynor 15 years ago.
A version without unrelated admin chnages.
exists.4.diff (4.5 KB ) - added by Alex Gaynor 15 years ago.
Fix a bug jacob spotted.

Download all attachments as: .zip

Change History (16)

comment:1 by Alex Gaynor, 15 years ago

Triage Stage: UnreviewedAccepted

comment:2 by Thomas Güttler, 15 years ago

Cc: hv@… added

comment:3 by anonymous, 15 years ago

Cc: t.django@… added

comment:4 by Vlada Macek, 15 years ago

On duplicate #12021 I proposed the name any() for this method. Submitting it again for consideration.

comment:5 by Luke Plant, 15 years ago

I agree that 'any' is better than 'exists' - it matches the Python builtin.

To answer someone's objection that this should be an optimization in QuerySet.__nonzero__:

__nonzero__ should not do this optimization trick:

  • because it should be consistent with __len__, which simply forces evaluation of the QuerySet
  • because it shouldn't second guess the user.

We did actually have this discussion about QuerySet.__len__(), way back (I can't find it), and with hindsight I'm sure we came to the right conclusion.

Consider the following two bits of code (ignoring bugs for now):

1) Take len() of a queryset, then use its data

options = Options.objects.filter(bar=baz)
choice = None
if len(options) > 1:
    print "Options: " + ", ".join(opt.name for opt in options)
else:
    choice = options[0]

2) Take len() of a queryset, then discard its data

options = Option.objects.filter(bar=baz)
if len(options) > 1:
    print "You've got more than 1!"
else
    print "You've got 1 or less!"


In QuerySet.__len__, it's impossible to guess which the user is doing. So if __len__ does a .count() as an optimization, sometimes it will be a pessimization, causing an extra DB hit compared to just evaluating the query. Exactly the same case can be made for __nonzero__.

Explicit is better than implicit. We provide QuerySet.count() if the user knows that they only want the count, and QuerySet.any() if the user knows that they only want that. If __len__ and __nonzero__ tried to be 'clever', then implementing snippet 1 in an efficient way gets harder - you have to wrap with list(). And snippet 1 is exactly the way that many templates will be written:

  {% if basketitems %}
     You've got {{ basketitems|length }} item(s) in your basket:
     {% for item in basketitems %}
       ...
     {% endfor %}
  {% endif %}

If we made __nonzero__ do the any() automatically, and similarly __len__, it would be very hard to avoid having 3 DB hits from within the above template. But the other way around, it's easy to get optimal behaviour without having paid any attention to performance. And template authors should not have to worry about this, but view authors should, and we should give them the tools to be able to do it explicitly and simply.

Using .count() and .any() isn't so 'pure', but the point is that our abstraction is not perfect (because we are worrying about optimisation), and we should manage the leak as best we can. The best way is simply to add this documentation:

bool() and len() and iter() force evaluation of the QuerySet; use .any() or .count() if you know you don't need all the data.

by Alex Gaynor, 15 years ago

Attachment: exists.diff added

comment:6 by Luke Plant, 15 years ago

Hi Alex,

Good work! My comments:

  • Could you be persuaded to use 'any()' instead of 'exists()'? It's a bit of a bikeshed issue, and you're the one painting it, but I do think that
    if Foo.objects.filter(blah).any():
    
    has a slightly more obvious meaning and reads better than
    if Foo.objects.filter(blah).exists():
    
    especially as you are running into the singular/plural issue ("exists" sounds appropriate for querying the existence of a single item, but is confusing if there might be many). Also, any is evocative of any, complete with the implied efficient behaviour.
  • Obviously this needs tests (for both outcomes). Either in modeltests/lookup/models.py or modeltests/basic/models.py, probably the former.
  • In the docs, I would say "This will perform the check in the most efficient way possible", rather than "This will perform the query in the simplest way possible". The latter implies first that the same query will be performed, or that a simplified query will be performed, both of which are slightly misleading.
  • Query.has_results() needs a docstring, and it should mention that it destructively updates the query object. In fact, putting the clone() call into has_results() would be better AFAICS - this would make it work like get_count(), reducing unwanted surprises should it be used from somewhere else.

by Alex Gaynor, 15 years ago

Attachment: exists.2.diff added

Fixed docs and made has_results() work more like get_count(). This doesn't need tests because it is trivially shown to work because forms and save() continue to work (which are both tested extensively).

by Alex Gaynor, 15 years ago

Attachment: exists.3.diff added

A version without unrelated admin chnages.

by Alex Gaynor, 15 years ago

Attachment: exists.4.diff added

Fix a bug jacob spotted.

comment:7 by Alex Gaynor, 15 years ago

Jacob says he likes the name exists().

comment:8 by Jacob, 15 years ago

Resolution: fixed
Status: newclosed

(In [11646]) Fixed #11402: added a QuerySet.exists() method. Thanks, Alex Gaynor.

comment:9 by Jacob, 15 years ago

I decided on exists() for a couple reasons:

  • While I like the any/all symmetry, the fact that they return different types (qs/bool) is weird and not at all the same as Python's any/all.
  • I think if qs.exists(): ... "scans" better than if qs.any(): ...


Neither big deals, but I thought I'd record my paint-choosing thought processes for posterity.

comment:10 by Luke Plant, 15 years ago

(In [11647]) Fixed a bug in r11646 - refs #11402

The one line of code not covered by a test... ;-)

comment:11 by Nate, 15 years ago

There's a typo in the latest diff, db/models/manager.py line 177: get_query_ste should be get_query_set.

comment:12 by Thomas Güttler, 13 years ago

Cc: hv@… removed
Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset
Note: See TracTickets for help on using tickets.
Back to Top