Opened 6 years ago

Closed 5 years ago

Last modified 3 years ago

#11402 closed Uncategorized (fixed)

exists() method on QuerySets

Reported by: Alex Owned by: nobody
Component: Database layer (models, ORM) Version: 1.0
Severity: Normal Keywords:
Cc: t.django@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There is a nice little idiom in Django for testing whether any result set corresponds to a QuerySet: qs.extra(select={'a': 1}).values('a').order_by() . This is used in at least 2 places within Django. It's a very nice idiom and it should be a method on a QuerySet because otherwise every time you use it you have to write a comment explaining what the hell you're doing.

Attachments (4)

exists.diff (4.2 KB) - added by Alex 5 years ago.
exists.2.diff (5.4 KB) - added by Alex 5 years ago.
Fixed docs and made has_results() work more like get_count(). This doesn't need tests because it is trivially shown to work because forms and save() continue to work (which are both tested extensively).
exists.3.diff (4.5 KB) - added by Alex 5 years ago.
A version without unrelated admin chnages.
exists.4.diff (4.5 KB) - added by Alex 5 years ago.
Fix a bug jacob spotted.

Download all attachments as: .zip

Change History (16)

comment:1 Changed 6 years ago by Alex

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 6 years ago by guettli

  • Cc hv@… added

comment:3 Changed 5 years ago by anonymous

  • Cc t.django@… added

comment:4 Changed 5 years ago by Vlada Macek

On duplicate #12021 I proposed the name any() for this method. Submitting it again for consideration.

comment:5 Changed 5 years ago by lukeplant

I agree that 'any' is better than 'exists' - it matches the Python builtin.

To answer someone's objection that this should be an optimization in QuerySet.__nonzero__:

__nonzero__ should not do this optimization trick:

  • because it should be consistent with __len__, which simply forces evaluation of the QuerySet
  • because it shouldn't second guess the user.

We did actually have this discussion about QuerySet.__len__(), way back (I can't find it), and with hindsight I'm sure we came to the right conclusion.

Consider the following two bits of code (ignoring bugs for now):

1) Take len() of a queryset, then use its data

options = Options.objects.filter(bar=baz)
choice = None
if len(options) > 1:
    print "Options: " + ", ".join(opt.name for opt in options)
else:
    choice = options[0]

2) Take len() of a queryset, then discard its data

options = Option.objects.filter(bar=baz)
if len(options) > 1:
    print "You've got more than 1!"
else
    print "You've got 1 or less!"


In QuerySet.__len__, it's impossible to guess which the user is doing. So if __len__ does a .count() as an optimization, sometimes it will be a pessimization, causing an extra DB hit compared to just evaluating the query. Exactly the same case can be made for __nonzero__.

Explicit is better than implicit. We provide QuerySet.count() if the user knows that they only want the count, and QuerySet.any() if the user knows that they only want that. If __len__ and __nonzero__ tried to be 'clever', then implementing snippet 1 in an efficient way gets harder - you have to wrap with list(). And snippet 1 is exactly the way that many templates will be written:

  {% if basketitems %}
     You've got {{ basketitems|length }} item(s) in your basket:
     {% for item in basketitems %}
       ...
     {% endfor %}
  {% endif %}

If we made __nonzero__ do the any() automatically, and similarly __len__, it would be very hard to avoid having 3 DB hits from within the above template. But the other way around, it's easy to get optimal behaviour without having paid any attention to performance. And template authors should not have to worry about this, but view authors should, and we should give them the tools to be able to do it explicitly and simply.

Using .count() and .any() isn't so 'pure', but the point is that our abstraction is not perfect (because we are worrying about optimisation), and we should manage the leak as best we can. The best way is simply to add this documentation:

bool() and len() and iter() force evaluation of the QuerySet; use .any() or .count() if you know you don't need all the data.

Changed 5 years ago by Alex

comment:6 Changed 5 years ago by lukeplant

Hi Alex,

Good work! My comments:

  • Could you be persuaded to use 'any()' instead of 'exists()'? It's a bit of a bikeshed issue, and you're the one painting it, but I do think that
    if Foo.objects.filter(blah).any():
    
    has a slightly more obvious meaning and reads better than
    if Foo.objects.filter(blah).exists():
    
    especially as you are running into the singular/plural issue ("exists" sounds appropriate for querying the existence of a single item, but is confusing if there might be many). Also, any is evocative of any, complete with the implied efficient behaviour.
  • Obviously this needs tests (for both outcomes). Either in modeltests/lookup/models.py or modeltests/basic/models.py, probably the former.
  • In the docs, I would say "This will perform the check in the most efficient way possible", rather than "This will perform the query in the simplest way possible". The latter implies first that the same query will be performed, or that a simplified query will be performed, both of which are slightly misleading.
  • Query.has_results() needs a docstring, and it should mention that it destructively updates the query object. In fact, putting the clone() call into has_results() would be better AFAICS - this would make it work like get_count(), reducing unwanted surprises should it be used from somewhere else.

Changed 5 years ago by Alex

Fixed docs and made has_results() work more like get_count(). This doesn't need tests because it is trivially shown to work because forms and save() continue to work (which are both tested extensively).

Changed 5 years ago by Alex

A version without unrelated admin chnages.

Changed 5 years ago by Alex

Fix a bug jacob spotted.

comment:7 Changed 5 years ago by Alex

Jacob says he likes the name exists().

comment:8 Changed 5 years ago by jacob

  • Resolution set to fixed
  • Status changed from new to closed

(In [11646]) Fixed #11402: added a QuerySet.exists() method. Thanks, Alex Gaynor.

comment:9 Changed 5 years ago by jacob

I decided on exists() for a couple reasons:

  • While I like the any/all symmetry, the fact that they return different types (qs/bool) is weird and not at all the same as Python's any/all.
  • I think if qs.exists(): ... "scans" better than if qs.any(): ...


Neither big deals, but I thought I'd record my paint-choosing thought processes for posterity.

comment:10 Changed 5 years ago by lukeplant

(In [11647]) Fixed a bug in r11646 - refs #11402

The one line of code not covered by a test... ;-)

comment:11 Changed 5 years ago by Nate

There's a typo in the latest diff, db/models/manager.py line 177: get_query_ste should be get_query_set.

comment:12 Changed 3 years ago by guettli

  • Cc hv@… removed
  • Easy pickings unset
  • Severity set to Normal
  • Type set to Uncategorized
  • UI/UX unset
Note: See TracTickets for help on using tickets.
Back to Top