#11402 closed Uncategorized (fixed)
exists() method on QuerySets
Reported by: | Alex Gaynor | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.0 |
Severity: | Normal | Keywords: | |
Cc: | t.django@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
There is a nice little idiom in Django for testing whether any result set corresponds to a QuerySet: qs.extra(select={'a': 1}).values('a').order_by()
. This is used in at least 2 places within Django. It's a very nice idiom and it should be a method on a QuerySet because otherwise every time you use it you have to write a comment explaining what the hell you're doing.
Attachments (4)
Change History (16)
comment:1 by , 15 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 15 years ago
Cc: | added |
---|
comment:3 by , 15 years ago
Cc: | added |
---|
comment:4 by , 15 years ago
comment:5 by , 15 years ago
I agree that 'any' is better than 'exists' - it matches the Python builtin.
To answer someone's objection that this should be an optimization in QuerySet.__nonzero__
:
__nonzero__
should not do this optimization trick:
- because it should be consistent with
__len__
, which simply forces evaluation of the QuerySet - because it shouldn't second guess the user.
We did actually have this discussion about QuerySet.__len__()
, way back (I can't find it), and with hindsight I'm sure we came to the right conclusion.
Consider the following two bits of code (ignoring bugs for now):
1) Take len() of a queryset, then use its data
options = Options.objects.filter(bar=baz) choice = None if len(options) > 1: print "Options: " + ", ".join(opt.name for opt in options) else: choice = options[0]
2) Take len() of a queryset, then discard its data
options = Option.objects.filter(bar=baz) if len(options) > 1: print "You've got more than 1!" else print "You've got 1 or less!"
In QuerySet.__len__
, it's impossible to guess which the user is doing. So if __len__
does a .count() as an optimization, sometimes it will be a pessimization, causing an extra DB hit compared to just evaluating the query. Exactly the same case can be made for __nonzero__
.
Explicit is better than implicit. We provide QuerySet.count()
if the user knows that they only want the count, and QuerySet.any()
if the user knows that they only want that. If __len__
and __nonzero__
tried to be 'clever', then implementing snippet 1 in an efficient way gets harder - you have to wrap with list()
. And snippet 1 is exactly the way that many templates will be written:
{% if basketitems %} You've got {{ basketitems|length }} item(s) in your basket: {% for item in basketitems %} ... {% endfor %} {% endif %}
If we made __nonzero__
do the any()
automatically, and similarly __len__
, it would be very hard to avoid having 3 DB hits from within the above template. But the other way around, it's easy to get optimal behaviour without having paid any attention to performance. And template authors should not have to worry about this, but view authors should, and we should give them the tools to be able to do it explicitly and simply.
Using .count() and .any() isn't so 'pure', but the point is that our abstraction is not perfect (because we are worrying about optimisation), and we should manage the leak as best we can. The best way is simply to add this documentation:
bool() and len() and iter() force evaluation of the QuerySet; use .any() or .count() if you know you don't need all the data.
by , 15 years ago
Attachment: | exists.diff added |
---|
comment:6 by , 15 years ago
Hi Alex,
Good work! My comments:
- Could you be persuaded to use 'any()' instead of 'exists()'? It's a bit of a bikeshed issue, and you're the one painting it, but I do think that
if Foo.objects.filter(blah).any():
has a slightly more obvious meaning and reads better thanif Foo.objects.filter(blah).exists():
especially as you are running into the singular/plural issue ("exists" sounds appropriate for querying the existence of a single item, but is confusing if there might be many). Also,any
is evocative of any, complete with the implied efficient behaviour.
- Obviously this needs tests (for both outcomes). Either in modeltests/lookup/models.py or modeltests/basic/models.py, probably the former.
- In the docs, I would say "This will perform the check in the most efficient way possible", rather than "This will perform the query in the simplest way possible". The latter implies first that the same query will be performed, or that a simplified query will be performed, both of which are slightly misleading.
Query.has_results()
needs a docstring, and it should mention that it destructively updates the query object. In fact, putting theclone()
call intohas_results()
would be better AFAICS - this would make it work likeget_count()
, reducing unwanted surprises should it be used from somewhere else.
by , 15 years ago
Attachment: | exists.2.diff added |
---|
Fixed docs and made has_results() work more like get_count(). This doesn't need tests because it is trivially shown to work because forms and save() continue to work (which are both tested extensively).
comment:8 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:9 by , 15 years ago
I decided on exists()
for a couple reasons:
- While I like the any/all symmetry, the fact that they return different types (qs/bool) is weird and not at all the same as Python's any/all.
- I think
if qs.exists(): ...
"scans" better thanif qs.any(): ...
Neither big deals, but I thought I'd record my paint-choosing thought processes for posterity.
comment:10 by , 15 years ago
comment:11 by , 15 years ago
There's a typo in the latest diff, db/models/manager.py line 177: get_query_ste should be get_query_set.
comment:12 by , 13 years ago
Cc: | removed |
---|---|
Easy pickings: | unset |
Severity: | → Normal |
Type: | → Uncategorized |
UI/UX: | unset |
On duplicate #12021 I proposed the name any() for this method. Submitting it again for consideration.