Code

Opened 8 years ago

Closed 7 years ago

Last modified 5 years ago

#2256 closed defect (fixed)

queryset.count() does not work correctly with sliced querysets

Reported by: eth_anton@… Owned by: adrian
Component: Database layer (models, ORM) Version:
Severity: normal Keywords: queryset count limit
Cc: mtredinnick Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

Makes for counter-intuitive behavior when slicing...

From an interactive shell session:

>>> all = NewsItem.objects.all()
>>> all.count()
11L
>>> all[:5].count()
11L
>>> set = NewsItem.objects.all()[:5]
>>> set.count()
11L
>>>

Attachments (4)

queryset_count.patch (1.5 KB) - added by SmileyChris 7 years ago.
queryset_count_doc.patch (560 bytes) - added by SmileyChris 7 years ago.
Documentation if it's needed
queryset_count.2.patch (1.9 KB) - added by SmileyChris 7 years ago.
with tests
queryset_count.3.patch (2.0 KB) - added by SmileyChris 7 years ago.

Download all attachments as: .zip

Change History (23)

comment:1 Changed 8 years ago by ubernostrum

Hm.

We're definitely explicitly dropping the limit and offset if they're present, but there's no comment to explain why. When I get a little time, I'll try the test suite with the offending lines commented out (lines 183 and 184 in django/db/models/query.py).

comment:2 Changed 8 years ago by mtredinnick

We should either be raising an error in these cases or behaving as we are now.

Count() returns a single value and it doesn't make sense to apply limit and offset to a singleton list like that (although it works. A non-zero offset is going to lead to disappointment in the number of results you get back, though). It's almost certainly not what the caller intended and it's not going to work as a way to count how many things were returned after the limit and offset are applied.

LIMIT/OFFSET are applied after the result set is determined (and in this case it consists of one item, so it's pointless).

comment:3 Changed 7 years ago by cheap_hydrocodone

  • Cc "" added
  • Component Database wrapper deleted
  • Keywords "" added
  • Owner changed from adrian to anonymous
  • priority normal deleted
  • Severity normal deleted
  • Type defect deleted

comment:4 Changed 7 years ago by Collin Grady <cgrady@…>

  • Cc "" removed
  • Component set to Database wrapper
  • Keywords queryset count limit added; "" removed
  • Owner changed from anonymous to adrian
  • priority set to normal
  • Severity set to blocker
  • Type set to defect
  • Version set to SVN

fixing spambot edits

comment:5 Changed 7 years ago by Collin Grady <cgrady@…>

  • Severity changed from blocker to normal

woops, fixing invalid fix

comment:6 Changed 7 years ago by SmileyChris

  • Component changed from Database wrapper to Admin interface
  • priority changed from normal to highest
  • Severity changed from normal to blocker
  • Version SVN deleted

Malcom: .count() is used to determine proper page length for the paginator so it may be what the caller intended.

My opinion is that .count() should only return up to the maximum length that the sliced queryset could return.

comment:7 Changed 7 years ago by SmileyChris

  • Component changed from Admin interface to Database wrapper
  • priority changed from highest to normal
  • Severity changed from blocker to normal

oops, refreshed after Collin's changes and reverted his fixes :P

comment:8 Changed 7 years ago by SmileyChris

  • Triage Stage changed from Unreviewed to Design decision needed

Even if we don't change the existing behaviour, existing code should be commented as to why we are doing it the way we are.

comment:9 Changed 7 years ago by Collin Grady <cgrady@…>

On a whim I checked this using raw SQL and a LIMIT didn't affect the output of COUNT, so maybe that's why the code currently drops it - not much point adding it if the db won't respect it...

Though if there was a way to apply the limit first, and tell sql to count those without killing performance, it would definitely be more intuitive.

comment:10 Changed 7 years ago by SmileyChris

Thanks for checking that Collin.

If we do still want count to take into account the LIMIT/OFFSET, then it should be easy enough to do some basic maths to return the correct value, don't you think?

Changed 7 years ago by SmileyChris

Changed 7 years ago by SmileyChris

Documentation if it's needed

comment:11 Changed 7 years ago by SmileyChris

  • Has patch set
  • Triage Stage changed from Design decision needed to Ready for checkin

Malcom's comments aren't making much sense to me - no-one was talking about a singleton list.

It seems pretty clear to me that the limit/offset should be respected for a count, and my patch is sound so I'm marking as ready for checkin.

Changed 7 years ago by SmileyChris

with tests

comment:12 Changed 7 years ago by SmileyChris

  • Summary changed from queryset.count() drops LIMIT/OFFSET information to queryset.count() does not work with sliced querysets

I left the doc patch separate, because I'm not sure we need to explicitly document this change in behaviour in the DB api - the old behaviour was never implied.

comment:13 Changed 7 years ago by SmileyChris

  • Summary changed from queryset.count() does not work with sliced querysets to queryset.count() does not work correctly with sliced querysets

comment:14 Changed 7 years ago by mtredinnick

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

My earlier comment was that count() and limit/offset do not mix at the SQL level. You have worked around that by manually applying those parameters inside the Python code. I think this is counter-intuitive, since it's now no longer SQL-intuitive or Python-intuitive (slicing the single count() result changes the result?! We implemented limit/offset as slices because it was analogous -- you were slicing a sequence of results). But I can see the flip side, too, so I'm not going to veto this.

One problem with the patch is

>>> articles[8:100].count()
-1

(using the same test data setup). You need to make line 224 be something like count = max(0, count - offset). Fix that and add a test and we're good to go, I think.

comment:15 Changed 7 years ago by SmileyChris

Maybe I'm not following, but you can't slice a count() result as it does not return a QuerySet.
It seems pretty Python-intuitive that len(articles[:5]) == articles[:5].count().

Thanks for picking up that bug, I'll fix now.

Changed 7 years ago by SmileyChris

comment:16 Changed 7 years ago by SmileyChris

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

comment:17 Changed 7 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [4488]) #fixed #2256 -- Made count() interact with slicing on QuerySets. Patch from
SmileyChris.

comment:18 Changed 5 years ago by redduck666

  • Cc mtredinnick added

In [1]: from django.contrib.auth.models import User

In [2]: User.objects.all()[:0].count()
Out[2]: 4296

In [3]: len(User.objects.all()[:0])
Out[3]: 0

is this considered normal?

comment:19 Changed 5 years ago by kmtracey

Alex has opened #10202 for this.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.