Opened 4 years ago

Closed 4 years ago

#16773 closed Bug (wontfix)

QuerySet.count does no caching until the result cache is filled

Reported by: eternicode Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: Triage Stage: Design decision needed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Before a queryset is evaluated, each QuerySet.count call for a given QuerySet hits the database with a COUNT(*) query, and doesn't cache the result in any way.

In [1]: from django.contrib.auth.models import User

In [2]: from django.db import connections

In [3]: qs = User.objects.all()

In [4]: qs.count()
Out[4]: 3

In [5]: qs.count()
Out[5]: 3

In [6]: list(qs);

In [7]: qs.count()
Out[7]: 3

In [8]: connections['default'].queries
Out[8]: 
[{'sql': 'SELECT COUNT(*) FROM "auth_user"', 'time': '0.005'},
 {'sql': 'SELECT COUNT(*) FROM "auth_user"', 'time': '0.001'},
 {'sql': 'SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" FROM "auth_user"',
  'time': '0.001'}]

Change History (5)

comment:1 Changed 4 years ago by tarequeh

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Currently call to count stops hitting DB once the queryset is cached. If a cache for count is introduced, count will stop hitting DB upon on a call to count. And that may happen before queryset get evaluated/cached. Afterwards when queryset is evaluated, it might have number of results different than the cached count depending on how many objects were created between the call to count and evaluation. So codes like this won't be valid anymore:

books = Book.objects.all()
print books.count() # say prints 5
Book.objects.create(title='Game of Thrones')
print books.count() # currently prints 6 but with cache will print 5

But if the queryset is fully cached, then calling .count doesn't invoke further count calls:

books = Book.objects.all()
print books.count() # say prints 5
print [x.__dict__ for x in books]
Book.objects.create(title='Game of Thrones')
print books.count() # prints 5

So caching count is probably not a good idea.

comment:2 Changed 4 years ago by tarequeh

  • Triage Stage changed from Unreviewed to Design decision needed

comment:3 Changed 4 years ago by eternicode

  • Triage Stage changed from Design decision needed to Unreviewed

tarequeh, good points. I knew there were cases like these that I just wasn't thinking of.

Seems the create/delete could be solved with post_save and post_delete signal handlers, respectively, that clear the count cache. That's probably overkill, though, and maybe even approaching the realm of custom code for a specific purpose.

comment:4 Changed 4 years ago by eternicode

  • Triage Stage changed from Unreviewed to Design decision needed

comment:5 Changed 4 years ago by carljm

  • Resolution set to wontfix
  • Status changed from new to closed

Yes, the comments here are correct. Just caching counts will result in incorrect numbers, and adding post_save and post_delete handlers to core just to address this is swatting a gnat with a sledgehammer (and won't entirely fix the problem, either - bulk deletes and updates don't send those signals).

If you're really calling .count() so many times on the same unevaluated queryset that this is an issue, it's easy to cache the count in your own code. Or you can create your own queryset subclass that caches counts, and deal with the problems this causes.

Note: See TracTickets for help on using tickets.
Back to Top