Opened 3 years ago

Closed 12 months ago

#22209 closed Cleanup/optimization (worksforme)

Django internals call len(queryset) instead of queryset.count()

Reported by: Chris Wilson Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The documentation says:

Note: Don’t use len() on QuerySets if all you want to do is determine the number of records in the set. It’s much more efficient to handle a count at the database level, using SQL’s SELECT COUNT(*), and Django provides a count() method for precisely this reason. See count() below.

But Django does this itself in a few places:

$ grep -r 'len.*queryset' django/
django/forms/models.py:            return len(self.get_queryset())
django/forms/models.py:        return (len(self.queryset) +
django/contrib/admin/actions.py:    if len(queryset) == 1:

Change History (9)

comment:1 Changed 3 years ago by Chris Wilson

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

Unfortunately it seems that calling count() is sometimes more expensive than len(queryset). When I made this change, admin_views.tests.AdminCustomQuerysetTest.test_changelist_view_count_queries started failing because the number of queries increased from 4 to 15!

So is this a wontfix, or can we improve the efficiency of count()?

https://github.com/aptivate/django/tree/ticket_22209

comment:2 Changed 3 years ago by Anssi Kääriäinen

Easy pickings: set
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

Calling len is cheap if the queryset is already executed. Adding comments in the places where this is the case and fixing other places is the right fix for this.

Version 0, edited 3 years ago by Anssi Kääriäinen (next)

comment:3 Changed 3 years ago by Chris Wilson

@akaariai how about we make count() check whether the queryset is already executed, and return its length if it is? Then all Django users will get the benefit without worrying about tricky cases like different code paths and passed-in querysets.

comment:4 Changed 3 years ago by Anssi Kääriäinen

Looking at the code this seems to be already the case. So, here using len will execute the query, and later on the results can be reused. Not sure if the results are actually reused in all cases.

Adding comments where results are reused and changing other places to use count seems still like thw best option.

Similar issue is using "if qs:" - this is very inefficient if the qs isn't reused. Not sure if django has any problems from this antipattern.

comment:5 Changed 2 years ago by Tim Graham

Component: UncategorizedDatabase layer (models, ORM)
Easy pickings: unset

comment:6 Changed 2 years ago by Jelena K.

Resolution: fixed
Status: newclosed

comment:7 Changed 2 years ago by Tim Graham

Resolution: fixed
Status: closednew

Please don't close a ticket without comment.

comment:8 Changed 21 months ago by Collin Anderson

Changing the behavior of count() to magically check the length of the cached data if it exists only helps if the queryset is evaluated first. It's still better to use len() rather than count() in this case:

if queryset.count():
    for x in queryset:
        # do something.

I've actually gone through a lot of my code and changed count() to len() in a lot of cases it where it was actually making things slower.
It seems to me it's not worth changing anything here except for maybe changing the warning to be less severe.

Last edited 12 months ago by Tim Graham (previous) (diff)

comment:9 Changed 12 months ago by Tim Graham

Resolution: worksforme
Status: newclosed

The places in the ticket description look okay to me to use len() instead of count().

Note: See TracTickets for help on using tickets.
Back to Top