Code

Opened 3 years ago

Closed 21 months ago

Last modified 21 months ago

#15361 closed Cleanup/optimization (fixed)

Document performance considerations when using Queryset.get()

Reported by: mbertheau Owned by: mmcnickle
Component: Documentation Version: 1.2
Severity: Normal Keywords:
Cc: timograham@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently QuerySet.get() counts the number of rows in the result set. For get()'s semantics though, what's important is only whether there were 0, 1 or more rows returned. Counting the whole number of rows is a potentially expensive operation compared to stopping after the first 2 rows. Therefore QuerySet.get() should use LIMIT 2 (and no ORDER BY)

Attachments (4)

15361.diff (3.0 KB) - added by mmcnickle 3 years ago.
Patch with test
15361-2.diff (4.2 KB) - added by mmcnickle 3 years ago.
Documentation as per discussion. Added tests for MultipleObjectsReturned.
15361-2.2.diff (4.6 KB) - added by mmcnickle 3 years ago.
15361-3.diff (3.8 KB) - added by timo 23 months ago.

Download all attachments as: .zip

Change History (23)

comment:1 Changed 3 years ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Sounds reasonable to me.

Changed 3 years ago by mmcnickle

Patch with test

comment:2 Changed 3 years ago by mmcnickle

  • Has patch set

comment:3 Changed 3 years ago by lukeplant

I've done some benchmarks, using djangobench and the query_get benchmark (use my fork https://github.com/spookylukey/djangobench), and this patch actually slows it down for both SQLite and Postgres (I haven't tested MySQL). If you get different results, please let us know so that we can reproduce the performance increase - otherwise we obviously won't commit this.

comment:4 Changed 3 years ago by mbertheau

I question the validity of that benchmark. With 10 rows what you measure is probably the overhead of the slicing in the Django ORM. The original argument for using LIMIT makes still perfect sense.

comment:5 Changed 3 years ago by Alex

I'm not sure it does, the majority of get() calls (in my experience) are on fields such as pk or slug, which have database level uniqueness constraints, and thus the database has no need to be told that there is a limit on the number of results. Indeed the only case where this should make a large difference is if you're doing a get() from which the database will return a large number of results and then become a MultipleObjectsReturned exception, a case which I contend is unlikely.

comment:6 Changed 3 years ago by mbertheau

I concur. It was indeed in the context of a database design flaw that I noticed the unneeded row number counting of get().

However, I don't believe that the negative effects of this patch are so significant that it's worth sacrificing its positive effects, regardless of their low likelihood of occurring in well designed databases.

lukeplant, do the results of the other benchmark tests differ significantly?

comment:7 Changed 3 years ago by mmcnickle

  • Owner changed from nobody to mmcnickle

The concern that this change may make the get() slower may be valid, but one thing is for sure, that benchmark is inadequate for anything but huge performance regressions. I'm able to produce "Significant" slower results by running 2 identical versions of django:

Running all benchmarks
Control: Django 1.3 beta 1 (in django-control)
Experiment: Django 1.3 beta 1 (in django-experiment)

Running 'query_get' benchmark ...
Min: 0.020000 -> 0.020000: no change
Avg: 0.022400 -> 0.024800: 1.1071x slower
Significant (t=-2.556039)
Stddev: 0.00431 -> 0.00505: 1.1698x larger (N = 50)

The telling fact is that these "significant" results are transient, differing from run to run.

I'll write a benchmark that will properly stress the query.get() and see if:

a) There is a performance regression for pk (or other fields with unique constraints) lookups
b) There is a performance increase for non-indexed, large match lookups.

If (b) and not (a), it's a +1.
If (a) and (b), it's probably a -1, given the more usual case in (a)
If (a) and not (b), -1

comment:8 Changed 3 years ago by mmcnickle

Patch with the documentation change as decided in the django-dev post.

I've retained the tests from the old patch, as they increase the test coverage of query.get() and aren't really specific to this issue.

Changed 3 years ago by mmcnickle

Documentation as per discussion. Added tests for MultipleObjectsReturned.

comment:9 Changed 3 years ago by mmcnickle

  • Component changed from Database layer (models, ORM) to Documentation
  • Keywords QuerySet get limit MultipleObjectsReturned removed
  • milestone set to 1.3
  • Summary changed from QuerySet.get() should use LIMIT 2 to Document performance considerations when using Queryset.get()

comment:10 Changed 3 years ago by mmcnickle

Changing title, component so it's more likely to be picked up for 1.3 docs.

comment:11 Changed 3 years ago by gabrielhurley

  • Patch needs improvement set

The patch looks mostly good.

However, the last chunk in the optimization addition isn't quite right. While I understand how that query can potentially be slow, in the sample code that the slowness is trivialized by the fact that it's going to raise MultipleObjectsReturned and (in the context of the example) result in an uncaught exception.

So, while the advice on unique, indexed columns is solid, I'd like to see the contrasting slowness illustrated in a way that would work in a real-world use case. Additionally, a little more explanation of *why* that query could be slow would add a lot for new users.

Changed 3 years ago by mmcnickle

comment:12 Changed 3 years ago by mmcnickle

Added a clearer explanation of why returning multiple objects from get() is a performance issue.

comment:13 Changed 3 years ago by mmcnickle

  • Patch needs improvement unset

comment:14 Changed 3 years ago by lrekucki

  • Severity set to Normal
  • Type set to Cleanup/optimization

comment:15 Changed 3 years ago by aaugustin

  • Easy pickings unset
  • Patch needs improvement set
  • UI/UX unset

I noticed that the tests compare an exception message to "get\(\) returned more than one Article -- it returned 2! Lookup parameters were {'pub_date__month': 7, 'pub_date__year': 2005}". Unless I missed something, the order of the elements in the dictionary is not guaranteed to be stable across platforms. I suggest relaxing this test, for instance testing only "get\(\) returned more than one Article -- it returned 2!".

Also, I struggled with the structure of the paragraph added to the docs: "Firstly", "also", "First of all", "Secondly"... I'm not a native speaker, but many users of Django aren't either :)

comment:16 Changed 3 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

Changed 23 months ago by timo

comment:17 Changed 23 months ago by timo

  • Cc timograham@… added
  • Patch needs improvement unset

Cleaned up doc patch, relaxed tests per Aymeric's suggestion.

comment:18 Changed 21 months ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from new to closed

In feaf9f279a73d87549c17fc7fb36463f1c7367a1:

Fixed #15361 - Documented performance considerations for QuerySet.get()

Thanks mmcnickle for the patch.

comment:19 Changed 21 months ago by Tim Graham <timograham@…>

In ffc649df888ea35d0e6848bf53ee26917fccc27e:

[1.5.X] Fixed #15361 - Documented performance considerations for QuerySet.get()

Thanks mmcnickle for the patch.

Backport of feaf9f279a from master

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.