Opened 9 years ago

Last modified 20 months ago

#6785 new Cleanup/optimization

QuerySet.get() should only attempt to fetch a limited number of rows

Reported by: Brantley Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: timograham@…, Shai Berger Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

.get() selects and lists every record in the db matching the given filter, when it only needs to select two at most.

Attachments (1)

query-get-patch.diff (1.8 KB) - added by Brantley 9 years ago.
QuerySet.get() nit-pick / optimize

Download all attachments as: .zip

Change History (17)

Changed 9 years ago by Brantley

Attachment: query-get-patch.diff added

QuerySet.get() nit-pick / optimize

comment:1 Changed 9 years ago by Brantley

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

Found a small bug that slipped through the tests, so I fixed it and also updated the tests to test .get() raising MultipleObjectsReturned.

comment:2 Changed 9 years ago by Malcolm Tredinnick

Resolution: wontfix
Status: newclosed

Not worth the extra overhead in this performance critical piece of code. In the case when you're calling get correctly, only a single result will be returned and we need that result. In the error case, it's not harmful that we're selecting a few extra results. Any code that is relying on the error case for speed is broken by design.

comment:3 Changed 3 years ago by wim@…

Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset

Strange though it may seem: would it be possible to explicitly state the number of objects returned and restrain that number to 1, 2, 3, 4, 5, more than 5? In my experience, the number of objects is helpful when debugging.

comment:4 Changed 3 years ago by Tim Graham

Cc: timograham@… added
Component: UncategorizedDatabase layer (models, ORM)
Keywords: qs-rf removed
Patch needs improvement: set
Summary: QuerySet.get(), nit-pick / optimizeQuerySet.get() should only attempt to fetch a limited number of rows
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization
Version: queryset-refactormaster

Seems like there's consensus to re-open this: https://groups.google.com/d/topic/django-developers/PkzS9Wv6hIU/discussion

Pull request (which needs improvement): https://github.com/django/django/pull/1139

comment:5 Changed 3 years ago by Tim Graham

Patch needs improvement: unset
Resolution: wontfix
Status: closednew

Updated pull request based on django-developers discussion: https://github.com/django/django/pull/1320

I'm not sure it needs docs or a mention in the release notes, but happy to add them if necessary.

comment:6 Changed 3 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: newclosed

In da79ccca1d34f427952cce4555e598a700adb8de:

Fixed #6785 -- Made QuerySet.get() fetch a limited number of rows.

Thanks Patryk Zawadzki.

comment:7 Changed 2 years ago by Shai Berger

Cc: Shai Berger added
Resolution: fixed
Status: closednew

As noted in #23061 the current implementation is problematic on Oracle. It also does more work than necessary elsewhere: Slicing a query clones it, which is an expensive operation, and potentially makes the database work harder.

It is better to implement this as hinted by the ticket's summary -- by limiting the fetches, not the select.

comment:8 Changed 2 years ago by Anssi Kääriäinen

Limiting the fetches doesn't work that well on most core databases. If I recall correctly sqlite, postgresql and mysql all do transfer all of the rows of the query when accessing the results (for PostgreSQL I know this is the case). So, while limiting the fetches would result in less model instances being generated, the overhead of generating all the rows in the database and transferring the results would still be there.

To get rid of clone() overhead we could just call query.set_limits() manually.

With this all being said... I guess we are optimizing the wrong case here. Successful .get() (just one row returned) should be optimized, not the error case. I am not sure how much overhead LIMIT (or the nested selects on Oracle) cost.

comment:9 in reply to:  8 Changed 2 years ago by Shai Berger

Replying to akaariai:

Limiting the fetches doesn't work that well on most core databases. [...] [T]he overhead of generating all the rows in the database and transferring the results would still be there.

To get rid of clone() overhead we could just call query.set_limits() manually.

With this all being said... I guess we are optimizing the wrong case here. Successful .get() (just one row returned) should be optimized, not the error case. I am not sure how much overhead LIMIT (or the nested selects on Oracle) cost.

The intent of re-opening is exactly to optimize the successful case; the issue is that the current implementation imposes an overhead on all get() calls for better handling of the error case. Doing it with fetches alone will remove this overhead -- for the successful case, asking for 21 results or for all results would be the same. I am not sure about the added performance costs either, except that (as noted) the nested selects on Oracle cost us in functionality.

On a side note, if more than one row is returned from the database, no model instance at all needs to be created.

comment:10 Changed 2 years ago by Anssi Kääriäinen

Has patch: unset
Patch needs improvement: set

Yes, optimizing the successful case is important. The .get() method can be used in try-except workflows, but I don't know of any realistic use case where the try-except is interested in multiple objects returned case.

The question is how much overhead the LIMIT 21 clause adds for get() on databases that support it. If it doesn't add overhead, then keeping the current behavior on databases that support LIMIT seems OK to me. If it adds overhead, then lets just get rid of the LIMIT clause and close this ticket as wontfix. As said before we can get rid of the clone overhead easily, so that isn't a factor in deciding what to do here.

comment:11 Changed 20 months ago by Jafula

Any chance of revisiting this for 1.8? In Oracle, the get query SQL is wrapped in an additional select and is just noise when debugging (no known performance complaints in Production so far).

We have our own Oracle backend that inherits from the one that ships with Django that we have added Oracle DRCP (connection pooling) support to it.

So some sort of hook, parameter or over-ridable function that would let us turn this feature off in our own backend would be wonderful.

Michael

comment:12 Changed 20 months ago by Tim Graham

Patch needs improvement: unset

Of course, if someone writes a patch we'll look at it.

comment:13 Changed 20 months ago by Anssi Kääriäinen

I'm beginning to think committing this was a mistake. Correctly working code shouldn't face this issue, so why optimize it. Removing this from 1.8 is ok for me.

One possibility is to use the iterator() method, so we don't turn all the rows to model instances.

comment:14 Changed 20 months ago by Tim Graham <timograham@…>

Resolution: fixed
Status: newclosed

In 293fd5da5b8c7b79bd34ef793ab45c1bb8ac69ea:

Reverted "Fixed #6785 -- Made QuerySet.get() fetch a limited number of rows."

This reverts commit da79ccca1d34f427952cce4555e598a700adb8de.

This optimized the unsuccessful case at the expense of the successful one.

comment:15 Changed 20 months ago by Tim Graham <timograham@…>

In 7060ef71581c740bcc28ed405225537a411c36b5:

[1.8.x] Reverted "Fixed #6785 -- Made QuerySet.get() fetch a limited number of rows."

This reverts commit da79ccca1d34f427952cce4555e598a700adb8de.

This optimized the unsuccessful case at the expense of the successful one.

Backport of 293fd5da5b8c7b79bd34ef793ab45c1bb8ac69ea from master

comment:16 Changed 20 months ago by Tim Graham

Resolution: fixed
Status: closednew

I've reverted the original patch.

Note: See TracTickets for help on using tickets.
Back to Top