Opened 7 years ago

Last modified 7 months ago

#6785 new Cleanup/optimization

QuerySet.get() should only attempt to fetch a limited number of rows

Reported by: deadwisdom
Owned by: nobody
Component: Database layer (models, ORM)
Version: master
Severity: Normal
Keywords:
Cc: timograham@…, shai
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

.get() selects and lists every record in the db matching the given filter, when it only needs to select two at most.
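
As a rough illustration of the idea (not the attached patch, and not Django's actual implementation), the sketch below uses the standard-library sqlite3 module and wraps the query in LIMIT 2, which is enough to tell zero, one, and "more than one" matches apart. The helper name get_one, the person table, and the locally defined exception classes are assumptions made up for this example.

```python
import sqlite3


class DoesNotExist(Exception):
    """Stand-in for the error raised when no row matches."""


class MultipleObjectsReturned(Exception):
    """Stand-in for the error raised when more than one row matches."""


def get_one(conn, sql, params=()):
    """Return the single row matching `sql`, selecting at most two rows."""
    # LIMIT 2 is enough to distinguish 0, 1, and "more than 1" matches.
    rows = conn.execute(f"SELECT * FROM ({sql}) LIMIT 2", params).fetchall()
    if not rows:
        raise DoesNotExist("no row matched")
    if len(rows) > 1:
        raise MultipleObjectsReturned("more than one row matched")
    return rows[0]


# Hypothetical demo data: one "ann" and two "bob" rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)", [("ann",), ("bob",), ("bob",)]
)
```

Calling get_one with a filter on name = 'bob' raises MultipleObjectsReturned after reading two rows, no matter how many "bob" rows the table holds.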

Attachments (1)

query-get-patch.diff (1.8 KB) - added by deadwisdom 7 years ago.
QuerySet.get() nit-pick / optimize

Change History (17)

Changed 7 years ago by deadwisdom

QuerySet.get() nit-pick / optimize

comment:1 Changed 7 years ago by deadwisdom

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Found a small bug that slipped through the tests, so I fixed it and also updated the tests to test .get() raising MultipleObjectsReturned.

comment:2 Changed 7 years ago by mtredinnick

  • Resolution set to wontfix
  • Status changed from new to closed

Not worth the extra overhead in this performance-critical piece of code. When you're calling get() correctly, only a single result will be returned, and we need that result. In the error case, it's not harmful that we're selecting a few extra results. Any code that relies on the error case for speed is broken by design.

comment:3 Changed 2 years ago by wim@…

  • Easy pickings unset
  • Severity set to Normal
  • Type set to Uncategorized
  • UI/UX unset

Strange though it may seem: would it be possible to state the number of objects returned explicitly, capping the reported value at 1, 2, 3, 4, 5, or "more than 5"? In my experience, knowing the number of objects is helpful when debugging.
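
One way to read this request is sketched below: count matches up to a small cap and report anything beyond it as "more than N". This is only an illustration; the function name describe_count and the default cap of 5 are assumptions taken from the wording of the comment, not an existing API.

```python
from itertools import islice


def describe_count(rows, cap=5):
    """Return the exact number of rows up to `cap`, else "more than <cap>"."""
    # Pull at most cap + 1 items so a huge result set is never fully consumed.
    seen = sum(1 for _ in islice(rows, cap + 1))
    return str(seen) if seen <= cap else f"more than {cap}"
```

Because only cap + 1 items are consumed, the cost of producing the message stays bounded even when the filter matches a very large number of rows.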

comment:4 Changed 2 years ago by timo

  • Cc timograham@… added
  • Component changed from Uncategorized to Database layer (models, ORM)
  • Keywords qs-rf removed
  • Patch needs improvement set
  • Summary changed from QuerySet.get(), nit-pick / optimize to QuerySet.get() should only attempt to fetch a limited number of rows
  • Triage Stage changed from Unreviewed to Accepted
  • Type changed from Uncategorized to Cleanup/optimization
  • Version changed from queryset-refactor to master

Seems like there's consensus to re-open this: https://groups.google.com/d/topic/django-developers/PkzS9Wv6hIU/discussion

Pull request (which needs improvement): https://github.com/django/django/pull/1139

comment:5 Changed 2 years ago by timo

  • Patch needs improvement unset
  • Resolution wontfix deleted
  • Status changed from closed to new

Updated pull request based on django-developers discussion: https://github.com/django/django/pull/1320

I'm not sure it needs docs or a mention in the release notes, but happy to add them if necessary.

comment:6 Changed 2 years ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from new to closed

In da79ccca1d34f427952cce4555e598a700adb8de:

Fixed #6785 -- Made QuerySet.get() fetch a limited number of rows.

Thanks Patryk Zawadzki.

comment:7 Changed 13 months ago by shai

  • Cc shai added
  • Resolution fixed deleted
  • Status changed from closed to new

As noted in #23061, the current implementation is problematic on Oracle. It also does more work than necessary elsewhere: slicing a query clones it, which is an expensive operation, and potentially makes the database work harder.

It is better to implement this as hinted by the ticket's summary -- by limiting the fetches, not the select.
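
To make the distinction concrete, here is a minimal sketch using the standard-library sqlite3 module, where the SQL is left untouched and only the number of rows pulled from the cursor is capped. The helper name fetch_at_most and the item table are hypothetical, and this is not how Django's compiler is structured.

```python
import sqlite3


def fetch_at_most(conn, sql, params=(), max_rows=2):
    """Run `sql` unchanged and pull at most `max_rows` rows from the cursor."""
    cursor = conn.execute(sql, params)
    try:
        # No LIMIT is added to the SQL; only the fetch is capped.
        return cursor.fetchmany(max_rows)
    finally:
        cursor.close()


# Hypothetical demo data: five rows with ids 1..5.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(i,) for i in range(1, 6)])
rows = fetch_at_most(conn, "SELECT id FROM item ORDER BY id")
```

Whether capping the fetch actually saves work depends on how the database driver transfers results, which this sketch does not measure.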

comment:8 follow-up: ↓ 9 Changed 13 months ago by akaariai

Limiting the fetches doesn't work that well on most core databases. If I recall correctly, SQLite, PostgreSQL and MySQL all transfer all of the rows of the query when accessing the results (for PostgreSQL I know this is the case). So, while limiting the fetches would result in fewer model instances being generated, the overhead of generating all the rows in the database and transferring the results would still be there.

To get rid of clone() overhead we could just call query.set_limits() manually.

With this all being said... I guess we are optimizing the wrong case here. Successful .get() (just one row returned) should be optimized, not the error case. I am not sure how much overhead LIMIT (or the nested selects on Oracle) cost.

comment:9 in reply to: ↑ 8 Changed 13 months ago by shai

Replying to akaariai:

Limiting the fetches doesn't work that well on most core databases. [...] [T]he overhead of generating all the rows in the database and transferring the results would still be there.

To get rid of clone() overhead we could just call query.set_limits() manually.

With this all being said... I guess we are optimizing the wrong case here. Successful .get() (just one row returned) should be optimized, not the error case. I am not sure how much overhead LIMIT (or the nested selects on Oracle) cost.

The intent of re-opening is exactly to optimize the successful case; the issue is that the current implementation imposes an overhead on all get() calls for better handling of the error case. Doing it with fetches alone will remove this overhead -- for the successful case, asking for 21 results or for all results would be the same. I am not sure about the added performance costs either, except that (as noted) the nested selects on Oracle cost us in functionality.

On a side note, if more than one row is returned from the database, no model instance at all needs to be created.
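
The side note can be sketched as follows: inspect the raw rows first and build an object only once it is known that exactly one row came back. The names build_single and Person are hypothetical, and LookupError stands in for the two error cases of get().

```python
from dataclasses import dataclass

_MISSING = object()  # sentinel marking "no second row"


@dataclass
class Person:
    id: int
    name: str


def build_single(rows, factory):
    """Build an instance only when `rows` yields exactly one raw row."""
    it = iter(rows)
    try:
        first = next(it)
    except StopIteration:
        raise LookupError("no row matched") from None
    # Peek for a second row before doing any construction work.
    if next(it, _MISSING) is not _MISSING:
        raise LookupError("more than one row matched")
    return factory(*first)
```

With this ordering, the error path never pays for object construction, and the successful path constructs exactly one object.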

comment:10 Changed 13 months ago by akaariai

  • Has patch unset
  • Patch needs improvement set

Yes, optimizing the successful case is important. The .get() method can be used in try-except workflows, but I don't know of any realistic use case where the try-except is interested in the multiple-objects-returned case.

The question is how much overhead the LIMIT 21 clause adds for get() on databases that support it. If it doesn't add overhead, then keeping the current behavior on databases that support LIMIT seems OK to me. If it adds overhead, then let's just get rid of the LIMIT clause and close this ticket as wontfix. As said before, we can get rid of the clone overhead easily, so that isn't a factor in deciding what to do here.

comment:11 Changed 7 months ago by Jafula

Any chance of revisiting this for 1.8? On Oracle, the SQL for a get() query is wrapped in an additional SELECT, which is just noise when debugging (no known performance complaints in production so far).

We have our own Oracle backend, which inherits from the one that ships with Django and adds support for Oracle DRCP (connection pooling).

So some sort of hook, parameter or overridable function that would let us turn this feature off in our own backend would be wonderful.

Michael

comment:12 Changed 7 months ago by timgraham

  • Patch needs improvement unset

Of course, if someone writes a patch we'll look at it.

comment:13 Changed 7 months ago by akaariai

I'm beginning to think committing this was a mistake. Correctly working code shouldn't face this issue, so why optimize it? Removing this from 1.8 is OK with me.

One possibility is to use the iterator() method, so we don't turn all the rows into model instances.

comment:14 Changed 7 months ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 293fd5da5b8c7b79bd34ef793ab45c1bb8ac69ea:

Reverted "Fixed #6785 -- Made QuerySet.get() fetch a limited number of rows."

This reverts commit da79ccca1d34f427952cce4555e598a700adb8de.

This optimized the unsuccessful case at the expense of the successful one.

comment:15 Changed 7 months ago by Tim Graham <timograham@…>

In 7060ef71581c740bcc28ed405225537a411c36b5:

[1.8.x] Reverted "Fixed #6785 -- Made QuerySet.get() fetch a limited number of rows."

This reverts commit da79ccca1d34f427952cce4555e598a700adb8de.

This optimized the unsuccessful case at the expense of the successful one.

Backport of 293fd5da5b8c7b79bd34ef793ab45c1bb8ac69ea from master

comment:16 Changed 7 months ago by timgraham

  • Resolution fixed deleted
  • Status changed from closed to new

I've reverted the original patch.
