Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#16902 closed Cleanup/optimization (fixed)

select_related() results in poor performance

Reported by: ivan_virabyan
Owned by: nobody
Component: Database layer (models, ORM)
Version: master
Severity: Normal
Keywords: select_related, get_cached_row, perfomance
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Consider this code:

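from time import time
# Comment and User are the reporter's models (import paths are not given in the ticket)

# Two separate queries: comments and users fetched independently
s = time()
list(Comment.objects.all()[:500])
list(User.objects.all()[:500])
print 'separate queries', time() - s

# One joined query: each comment together with its author
s = time()
list(Comment.objects.select_related('author')[:500])
print 'select_related', time() - s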

The result is surprising:

separate queries 0.126932859421
select_related 0.276528120041

As you can see, using select_related makes things about twice as slow. The query time is not the cause: the query itself takes just a few milliseconds. So I dug into the implementation of get_cached_row and found that everything is recalculated there for each row, even though most of the information could be calculated only once (outside the loop).

So I've made a patch. With it, the select_related version of the query has nearly the same performance as the separate queries.
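The general shape of the problem (row-independent work repeated inside a per-row loop) can be illustrated with a minimal sketch. This is not Django's get_cached_row and not the attached patch; `FIELDS`, `describe_fields`, `build_per_row` and `build_hoisted` are hypothetical names chosen only to show the idea of hoisting loop-invariant work.

```python
from collections import namedtuple

# Hypothetical stand-in for model metadata; not Django's real structures.
FIELDS = ("id", "text", "author_id")

Comment = namedtuple("Comment", FIELDS)


def describe_fields(fields):
    """Row-independent work: map each field name to its offset in a row."""
    return {name: offset for offset, name in enumerate(fields)}


def build_per_row(rows):
    """Slow shape: the row-independent layout is recomputed for every row."""
    result = []
    for row in rows:
        layout = describe_fields(FIELDS)  # same answer on every iteration
        result.append(Comment(**{name: row[i] for name, i in layout.items()}))
    return result


def build_hoisted(rows):
    """Faster shape: compute the layout once, outside the loop."""
    layout = describe_fields(FIELDS)
    return [Comment(**{name: row[i] for name, i in layout.items()}) for row in rows]


rows = [(n, "comment %d" % n, n % 7) for n in range(500)]

# Both variants build identical objects; only the amount of repeated work differs.
assert build_per_row(rows) == build_hoisted(rows)
```

In this toy version the repeated work is cheap, so the timing difference is small. The point is only the structure: anything that depends on the model rather than on the row can move out of the loop.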

Attachments (1)

query.py.diff (12.8 KB) - added by ivan_virabyan 3 years ago.

Change History (7)

Changed 3 years ago by ivan_virabyan

comment:1 Changed 3 years ago by carljm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Accepting because I can duplicate the timing results and inspection of query.py confirms that there's a lot of work repeated unnecessarily for every result row. Don't have time at the moment to test or review the patch. Thanks for the report!

comment:2 Changed 3 years ago by ivan_virabyan

Is the patch going to be reviewed? Isn't performance a serious issue?

comment:3 Changed 3 years ago by lukeplant

I did review it, though not in detail, and it looks sane. I cannot reproduce the performance improvement, though. I tried using a modified djangobench (https://github.com/spookylukey/djangobench).

comment:4 Changed 3 years ago by ivan_virabyan

You have an error in djangobench/benchmarks/query_select_related/fixtures/initial_data.json
So the fixtures are not loaded (I assume Django fails silently on that). You can see it if you try

Book.objects.all()[0]

inside the benchmark: an IndexError will be raised.
I filled initial_data.json with the right data, and now the benchmark works properly:

Running benchmarks: query_select_related
Control: Django 1.3 SVN-16926 (in django-control)
Experiment: Django 1.4 pre-alpha SVN-16926 (in django-experiment)

Running 'query_select_related' benchmark ...
Min: 0.196191 -> 0.088349: 2.2206x faster
Avg: 0.199001 -> 0.090405: 2.2012x faster

Significant (t=120.254008)
Stddev: 0.00497 -> 0.00400: 1.2426x smaller (N = 50)

Here is the fixed fork:
https://github.com/ivirabyan/djangobench/
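A benchmark that runs against an empty table measures nothing useful, so it can help to check the fixture before timing anything. The ticket does not say what exactly was wrong with initial_data.json; the sketch below assumes the problem is malformed or empty JSON. `check_fixture` is a hypothetical helper, not part of Django or djangobench, and the model label `app.book` is made up for illustration.

```python
import json


def check_fixture(text):
    """Return the number of objects in a JSON fixture, or raise ValueError."""
    objects = json.loads(text)  # raises ValueError on malformed JSON
    if not isinstance(objects, list) or not objects:
        raise ValueError("fixture contains no objects")
    return len(objects)


# A Django JSON fixture is a list of {"model", "pk", "fields"} entries.
good = '[{"model": "app.book", "pk": 1, "fields": {"title": "x"}}]'
bad = '[{"model": "app.book", "pk": 1, "fields": {"title": "x"}'  # missing brackets

print(check_fixture(good))

try:
    check_fixture(bad)
except ValueError as exc:
    print("fixture rejected: %s" % exc)
```

Inside the benchmark itself, the check from the comment above (indexing the first Book and seeing whether an IndexError is raised) serves the same purpose once the data is supposed to be loaded.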

comment:5 Changed 3 years ago by lukeplant

  • Triage Stage changed from Accepted to Ready for checkin

Thanks for fixing that up - pesky silent errors!

comment:6 Changed 3 years ago by lukeplant

  • Resolution set to fixed
  • Status changed from new to closed

In [16929]:

Fixed #16902 - select_related() results in a poor perfomance

Thanks to ivan_virabyan for the great patch!

(For the record, some very small tweaks were made by me).
