Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#16902 closed Cleanup/optimization (fixed)

select_related() results in poor performance

Reported by: Ivan Virabyan
Owned by: nobody
Component: Database layer (models, ORM)
Version: dev
Severity: Normal
Keywords: select_related, get_cached_row, perfomance
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Consider this code:

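from time import time

# Comment and User are the application's models; Comment.author is
# presumably a foreign key to User.

# Two independent queries, one per table.
s = time()
list(Comment.objects.all()[:500])
list(User.objects.all()[:500])
print 'separate queries', time() - s

# One joined query that also populates each comment's author.
s = time()
list(Comment.objects.select_related('author')[:500])
print 'select_related', time() - s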

The result is surprising:

separate queries 0.126932859421
select_related 0.276528120041

As you can see, using select_related makes things about twice as slow. The difference is not query time, which is just a few milliseconds. So I dug into the implementation of get_cached_row and found that everything is recalculated there for each row, even though most of the information could be calculated only once, outside the loop.

So I've made a patch. With it, the select_related version of the query performs nearly the same as the separate queries.
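
To illustrate the kind of change described above, here is a minimal sketch of hoisting per-row work out of the loop. It is not the attached patch and not Django's real get_cached_row; the names describe, build_rows_slow and build_rows_fast are hypothetical, and describe merely stands in for whatever per-model metadata lookup is being repeated.

```python
# Counter used only to show how often the metadata lookup runs.
CALLS = {"describe": 0}


def describe(model_fields):
    """Hypothetical stand-in for a per-model metadata computation."""
    CALLS["describe"] += 1
    return [name for name in model_fields if not name.startswith("_")]


def build_rows_slow(rows, model_fields):
    """Recompute the field names for every row (the pattern reported here)."""
    result = []
    for row in rows:
        names = describe(model_fields)  # same answer on every iteration
        result.append(dict(zip(names, row)))
    return result


def build_rows_fast(rows, model_fields):
    """Compute the field names once, outside the loop, then reuse them."""
    names = describe(model_fields)
    return [dict(zip(names, row)) for row in rows]
```

Both functions return the same list; the second calls describe once instead of once per row, which is the whole saving when the result does not depend on the row.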

Attachments (1)

query.py.diff (12.8 KB) - added by Ivan Virabyan 13 years ago.

Change History (7)

by Ivan Virabyan, 13 years ago

Attachment: query.py.diff added

comment:1 by Carl Meyer, 13 years ago

Triage Stage: Unreviewed → Accepted

Accepting because I can duplicate the timing results and inspection of query.py confirms that there's a lot of work repeated unnecessarily for every result row. Don't have time at the moment to test or review the patch. Thanks for the report!

comment:2 by Ivan Virabyan, 13 years ago

Is the patch going to be reviewed? Isn't performance a serious issue?

comment:3 by Luke Plant, 13 years ago

I did review it, though not in detail, and it looks sane. I cannot reproduce the performance improvement, though. I tried using a modified djangobench (https://github.com/spookylukey/djangobench).

comment:4 by Ivan Virabyan, 13 years ago

You have an error in djangobench/benchmarks/query_select_related/fixtures/initial_data.json,
so the fixtures are not loaded (I assume Django fails silently on that). You can see it if you try

Book.objects.all()[0]

inside the benchmark: an IndexError will be raised.
I filled initial_data.json with the right data, and now the benchmark works properly:

Running benchmarks: query_select_related
Control: Django 1.3 SVN-16926 (in django-control)
Experiment: Django 1.4 pre-alpha SVN-16926 (in django-experiment)

Running 'query_select_related' benchmark ...
Min: 0.196191 -> 0.088349: 2.2206x faster
Avg: 0.199001 -> 0.090405: 2.2012x faster

Significant (t=120.254008)
Stddev: 0.00497 -> 0.00400: 1.2426x smaller (N = 50)
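
As a sanity check on the figures above, the "x faster" value appears to be simply the control time divided by the experiment time. That is a reading of the printed output, not something taken from djangobench's source, and the helper name speedup is hypothetical.

```python
def speedup(control_seconds, experiment_seconds):
    """Return how many times faster the experiment ran than the control."""
    return control_seconds / experiment_seconds


# Timings copied from the Min and Avg lines of the benchmark output.
print("Min: %.4fx faster" % speedup(0.196191, 0.088349))
print("Avg: %.4fx faster" % speedup(0.199001, 0.090405))
```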

Here is the fixed fork:
https://github.com/ivirabyan/djangobench/
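
A guard along the following lines could make a silently empty fixture fail loudly before any timing starts. It is only a sketch: require_rows is a hypothetical helper, not part of djangobench or Django, and it assumes the argument supports slicing (as lists and Django querysets do).

```python
def require_rows(collection, label):
    """Raise early if `collection` is empty, so a benchmark never times an empty table."""
    # Taking a one-element slice avoids loading everything just to check for emptiness.
    if not list(collection[:1]):
        raise RuntimeError("no rows for %s; were the fixtures loaded?" % label)
    return collection
```

In the benchmark this would be called once during setup, for example as require_rows(Book.objects.all(), "Book"), so a broken initial_data.json shows up as an explicit error rather than as a suspiciously fast run.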

comment:5 by Luke Plant, 13 years ago

Triage Stage: Accepted → Ready for checkin

Thanks for fixing that up - pesky silent errors!

comment:6 by Luke Plant, 13 years ago

Resolution: fixed
Status: new → closed

In [16929]:

Fixed #16902 - select_related() results in poor performance

Thanks to ivan_virabyan for the great patch!

(For the record, some very small tweaks were made by me).
