#16902 closed Cleanup/optimization (fixed)
select_related() results in poor performance
Reported by: | Ivan Virabyan | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | select_related, get_cached_row, perfomance |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Consider this code:
    s = time()
    list(Comment.objects.all()[:500])
    list(User.objects.all()[:500])
    print 'separate queries', time() - s

    s = time()
    list(Comment.objects.select_related('author')[:500])
    print 'select_related', time() - s
The result is surprising:
    separate queries 0.126932859421
    select_related 0.276528120041
As you can see, using select_related makes this about twice as slow. The difference is not query time, which is only a few milliseconds. So I dug into the implementation of get_cached_row and found that everything is recalculated there for each row, even though most of the information could be calculated just once, outside the loop.
So I wrote a patch. With it, the select_related version performs nearly the same as the separate queries.
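To illustrate the kind of change involved, here is a minimal sketch in plain Python. It is not the code from the patch and does not use Django's internals: `build_layout`, `rows_to_objects_slow`, `rows_to_objects_fast`, and the `author__` column naming are hypothetical stand-ins. The point is only that work depending on the model layout, not on the row, can be hoisted out of the per-row loop without changing the result.

```python
# Hypothetical column names for a joined Comment + author result set.
FIELD_NAMES = ["id", "text", "author__id", "author__name"]


def build_layout(field_names):
    """Row-independent work: decide which columns belong to which object."""
    own = [i for i, name in enumerate(field_names)
           if not name.startswith("author__")]
    related = [i for i, name in enumerate(field_names)
               if name.startswith("author__")]
    return own, related


def rows_to_objects_slow(rows, field_names):
    """Recomputes the layout for every row (the pattern the report describes)."""
    result = []
    for row in rows:
        own, related = build_layout(field_names)  # repeated per row
        result.append((tuple(row[i] for i in own),
                       tuple(row[i] for i in related)))
    return result


def rows_to_objects_fast(rows, field_names):
    """Computes the layout once, outside the loop."""
    own, related = build_layout(field_names)  # computed a single time
    return [(tuple(row[i] for i in own),
             tuple(row[i] for i in related))
            for row in rows]


rows = [(1, "hi", 10, "ann"), (2, "yo", 11, "bob")]
# Both versions produce the same objects; only the amount of work differs.
assert rows_to_objects_slow(rows, FIELD_NAMES) == rows_to_objects_fast(rows, FIELD_NAMES)
```

In the real code the per-row work is more involved than a prefix check, so the saving from hoisting it is correspondingly larger.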
Attachments (1)
Change History (7)
by , 13 years ago
Attachment: | query.py.diff added |
---|
comment:1 by , 13 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 13 years ago
I reviewed it, though not in detail, and it looks sane. I cannot reproduce the performance improvement, though. I tried using a modified djangobench (https://github.com/spookylukey/djangobench).
comment:4 by , 13 years ago
You have an error in djangobench/benchmarks/query_select_related/fixtures/initial_data.json
So the fixtures are not loaded (I assume Django fails silently on that). You can see it if you try
Book.objects.all()[0]
inside of the benchmark. IndexError will be raised.
I filled initial_data.json with the right data, and now the benchmark works properly:

    Running benchmarks: query_select_related

    Control: Django 1.3 SVN-16926 (in django-control)
    Experiment: Django 1.4 pre-alpha SVN-16926 (in django-experiment)

    Running 'query_select_related' benchmark ...
    Min: 0.196191 -> 0.088349: 2.2206x faster
    Avg: 0.199001 -> 0.090405: 2.2012x faster
    Significant (t=120.254008)
    Stddev: 0.00497 -> 0.00400: 1.2426x smaller (N = 50)

Here is the fixed fork:
https://github.com/ivirabyan/djangobench/
comment:5 by , 13 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Thanks for fixing that up - pesky silent errors!
Accepting because I can duplicate the timing results and inspection of query.py confirms that there's a lot of work repeated unnecessarily for every result row. Don't have time at the moment to test or review the patch. Thanks for the report!