Opened 3 years ago

Closed 2 years ago

#17025 closed Cleanup/optimization (fixed)

sql/where.py WhereNode refactoring

Reported by: akaariai Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords: orm cleanup
Cc: jonas-django@…, suor.web@…, lists@…, hv@… Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

First the good results: running tests is now about 10% faster. This is purely because SQL generation is faster, no test runner changes. I also ran theses 3 tests:

  • 1000 times User.objects.filter(pk=1)
  • 200 times qs = qs.filter(pk=1) That is, chained queryset cloning test
  • 1000 times, in a transaction, fetch an object and save it.

Results against PostgreSQL on localhost:

Patched:

  • 0:00:00.232005
  • 0:00:00.053464
  • 0:00:02.188870

Master:

  • 0:00:00.375182
  • 0:00:02.226373
  • 0:00:03.722031

So, about 30% faster, nearly 50x faster, and about 40% faster. There is a attached tests.txt file which describes the tests, and gives some results.

This ticket is related to the add_q refactoring (#17000). There are 3 different goals here:

  • make the code cleaner and easier to understand
  • make the code more correct
  • make the code faster

So far I think I have managed to simplify the sql/query.py wherenode handling, and sql/where.py also. There is much better separation of where things are done. Due to some problems with aggregates, currently the compiler.py is not any cleaner, possibly even the opposite.

The attached patch is WIP. There might be stale comments, and there is some rambling about various things in comments. I wont have time to work on this any more just now, so I decided to post the patch for testing and benchmarking. It does pass all the test. I believe it can handle queries which the trunk version can't but I haven't tested, so nothing more about those.

At the end some bad news: Constructing SQL for simple queries is still too slow. Here is a profile of fetching 1000 objects by User.objects.get(pk=1).

2.503 CPU seconds
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1001    0.224    0.000    0.243    0.000 {method 'execute' of 'psycopg2._psycopg.cursor' objects}
     2004    0.093    0.000    0.247    0.000 query.py:274(clone)
    22679    0.070    0.000    0.070    0.000 {hasattr}
     6183    0.053    0.000    0.078    0.000 datastructures.py:108(__init__)
30238/22735    0.046    0.000    1.256    0.000 {len}
     1000    0.044    0.000    0.465    0.000 compiler.py:80(as_sql)
     2000    0.043    0.000    1.200    0.001 query.py:230(iterator)
    22323    0.041    0.000    0.041    0.000 {isinstance}
   474/99    0.040    0.000    0.154    0.002 sre_parse.py:385(_parse)
     1000    0.038    0.000    0.096    0.000 compiler.py:279(get_default_columns)
     1000    0.038    0.000    0.327    0.000 util.py:31(execute)
    24678    0.037    0.000    0.037    0.000 {method 'append' of 'list' objects}
     1000    0.036    0.000    0.085    0.000 base.py:279(__init__)
     1000    0.033    0.000    0.137    0.000 where.py:71(make_atom)
     1000    0.032    0.000    0.301    0.000 query.py:1049(add_filter)

Without profiling the time to run the test is about 1 second, while using raw SQL yields 0.2 second timing.

What does this tell us? There is still a lot of room for improvement, but there is no single source of time loss. There are however a lot of small things we can improve somewhat easily. But more of those later.

Now I would like:

  • To get some realword benchmarks / tests. Does it break, is it faster for real world queries?
  • To get some feedback from the Django core developers who understand the ORM. I am willing to work on cleaning up the ORM, but I need some confirmation that the direction I took is acceptable.
  • It is not yet time to review the patch, but please take a look and say what you think. I would recommend to start from sql/query.py add_q and compare it to the trunk version.

Sorry that the patch is so large. The problem is, in the ORM most things are connected to each other, so when you change one part, it easily cascades to more things. I stopped at aggregation refactor for now.

Thank you for your time!

Attachments (2)

wherenode_refactor.diff (68.3 KB) - added by akaariai 3 years ago.
tests.txt (1.6 KB) - added by akaariai 3 years ago.

Download all attachments as: .zip

Change History (19)

Changed 3 years ago by akaariai

Changed 3 years ago by akaariai

comment:1 Changed 3 years ago by jonash

  • Cc jonas-django@… added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 3 years ago by Suor

  • Cc suor.web@… added

Cleaning up sql.Query and its parts and optimizing SQL generation is a kind of crusade, I feel, django needs.
Still your patch seems too hairy, perhaps the only way to make it clean is even more radical approach of thorough refactoring both sql.Query and sql compiler.

Also, when I profiled sql compilation I found that .get_columns() call is way too slow, I didn't find any good solution at a time, but you seem to be pretty familiar with orm internals. Maybe it will help.

comment:3 Changed 3 years ago by akaariai

I tested get_columns on my trivial test setup (I have a simple model with 4 fields). I could speed up SQL creation by 10% by caching the column output list in opts.standard_cols_list. The total time to run objects.filter(pk=1)[0] went down by 4-5%. Not much, but there are a lot of 5% improvements to be made, and they do sum up.

By the way now when I am testing again 3000 objects.filter(pk=1)[0] calls in a loop, it seems half of the time is used by the DB backend. This would mean there isn't that much optimization to do any more. But I did get different results before, I don't know what I did differently.

But I think this is out of scope for this ticket. The patch is too large already. Maybe we should create another ticket for tracking queryset optimization in general. But I am hesitant to create yet another ticket...

I did run the djangobench test suite, here are the results which have significant changes:

Running 'qs_filter_chaining' benchmark ...
Min: 0.004626 -> 0.001022: 4.5271x faster
Avg: 0.005084 -> 0.001153: 4.4096x faster
Significant (t=30.944514)
Stddev: 0.00079 -> 0.00042: 1.8833x smaller (N = 50)

Running 'query_complex_filter' benchmark ...
Min: 0.000328 -> 0.000191: 1.7145x faster
Avg: 0.000395 -> 0.000204: 1.9340x faster
Significant (t=4.031597)
Stddev: 0.00033 -> 0.00005: 7.2516x smaller (N = 50)

Running 'query_count' benchmark ...
Min: 0.000413 -> 0.000329: 1.2551x faster
Avg: 0.000446 -> 0.000339: 1.3132x faster
Significant (t=6.032214)
Stddev: 0.00012 -> 0.00003: 3.4642x smaller (N = 50)

Running 'query_dates' benchmark ...
Min: 0.001161 -> 0.000764: 1.5200x faster
Avg: 0.001308 -> 0.000818: 1.5994x faster
Significant (t=7.072624)
Stddev: 0.00047 -> 0.00014: 3.3679x smaller (N = 50)

Running 'query_distinct' benchmark ...
Min: 0.000675 -> 0.000548: 1.2324x faster
Avg: 0.000745 -> 0.000611: 1.2183x faster
Significant (t=2.221602)
Stddev: 0.00029 -> 0.00031: 1.0577x larger (N = 50)

Running 'query_exclude' benchmark ...
Min: 0.000883 -> 0.000699: 1.2630x faster
Avg: 0.000930 -> 0.000778: 1.1954x faster
Significant (t=2.855263)
Stddev: 0.00013 -> 0.00035: 2.7498x larger (N = 50)

Running 'query_exists' benchmark ...
Min: 0.001934 -> 0.001135: 1.7038x faster
Avg: 0.002091 -> 0.001233: 1.6958x faster
Significant (t=12.925648)
Stddev: 0.00041 -> 0.00022: 1.8597x smaller (N = 50)

Running 'query_filter' benchmark ...
Min: 0.000688 -> 0.000531: 1.2955x faster
Avg: 0.000766 -> 0.000558: 1.3740x faster
Significant (t=4.518940)
Stddev: 0.00032 -> 0.00006: 5.3167x smaller (N = 50)

Running 'query_get' benchmark ...
Min: 0.000987 -> 0.000573: 1.7228x faster
Avg: 0.001047 -> 0.000638: 1.6418x faster
Significant (t=8.444424)
Stddev: 0.00015 -> 0.00031: 2.0529x larger (N = 50)

Running 'query_get_or_create' benchmark ...
Min: 0.002294 -> 0.001471: 1.5595x faster
Avg: 0.002484 -> 0.001670: 1.4876x faster
Significant (t=8.274263)
Stddev: 0.00039 -> 0.00057: 1.4647x larger (N = 50)

Running 'query_in_bulk' benchmark ...
Min: 0.000655 -> 0.000512: 1.2795x faster
Avg: 0.000728 -> 0.000530: 1.3734x faster
Significant (t=4.438138)
Stddev: 0.00031 -> 0.00005: 6.4475x smaller (N = 50)

Running 'query_latest' benchmark ...
Min: 0.000753 -> 0.000499: 1.5088x faster
Avg: 0.000807 -> 0.000571: 1.4132x faster
Significant (t=4.336143)
Stddev: 0.00015 -> 0.00036: 2.4199x larger (N = 50)

Running 'query_order_by' benchmark ...
Min: 0.000725 -> 0.000594: 1.2208x faster
Avg: 0.000797 -> 0.000627: 1.2707x faster
Significant (t=3.686198)
Stddev: 0.00031 -> 0.00009: 3.3719x smaller (N = 50)

Running 'query_update' benchmark ...
Min: 0.000518 -> 0.000382: 1.3564x faster
Avg: 0.000540 -> 0.000413: 1.3075x faster
Significant (t=6.354390)
Stddev: 0.00007 -> 0.00012: 1.6667x larger (N = 50)

Running 'query_values' benchmark ...
Min: 0.000578 -> 0.000472: 1.2247x faster
Avg: 0.000651 -> 0.000502: 1.2977x faster
Significant (t=3.154670)
Stddev: 0.00033 -> 0.00008: 4.1899x smaller (N = 50)

Running 'query_values_list' benchmark ...
Min: 0.000550 -> 0.000445: 1.2363x faster
Avg: 0.000596 -> 0.000499: 1.1932x faster
Significant (t=2.076357)
Stddev: 0.00013 -> 0.00030: 2.3080x larger (N = 50)

I did have some errors. I don't know if it is because of djangobench (it is the ubuntu 11.04 djangobench) or because I broke something. Probably the latter. I don't have time to investigate this just now.

comment:4 Changed 3 years ago by Suor

In my experience, 4 fields is uncommon for any models that actually store data, it's common for any categorizing or helper models.
So testing with a model with 10+ fields would be more useful, it also affects model instance construction and pickling/unpickling time.

Last edited 3 years ago by Suor (previous) (diff)

comment:5 Changed 3 years ago by akaariai

Yeah, true. I don't know anything about pickling / unpickling, but I have tried to make the model initialization faster. For example if you fetch 1000 models from local SQLite DB, about 75% of the time is used in model __init__. As far as I can see, the __init__ is as fast as possible without going to code generation or C coding, and without breaking backwards compatibility.

I actually think I will make an external CFastModel one day. It implements a custom __init__ method but is otherwise just like normal Model. I haven't got a clue how hard that is to do, because I haven't done any Python extensions in C. And that is the reason why I am going to try to do it :)

I will try to test this ticket's patch with models containing more fields. I suspect the more fields, the lesser the effect of this patch. It would also be interesting to test select_related calls, queries with joins and so on. For those interested in this ticket, this is one way to help...

comment:6 Changed 3 years ago by Suor

Under heavy load I get more model instances from cache than from db. So unpickling was a great issue for me.
And I came up with https://gist.github.com/974735 to speed it up and save some space.

Regarding Model.__init__() it looks like it should be optimized together with QuertSet.iterator(). Also, a common case when len(args) == len(self._meta.fields) could be optimized by use of a precached list of field attnames instead of fields_iter (see gist above for attnames() function).

The second optimization that comes to mind is optimizations of signals handling. Now system stores a list of handlers for each event and when a signal is sent it filters that list with isinstance() call to restrict sender. Storing handlers in a proper structure (a dict sender_class -> list of handlers looks nice) will make sending signals faster.

Last edited 3 years ago by Suor (previous) (diff)

comment:7 Changed 3 years ago by akaariai

I tried that attnames approach. Unfortunately I didn't see any difference. When I tried with __dict__[attname] = val, I got around 20% speedup. I think it is the setattr calls that are the problem, and they can not be made faster easily. If I remove signals and manually do self.field1 = args[1], self.field2 = args[2] and so on, I will get the init down to 0.5 seconds, or 50% faster for the 1000 objects from the DB test. Using raw SQL for the same query takes 0.2 seconds, so there is still some overhead, but not nearly as much.

I would imagine 0.5 seconds speed would be possible for a C code / generated code __init__. Should be a fun little project to do that (of course as an external project).

Re signal sending: #16679

This is an interesting topic, but it is out of scope for this ticket.

comment:8 Changed 3 years ago by Suor

Did you try

    obj.__dict__.update(izip(attnames(cls), data))

It's the fastest way to fill model fields I found.

comment:9 Changed 3 years ago by akaariai

About the errors when running the benchmark suite - turns out I was using a dirty version of the code. git checkout -- made the bechmark suite pass. Results are similar, if not better than above.

comment:10 Changed 3 years ago by Mailing List SVR <lists@…>

  • Cc lists@… added

comment:11 Changed 3 years ago by akaariai

Suor: using cached attnames turned out to be a really good idea. Not only did id speed up large resultsets by a couple of percents (5% according to djangobench), but it is really good for qs.raw() when you don't have all the fields in the SQL, or when using qs.only() or qs.defer(). These are now a lot faster. See #17030 for details. I think that is the right place to discuss other ideas for model initialization speedup, too.

comment:12 Changed 3 years ago by guettli

  • Cc hv@… added

comment:13 Changed 3 years ago by aaugustin

  • Triage Stage changed from Unreviewed to Design decision needed

comment:14 Changed 3 years ago by snyderra@…

What revision from "svn co https://code.djangoproject.com/svn/django/trunk/" should the patch be applied to? I tried the patch from ticket 16759 and it resulted in a 30% speed increase. I would like to test this and the other associated patches as well but they do not list the revision number...

comment:15 follow-up: Changed 3 years ago by akaariai

  • Patch needs improvement set

I will see if I have time to update the patch to apply to trunk version. Until that time, I think the patch applies to roughly r16950 (found out by clicking "3 months" link after the patch, and picking some revision around the time the patch was posted).

I also set "patch needs improvement" as clearly the patch isn't applicable as is. The patch needs more work than just making it apply cleanly to trunk (more comments, cleanup and better aggregate handling).

BTW what was your test that resulted in 30% speedup?

comment:16 in reply to: ↑ 15 Changed 3 years ago by snyderra@…

I applied the patch from ticket 16759 to the appropriate revision. did a diff to the 1.3.1 download and merged the changes into the 1.3.1. so the 30% speed increase is during a bulk data insert that is also intensive in lookups for processing of the data. the speed increase is purely due to the reduction of time spent in deepcopy. that method alone was using ~40% of the compute time according to cProfile. after the patch it was reduced to 12%, so almost 30% I will add the patch to 16759 for people that want to use it with the 1.3.1. now the bottle neck is database access as I would expect and not deepcopy.

Replying to akaariai:

I will see if I have time to update the patch to apply to trunk version. Until that time, I think the patch applies to roughly r16950 (found out by clicking "3 months" link after the patch, and picking some revision around the time the patch was posted).

I also set "patch needs improvement" as clearly the patch isn't applicable as is. The patch needs more work than just making it apply cleanly to trunk (more comments, cleanup and better aggregate handling).

BTW what was your test that resulted in 30% speedup?

comment:17 Changed 2 years ago by akaariai

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.
Back to Top