Opened 3 years ago
Last modified 3 months ago
#33015 new Cleanup/optimization
QuerySet.extra Use Case: filtering on large lists of tuples
Reported by: | kris-swann | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 2.2 |
Severity: | Normal | Keywords: | QuerySet.extra, Documentation |
Cc: | Keryn Knight, Simon Charette | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hello,
While reading through the documentation for .extra(..) it looked like use cases should be logged as a ticket here -- so here is that.
In an effort to speed up some slow queries that we've been encountering I've been looking into taking advantage of db indexes.
Basically we have run into the same issue as this SO question: https://stackoverflow.com/questions/23991090/django-filter-multiple-columns-with-a-list-of-tuples
We are using Postgres as our backing db.
We have a model that looks like this
class ExampleModel(models.Model): val1 = models.TextField() val2 = models.TextField() # etc... class Meta: indexes = [ models.Index(fields=['val1', 'val2']) ]
For demonstration purposes, we'll fill it with sample data
sample_data = [ ExampleModel(val1=str(v1), val2=str(v2)) for v1 in range(0,1000) for v2 in range(0,1000) ] ExampleModel.objects.bulk_create(sample_data)
Using a or-chained Q object is significantly slower
Note: In practice this can be a list of any arbitrary combinations, but for ease of generation, we'll use an easy to generate list of tuples
items_to_fetch = [ (i, j) for i in range(0,100) for j in range(0,100) ]
Using an or-chained Q object:
import time query = Q() for v1, v2 in items_to_fetch: query |= Q(val1=v1, val2=v2) start = time.perf_counter() ExampleModel.objects.filter(query) end = time.perf_counter() print(f"{end - start} seconds") >>> OUTPUT: >>> 43.73997112699999 seconds
VS.
Using .extra(...)
import time start = time.perf_counter() ExampleModel.objects.extra( where=["(val1, val2) in %s"], params=[tuple(items_to_fetch)] ) end = time.perf_counter() print(f"{end - start} seconds") >>> OUTPUT: >>> 0.0004218950000449695 seconds
If there are any alternatives worth trying out, that would be really helpful info. I wasn't able to find anything in the docs (although I might have just been looking in the wrong places too).
Change History (8)
comment:1 by , 3 years ago
Summary: | QuerySet.extra Use Case: use db indexes when filtering on large lists of tuples → QuerySet.extra Use Case: filtering on large lists of tuples |
---|
comment:3 by , 3 years ago
Cc: | added |
---|
Having been bitten by this same thing multiple times over the years, I'm interested.
The performance profile of this is reproducible for me on the 2.2.9
and 3.2.6
tags:
In [5]: %%timeit ...: query = Q() ...: for v1, v2 in items_to_fetch: ...: query |= Q(val1=v1, val2=v2) 21.9 s ± 1.01 s per loop (mean ± std. dev. of 7 runs, 1 loop each) In [7]: %timeit -r3 -n1 x = ExampleModel.objects.filter(query) 48.2 s ± 1.54 s per loop (mean ± std. dev. of 3 runs, 1 loop each)
but doesn't appear to be a big issue on main
since #32940; my suspicion is that both WhereNode
and Q
were hitting the pathological case for data in self.children
being O(n), in case that rings a bell Simon; below is main as of 8208381ba6
; the problem was present at 501a8db465
and basically gone at ff661dbd50
:
In [5]: %%timeit ...: query = Q() ...: for v1, v2 in items_to_fetch: ...: query |= Q(val1=v1, val2=v2) 209 ms ± 1.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [7]: %timeit x = ExampleModel.objects.filter(query) 644 ms ± 22.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Though it's still faster to avoid combining Q objects and kwargs:
In [14]: %timeit Q(*(Q(('val1', v1), ('val2', v2)) for v1, v2 in items_to_fetch), _connector=Q.OR) 15.1 ms ± 761 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
That's true for 2.2.9
too, but regrettably has less impact on the query compilation itself:
In [4]: %timeit Q(*(Q(('val1', v1), ('val2', v2)) for v1, v2 in items_to_fetch), _connector=Q.OR) 22.9 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 100 loops each) In [5]: %timeit -r3 -n1 x = ExampleModel.objects.filter(query) 20.6 s ± 874 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
I don't know that the simple Q based version will ever be as fast as extra
(as used above) or the ExpressionTuple
(also above), but it begins to look almost acceptable, certainly compared to the previous;
cProfile output for 3.2.6
(less likely to have changed method names etc than 2.2
) I think all but confirms it being the ticket mentioned above:
ncalls tottime percall cumtime percall filename:lineno(function) 50005000 27.676 0.000 74.102 0.000 tree.py:68(__eq__) 50500000 27.229 0.000 46.430 0.000 lookups.py:151(__eq__) 100980000 15.549 0.000 15.549 0.000 lookups.py:147(identity) 50002 9.352 0.000 83.492 0.002 tree.py:78(add) 50740007 3.709 0.000 3.747 0.000 {built-in method builtins.isinstance} 30001/1 0.266 0.000 85.239 85.239 query.py:1231(build_filter) 10002/1 0.141 0.000 85.239 85.239 query.py:1401(_add_q) 40000 0.102 0.000 0.126 0.000 query.py:1474(names_to_path) 20000 0.068 0.000 0.137 0.000 query.py:1577(setup_joins)
and main as of aforementioned commit:
ncalls tottime percall cumtime percall filename:lineno(function) 30001/1 0.182 0.000 1.286 1.286 query.py:1221(build_filter) 40000 0.144 0.000 0.159 0.000 query.py:1457(names_to_path) 10002/1 0.078 0.000 1.286 1.286 query.py:1386(_add_q) 20000 0.046 0.000 0.380 0.000 query.py:1156(build_lookup) 20000 0.045 0.000 0.179 0.000 query.py:1560(setup_joins) 220007 0.040 0.000 0.066 0.000 {built-in method builtins.isinstance} 20000 0.037 0.000 0.090 0.000 query.py:1093(solve_lookup_type) 20000 0.036 0.000 0.054 0.000 query_utils.py:146(get_lookup) 280005 0.036 0.000 0.036 0.000 {built-in method builtins.hasattr} 20000 0.028 0.000 0.119 0.000 local.py:114(__getattr__) 30002 0.026 0.000 0.041 0.000 tree.py:79(add)
comment:4 by , 3 years ago
Keywords: | Documentation added |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
comment:5 by , 3 years ago
Cc: | added |
---|
Thanks for the benchmarking Keryn! That's definitely data in self.children
biting us again.
With the current state of things in terms on Q
performance and the possibility of passing lookups to filter in 4.0 (#27021) I wonder what's left to do here. What were you thinking of documenting Carlton?
comment:6 by , 3 years ago
What were you thinking of documenting Carlton?
It struck that there's some good comments in this thread, that might go well <somewhere, need to search where>.
Opening comment:
I wasn't able to find anything in the docs (although I might have just been looking in the wrong places too).
e.g. Can we help folks avoid extra
better? (But I also thought Keryn's example was perhaps something we could put somewhere 🤔
But:
I wonder what's left to do here
Yes, I wondered that too, but didn't want to just close it without thinking about the potential docs change.
Happy to close if we decide that it's not actionable.
comment:7 by , 3 years ago
So, I'm not against documenting the most efficient way to generate a big old Q (it's usually an OR
IME); I suggested it as an aside in #32946. Mariusz would prefer not to promote using _connector
(see comment 1 in that ticket), and though the reasons aren't given, I think I kind of get them, and it boils down to: ideally, we shouldn't need to...
In an ideal world, throwing Q objects around even at a large scale should be fast enough to be usable and not need ... tricks. But even with attempts to improve other bits around that (#32948 ... doesn't make a huge amount of difference here, as most time is in add
, avoiding the cost of using kwargs because they're being artificially inflated for Q
etc), there's an unfortunate inflexion point where it becomes bigger than most request/response time budgets before you even run the query. The query given above is a perfect example of that, where the query may take ~2ms (on sqlite locally for me) but the evaluation into a tuple might take many times more than that (100ms
for 30, 30
items_to_fetch
otherwise sqlite doesn't like the expression length being more than 1000) ... and for a change of pace that's not being spent in Model.__init__
or any of the usual suspects (cloning methods). It's entirely in the Query
building (if you copy the .query
once it's been built and paste it into a new QuerySet, the whole thing is 34ms
). As mentioned, I have seen this personally in production (luckily in an out of band celery task, where time mattered less) and that's how I learnt to mitigate it. I don't like it and I don't like that it's something others might have to learn, but even cutting 50% off the costs of build_filter
and handling Q objects leaves you probably searching for additional optimisations to make it feasible in a normal request/response cycle (at the given 100, 100
example)
I don't have a good answer for it, really. Documenting that "Yeah it's just not good at huge condition lists" doesn't seem great, but nor does leaving users without an escape hatch for if they encounter such things. If Simon's suggested alternative via the new functionality is well documented, this use case could be used as a good example of that new hotness (though, caveat, I tried running it on main
and got AttributeError: 'F' object has no attribute '_output_field_or_none'
... as I know nothing about that functionality (it's a neat surprise), I dunno if that's because it was an untested sketch or if it should work but doesn't).
comment:8 by , 3 months ago
I believe this might be possible on current main, due to this pull request: https://github.com/django/django/commit/1eac690d25dd49088256954d4046813daa37dc95
I haven't tested it, but I believe
ExampleModel.objects.alias( composite_key=Tuple("val1", "val2") ).filter(composite_key__in=items_to_fetch)
might actually work now and produce the desired SQL.
I think you might be able to achieve what you're after with the following (off the main branch)