Opened 3 years ago

Last modified 3 years ago

#33015 new Cleanup/optimization

QuerySet.extra Use Case: filtering on large lists of tuples

Reported by: kris-swann Owned by: nobody
Component: Database layer (models, ORM) Version: 2.2
Severity: Normal Keywords: QuerySet.extra, Documentation
Cc: Keryn Knight, Simon Charette Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hello,

While reading through the documentation for .extra(..) it looked like use cases should be logged as a ticket here -- so here is that.

In an effort to speed up some slow queries that we've been encountering I've been looking into taking advantage of db indexes.

Basically we have run into the same issue as this SO question: https://stackoverflow.com/questions/23991090/django-filter-multiple-columns-with-a-list-of-tuples

We are using Postgres as our backing db.

We have a model that looks like this

class ExampleModel(models.Model):
    val1 = models.TextField()
    val2 = models.TextField()
    # etc...

    class Meta:
        indexes = [
            models.Index(fields=['val1', 'val2'])
        ]

For demonstration purposes, we'll fill it with sample data

sample_data = [
    ExampleModel(val1=str(v1), val2=str(v2))
    for v1 in range(0,1000)
    for v2 in range(0,1000)
]
ExampleModel.objects.bulk_create(sample_data)

Using a or-chained Q object is significantly slower

Note: In practice this can be a list of any arbitrary combinations, but for ease of generation, we'll use an easy to generate list of tuples

items_to_fetch = [
    (i, j)
    for i in range(0,100)
    for j in range(0,100)
]

Using an or-chained Q object:

import time
query = Q()
for v1, v2 in items_to_fetch:
    query |= Q(val1=v1, val2=v2)
start = time.perf_counter()
ExampleModel.objects.filter(query)
end = time.perf_counter()
print(f"{end - start} seconds")

>>> OUTPUT:
>>> 43.73997112699999 seconds

VS.

Using .extra(...)

import time
start = time.perf_counter()
ExampleModel.objects.extra(
    where=["(val1, val2) in %s"],
    params=[tuple(items_to_fetch)]
)
end = time.perf_counter()
print(f"{end - start} seconds")

>>> OUTPUT:
>>> 0.0004218950000449695 seconds

If there are any alternatives worth trying out, that would be really helpful info. I wasn't able to find anything in the docs (although I might have just been looking in the wrong places too).

Change History (7)

comment:1 by kris-swann, 3 years ago

Summary: QuerySet.extra Use Case: use db indexes when filtering on large lists of tuplesQuerySet.extra Use Case: filtering on large lists of tuples

comment:2 by Simon Charette, 3 years ago

I think you might be able to achieve what you're after with the following (off the main branch)

from django.db.models import lookups
from django.db.models.expressions import Func

class ExpressionTuple(Func):
    template = '(%(expressions)s)'

ExampleModel.objects.filter(
    lookups.In(
        ExpressionTuple('val1', 'val2'),
        tuple(items_to_fetch),
    )
)
Last edited 3 years ago by Simon Charette (previous) (diff)

comment:3 by Keryn Knight, 3 years ago

Cc: Keryn Knight added

Having been bitten by this same thing multiple times over the years, I'm interested.

The performance profile of this is reproducible for me on the 2.2.9 and 3.2.6 tags:

In [5]: %%timeit
   ...: query = Q()
   ...: for v1, v2 in items_to_fetch:
   ...:     query |= Q(val1=v1, val2=v2)
21.9 s ± 1.01 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit -r3 -n1 x = ExampleModel.objects.filter(query)
48.2 s ± 1.54 s per loop (mean ± std. dev. of 3 runs, 1 loop each)

but doesn't appear to be a big issue on main since #32940; my suspicion is that both WhereNode and Q were hitting the pathological case for data in self.children being O(n), in case that rings a bell Simon; below is main as of 8208381ba6; the problem was present at 501a8db465 and basically gone at ff661dbd50:

In [5]: %%timeit
   ...: query = Q()
   ...: for v1, v2 in items_to_fetch:
   ...:     query |= Q(val1=v1, val2=v2)
209 ms ± 1.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit x = ExampleModel.objects.filter(query)
644 ms ± 22.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Though it's still faster to avoid combining Q objects and kwargs:

In [14]: %timeit Q(*(Q(('val1', v1), ('val2', v2)) for v1, v2 in items_to_fetch), _connector=Q.OR)
15.1 ms ± 761 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

That's true for 2.2.9 too, but regrettably has less impact on the query compilation itself:

In [4]: %timeit Q(*(Q(('val1', v1), ('val2', v2)) for v1, v2 in items_to_fetch), _connector=Q.OR)
22.9 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]: %timeit -r3 -n1 x = ExampleModel.objects.filter(query)
20.6 s ± 874 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)

I don't know that the simple Q based version will ever be as fast as extra (as used above) or the ExpressionTuple (also above), but it begins to look almost acceptable, certainly compared to the previous;

cProfile output for 3.2.6 (less likely to have changed method names etc than 2.2) I think all but confirms it being the ticket mentioned above:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 50005000   27.676    0.000   74.102    0.000 tree.py:68(__eq__)
 50500000   27.229    0.000   46.430    0.000 lookups.py:151(__eq__)
100980000   15.549    0.000   15.549    0.000 lookups.py:147(identity)
    50002    9.352    0.000   83.492    0.002 tree.py:78(add)
 50740007    3.709    0.000    3.747    0.000 {built-in method builtins.isinstance}
  30001/1    0.266    0.000   85.239   85.239 query.py:1231(build_filter)
  10002/1    0.141    0.000   85.239   85.239 query.py:1401(_add_q)
    40000    0.102    0.000    0.126    0.000 query.py:1474(names_to_path)
    20000    0.068    0.000    0.137    0.000 query.py:1577(setup_joins)

and main as of aforementioned commit:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  30001/1    0.182    0.000    1.286    1.286 query.py:1221(build_filter)
    40000    0.144    0.000    0.159    0.000 query.py:1457(names_to_path)
  10002/1    0.078    0.000    1.286    1.286 query.py:1386(_add_q)
    20000    0.046    0.000    0.380    0.000 query.py:1156(build_lookup)
    20000    0.045    0.000    0.179    0.000 query.py:1560(setup_joins)
   220007    0.040    0.000    0.066    0.000 {built-in method builtins.isinstance}
    20000    0.037    0.000    0.090    0.000 query.py:1093(solve_lookup_type)
    20000    0.036    0.000    0.054    0.000 query_utils.py:146(get_lookup)
   280005    0.036    0.000    0.036    0.000 {built-in method builtins.hasattr}
    20000    0.028    0.000    0.119    0.000 local.py:114(__getattr__)
    30002    0.026    0.000    0.041    0.000 tree.py:79(add)
Last edited 3 years ago by Keryn Knight (previous) (diff)

comment:4 by Carlton Gibson, 3 years ago

Keywords: Documentation added
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

comment:5 by Simon Charette, 3 years ago

Cc: Simon Charette added

Thanks for the benchmarking Keryn! That's definitely data in self.children biting us again.

With the current state of things in terms on Q performance and the possibility of passing lookups to filter in 4.0 (#27021) I wonder what's left to do here. What were you thinking of documenting Carlton?

comment:6 by Carlton Gibson, 3 years ago

What were you thinking of documenting Carlton?

It struck that there's some good comments in this thread, that might go well <somewhere, need to search where>.

Opening comment:

I wasn't able to find anything in the docs (although I might have just been looking in the wrong places too).

e.g. Can we help folks avoid extra better? (But I also thought Keryn's example was perhaps something we could put somewhere 🤔

But:

I wonder what's left to do here

Yes, I wondered that too, but didn't want to just close it without thinking about the potential docs change.

Happy to close if we decide that it's not actionable.

comment:7 by Keryn Knight, 3 years ago

So, I'm not against documenting the most efficient way to generate a big old Q (it's usually an OR IME); I suggested it as an aside in #32946. Mariusz would prefer not to promote using _connector (see comment 1 in that ticket), and though the reasons aren't given, I think I kind of get them, and it boils down to: ideally, we shouldn't need to...

In an ideal world, throwing Q objects around even at a large scale should be fast enough to be usable and not need ... tricks. But even with attempts to improve other bits around that (#32948 ... doesn't make a huge amount of difference here, as most time is in add, avoiding the cost of using kwargs because they're being artificially inflated for Q etc), there's an unfortunate inflexion point where it becomes bigger than most request/response time budgets before you even run the query. The query given above is a perfect example of that, where the query may take ~2ms (on sqlite locally for me) but the evaluation into a tuple might take many times more than that (100ms for 30, 30 items_to_fetch otherwise sqlite doesn't like the expression length being more than 1000) ... and for a change of pace that's not being spent in Model.__init__ or any of the usual suspects (cloning methods). It's entirely in the Query building (if you copy the .query once it's been built and paste it into a new QuerySet, the whole thing is 34ms). As mentioned, I have seen this personally in production (luckily in an out of band celery task, where time mattered less) and that's how I learnt to mitigate it. I don't like it and I don't like that it's something others might have to learn, but even cutting 50% off the costs of build_filter and handling Q objects leaves you probably searching for additional optimisations to make it feasible in a normal request/response cycle (at the given 100, 100 example)

I don't have a good answer for it, really. Documenting that "Yeah it's just not good at huge condition lists" doesn't seem great, but nor does leaving users without an escape hatch for if they encounter such things. If Simon's suggested alternative via the new functionality is well documented, this use case could be used as a good example of that new hotness (though, caveat, I tried running it on main and got AttributeError: 'F' object has no attribute '_output_field_or_none' ... as I know nothing about that functionality (it's a neat surprise), I dunno if that's because it was an untested sketch or if it should work but doesn't).

Note: See TracTickets for help on using tickets.
Back to Top