#27269 closed Uncategorized (needsinfo)
ORM extremely slow to run annotate/filter query, much faster directly in dbshell
Reported by: | Jerome Leclanche | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.10 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I have a table with a few million rows on Postgresql 9.5.4. The model looks like this:
class GlobalGame(Model): digest = CharField(max_length=40) ...
The following query takes several seconds to run (results unaltered - there aren't many):
>>> GlobalGame.objects.exclude(digest=None).order_by().values("digest").annotate(c=Count("id")).filter(c__gt=1, id__lt=100000) <QuerySet [{'c': 2, 'digest': '682c11b036d1a81cf24bc8619922af3ac6d25a25'}, {'c': 2, 'digest': '9bba9dc5df0c0db37f3cc06e2a1a30824c5c6812'}, {'c': 2, 'digest': 'b3dda775e8d9641225c4a113d8145a4817fae37d'}, {'c': 2, 'digest': 'd23e4bfeb1eacc531439e32d4a8d6d330e678f3f'}, {'c': 2, 'digest': 'e215de0646134713cac57edf430acfb60378a816'}]>
Despite that, if I print the .query
object and run *that* directly into psql, I get the results in 2ms. EXPLAIN ANALYZE
below:
""" QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=40396.36..40477.82 rows=6517 width=49) (actual time=2.024..2.158 rows=5 loops=1) Group Key: digest Filter: (count(id) > 1) Rows Removed by Filter: 2123 -> Bitmap Heap Scan on games_globalgame (cost=573.34..40292.09 rows=13902 width=49) (actual time=0.294..1.151 rows=2133 loops=1) Recheck Cond: (id < 100000) Filter: (digest IS NOT NULL) Rows Removed by Filter: 632 Heap Blocks: exact=90 -> Bitmap Index Scan on games_globalgame_pkey (cost=0.00..569.86 rows=14058 width=0) (actual time=0.217..0.217 rows=2765 loops=1) Index Cond: (id < 100000) Planning time: 0.087 ms Execution time: 2.203 ms (13 rows) """
It's worth noting that if I remove the id__lt
condition, or make it much larger (more permissive), I get the results back *immediately* - as long as there's enough results to show "remaining elements truncated".
This leads me to believe either Django isn't running the query it says it's running, or it's doing a bunch of extremely unoptimized operations for no good reason.
Thoughts?
Change History (2)
comment:1 by , 8 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:2 by , 8 years ago
The reporter says he printed queryset.query
and that query runs fast. This suggests:
- either time is spent elsewhere, but I'm not sure what could account for a delay of several seconds on such a query
- the query being run isn't the query he captured
I suggest configuring PostgreSQL to log all statements (just slow statements could suffice), to see:
- how much time is spent executing the statement in PostgreSQL
- if the query executed by PostgreSQL is the same query you tested separately
Honestly the chances that someone can reproduce the problem with the info you provided are slim; such problems are highly environment-dependent; that's why I suggest investigating further in the environment where you're seeing the problem.
It seems unlikely to me that Django is at fault here so I'll leave the ticket closed for now.
You can use logging or a print statement to check the query Django is running, or try to do some profiling to see what's slow. In general, you should use support channels to ask questions like this and open a ticket once you've identified that Django is it fault. Feel free to reopen with more specifics if your investigation leads to that conclusion. Thanks.