Opened 8 years ago
Last modified 6 years ago
#27867 new Bug
Excluding a reverse relation produces inconsistent queries
Reported by: | Sébastien Diemer | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | diemersebastien@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
When excluding a reverse relation from a queryset, the query generated by the ORM depends on the order in which the exclude
kwargs are processed.
This order is random since python 3.3 (see https://docs.python.org/3/reference/datamodel.html#object.__hash).
Django converts a query like Tag.objects.filter(category_id=2).exclude(note__isnull=True)
to the following SQL:
SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id" FROM "queries_tag" WHERE ( "queries_tag"."category_id" = 2 AND NOT ( "queries_tag"."id" IN ( SELECT U0."id" AS Col1 FROM "queries_tag" U0 LEFT OUTER JOIN "queries_note" U1 ON ( U0."id" = U1."tag_id" ) WHERE U1."id" IS NULL)) ) ORDER BY "queries_tag"."name" ASC
If another kwarg is added to exclude
, then the subquery "queries_tag"."id" IN
may be modified : a new clause is added to the WHERE
filter to avoid searching inside the entire queries_tag
table. WHERE U1."id" IS NULL
becomes WHERE (U1."id" IS NULL AND U0."id" = ("queries_tag"."id"))
.
This is not systematic, and seems to depend on the order on which the kwargs are processed (which is random for python3.3 and above).
This random behavior is the first thing to fix in my opinion.
In addition if the reverse relation is the only kwarg (like in my example above), the generated subquery misses this additional clause that enables to dramatically increase the performance of the query on large datasets. For performance reasons I think that the WHERE
filtering in the subquery should always be WHERE (U1."id" IS NULL AND U0."id" = ("queries_tag"."id"))
.
This is a second aspect of the bug that could be fixed.
see https://github.com/sebdiem/django/commit/e8ca265e8c30f61c88ea17b7c1cab72695d6a4ff for details on how to reproduce the error.
The bugs exist in Django 1.8 and master (and probably all versions in between), tested with postgresql as a db backend.
Change History (5)
comment:1 by , 8 years ago
Description: | modified (diff) |
---|
comment:2 by , 8 years ago
Description: | modified (diff) |
---|
comment:3 by , 8 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:4 by , 8 years ago
Description: | modified (diff) |
---|
I investigated this is a bit and the first non deterministic issue happened to have been fixed as a side effect b95c49c954e3b75678bb258e9fb2ec30d0d960bb.
The second issue still stands though but not as reported in the test case provided anymore because b95c49c954e3b75678bb258e9fb2ec30d0d960bb made
.exclude(name='', note__isnull=True)
kwargs passed toQ
always sorted as[('name', ''), ('note__isnull', True)]
which takes the code path whereU0."id" = ("queries_tag"."id")
is added to theWHERE
clause.In order to bypass the kwargs ordering added in b95c49c954e3b75678bb258e9fb2ec30d0d960bb that hides this issue as reported because
'name' < 'note'
one can rely onQ
args insteadexclude(Q(note__isnull=True) & Q(name=''))
exhibits the issue.exclude(Q(name='') & Q(note__isnull=True))
doesn't as it's equivalent to whatexclude(name='', note__isnull=True)
translates to.