Opened 3 years ago

Last modified 20 months ago

#27867 new Bug

Excluding a reverse relation produces inconsistent queries

Reported by: Sébastien Diemer Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: diemersebastien@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Sébastien Diemer)

When excluding a reverse relation from a queryset, the query generated by the ORM depends on the order in which the exclude kwargs are processed.
This order is random since python 3.3 (see https://docs.python.org/3/reference/datamodel.html#object.__hash).

Django converts a query like Tag.objects.filter(category_id=2).exclude(note__isnull=True) to the following SQL:

 SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id"
 FROM "queries_tag"
 WHERE (
     "queries_tag"."category_id" = 2
     AND NOT (
         "queries_tag"."id" IN (
             SELECT U0."id" AS Col1 FROM "queries_tag" U0 LEFT OUTER JOIN "queries_note" U1 ON ( U0."id" = U1."tag_id" ) WHERE U1."id" IS NULL))
 )
 ORDER BY "queries_tag"."name" ASC

If another kwarg is added to exclude, then the subquery "queries_tag"."id" IN may be modified : a new clause is added to the WHERE filter to avoid searching inside the entire queries_tag table. WHERE U1."id" IS NULL becomes WHERE (U1."id" IS NULL AND U0."id" = ("queries_tag"."id")).
This is not systematic, and seems to depend on the order on which the kwargs are processed (which is random for python3.3 and above).
This random behavior is the first thing to fix in my opinion.

In addition if the reverse relation is the only kwarg (like in my example above), the generated subquery misses this additional clause that enables to dramatically increase the performance of the query on large datasets. For performance reasons I think that the WHERE filtering in the subquery should always be WHERE (U1."id" IS NULL AND U0."id" = ("queries_tag"."id")).
This is a second aspect of the bug that could be fixed.

see https://github.com/sebdiem/django/commit/e8ca265e8c30f61c88ea17b7c1cab72695d6a4ff for details on how to reproduce the error.

The bugs exist in Django 1.8 and master (and probably all versions in between), tested with postgresql as a db backend.

Change History (5)

comment:1 Changed 3 years ago by Sébastien Diemer

Description: modified (diff)

comment:2 Changed 3 years ago by Sébastien Diemer

Description: modified (diff)

comment:3 Changed 3 years ago by Tim Graham

Triage Stage: UnreviewedAccepted

comment:4 Changed 3 years ago by Sébastien Diemer

Description: modified (diff)

comment:5 Changed 20 months ago by Simon Charette

I investigated this is a bit and the first non deterministic issue happened to have been fixed as a side effect b95c49c954e3b75678bb258e9fb2ec30d0d960bb.

The second issue still stands though but not as reported in the test case provided anymore because b95c49c954e3b75678bb258e9fb2ec30d0d960bb made .exclude(name='', note__isnull=True) kwargs passed to Q always sorted as [('name', ''), ('note__isnull', True)] which takes the code path where U0."id" = ("queries_tag"."id") is added to the WHERE clause.

In order to bypass the kwargs ordering added in b95c49c954e3b75678bb258e9fb2ec30d0d960bb that hides this issue as reported because 'name' < 'note' one can rely on Q args instead

  1. exclude(Q(note__isnull=True) & Q(name='')) exhibits the issue.
  2. exclude(Q(name='') & Q(note__isnull=True)) doesn't as it's equivalent to what exclude(name='', note__isnull=True) translates to.
Note: See TracTickets for help on using tickets.
Back to Top