Opened 10 months ago

Last modified 10 months ago

#27867 new Bug

Excluding a reverse relation produces inconsistent queries

Reported by: Sébastien Diemer Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: diemersebastien@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Sébastien Diemer)

When excluding a reverse relation from a queryset, the query generated by the ORM depends on the order in which the exclude kwargs are processed.
This order is random since python 3.3 (see https://docs.python.org/3/reference/datamodel.html#object.__hash).

Django converts a query like Tag.objects.filter(category_id=2).exclude(note__isnull=True) to the following SQL:

 SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id"
 FROM "queries_tag"
 WHERE (
     "queries_tag"."category_id" = 2
     AND NOT (
         "queries_tag"."id" IN (
             SELECT U0."id" AS Col1 FROM "queries_tag" U0 LEFT OUTER JOIN "queries_note" U1 ON ( U0."id" = U1."tag_id" ) WHERE U1."id" IS NULL))
 )
 ORDER BY "queries_tag"."name" ASC

If another kwarg is added to exclude, then the subquery "queries_tag"."id" IN may be modified : a new clause is added to the WHERE filter to avoid searching inside the entire queries_tag table. WHERE U1."id" IS NULL becomes WHERE (U1."id" IS NULL AND U0."id" = ("queries_tag"."id")).
This is not systematic, and seems to depend on the order on which the kwargs are processed (which is random for python3.3 and above).
This random behavior is the first thing to fix in my opinion.

In addition if the reverse relation is the only kwarg (like in my example above), the generated subquery misses this additional clause that enables to dramatically increase the performance of the query on large datasets. For performance reasons I think that the WHERE filtering in the subquery should always be WHERE (U1."id" IS NULL AND U0."id" = ("queries_tag"."id")).
This is a second aspect of the bug that could be fixed.

see https://github.com/sebdiem/django/commit/e8ca265e8c30f61c88ea17b7c1cab72695d6a4ff for details on how to reproduce the error.

The bugs exist in Django 1.8 and master (and probably all versions in between), tested with postgresql as a db backend.

Change History (4)

comment:1 Changed 10 months ago by Sébastien Diemer

Description: modified (diff)

comment:2 Changed 10 months ago by Sébastien Diemer

Description: modified (diff)

comment:3 Changed 10 months ago by Tim Graham

Triage Stage: UnreviewedAccepted

comment:4 Changed 10 months ago by Sébastien Diemer

Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top