Code

#19837 closed Bug (fixed)

Regression in QuerySet.exclude and many to many

Reported by: timo Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Release blocker Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The refactorings in #10790 appears to have cause an issue with QuerySet.exclude and many to many fields.

Attached is a regression test which demonstrates the problem. The test passes using checkout f811649710fb51e48217e9a78991735977decfd8, but fails against the next changeset (the ticket I mentioned above) through master.

In the SQL query from the test, you can see that the "WHERE NOT" clause compares program IDs to a subquery that retrieves identifier IDs: queries_program"."id" IN (SELECT U1."identifier_id

SELECT "queries_identifier"."id", "queries_identifier"."name"
FROM "queries_identifier"
LEFT OUTER JOIN "queries_program" ON (
    "queries_identifier"."id" = "queries_program"."identifier_id"
) WHERE NOT ((
    "queries_program"."id" IN (
        SELECT U1."identifier_id"
        FROM "queries_program" U1 
        INNER JOIN "queries_channel_programs" U2 ON (U1."id" = U2."program_id")
        WHERE (U2."channel_id" = 1  AND U1."identifier_id" IS NOT NULL)
    ) AND NOT ("queries_program"."identifier_id" IS NULL)
    AND "queries_program"."id" IS NOT NULL 
    AND "queries_program"."id" IS NOT NULL
)) ORDER BY "queries_identifier"."name" ASC

Attachments (1)

19837-test.diff (1.9 KB) - added by timo 14 months ago.

Download all attachments as: .zip

Change History (6)

Changed 14 months ago by timo

comment:1 Changed 14 months ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Severity changed from Normal to Release blocker

comment:2 Changed 14 months ago by carljm

Just to be extra-clear for anyone scanning release blockers for 1.5: the patch for #10790 is not present on the 1.5.x branch, so while this is a release blocker, it only blocks 1.6, not 1.5.

comment:3 Changed 14 months ago by akaariai

A patch is available from https://github.com/akaariai/django/compare/ticket_19837.

The regression was caused by using different split position in the inner and outer query, that is the trimmed_prefix in split_exclude() didn't match the query generated by set_start().

The patch does a bit larger refactor. I do think the patch now does something that can be reasoned to be correct, whereas before in master the correctness was based on wishful thinking.

Compared to 1.5.x the split_exclude() is more complex. In 1.5 the complexity however was handled by trim_joins(), and if you look at trim_joins() now and in 1.5 there is clear improvement. At least now the complexity is confined into split_exclude().

The patched code seems to generate a little bit more efficient queries. Compare patched:

SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id"
  FROM "queries_tag"
 WHERE NOT (
    ("queries_tag"."parent_id" IN (
        SELECT U2."tag_id"
          FROM "queries_annotation" U2
         WHERE (U2."name" = a1  AND U2."tag_id" IS NOT NULL)
    ) AND "queries_tag"."parent_id" IS NOT NULL))
 ORDER BY "queries_tag"."name" ASC

to 1.5

SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id"
  FROM "queries_tag"
  LEFT OUTER JOIN "queries_tag" T2 ON ("queries_tag"."parent_id" = T2."id")
 WHERE NOT (
    ("queries_tag"."parent_id" IN (
         SELECT U2."tag_id"
           FROM "queries_annotation" U2
          WHERE (U2."name" = a1  AND U2."tag_id" IS NOT NULL)
     ) AND "queries_tag"."parent_id" IS NOT NULL AND T2."id" IS NOT NULL))
 ORDER BY "queries_tag"."name" ASC

(from queries.test_tickets_8921_9188(), print(Tag.objects.exclude(parent__annotation__name="a1").query))

the extra join to queries_tag T2 seems non-necessary. If there isn't anything in T2 for the join, then queries_tag.parent_id is referencing something that doesn't exists in the DB, and this is a foreign key constraint violation. Thus the trim is legal here. And of course if something exists, then the whole join is just noise.

Also, compare:

SELECT "queries_identifier"."id", "queries_identifier"."name"
  FROM "queries_identifier"
  LEFT OUTER JOIN "queries_program" ON ("queries_identifier"."id" = "queries_program"."identifier_id")
 WHERE NOT (("queries_program"."id" IN (
     SELECT U2."program_id"
       FROM "queries_channel_programs" U2
      WHERE (U2."channel_id" = 1  AND U2."program_id" IS NOT NULL)
    ) AND NOT ("queries_program"."identifier_id" IS NULL)
  AND "queries_program"."id" IS NOT NULL)) ORDER BY "queries_identifier"."name" ASC

to

SELECT "queries_identifier"."id", "queries_identifier"."name"
  FROM "queries_identifier"
 WHERE NOT (("queries_identifier"."id" IN (
     SELECT U1."identifier_id"
       FROM "queries_program" U1
      INNER JOIN "queries_channel_programs" U2 ON (U1."id" = U2."program_id")
      WHERE (U2."channel_id" = 1  AND U1."identifier_id" IS NOT NULL)
    ) AND "queries_identifier"."id" IS NOT NULL))
 ORDER BY "queries_identifier"."name" ASC

(this is from this ticket's test)

Here the subquery's join is moved to the original query, and thus the subquery is simpler. The hope is that this is more efficient to execute. In addition the original query's join is now reusable. Sadly the join is LEFT OUTER with a NOT NULL where condition. It would be nice to fix that, but not this ticket's problem...

comment:4 Changed 14 months ago by timo

The patch fixes the issue in the app I discovered this in.

comment:5 Changed 14 months ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from new to closed

In b4492a8ca4a7ae4daa3a6b03c3d7a845fad74931:

Fixed #19837 -- Refactored split_exclude() join generation

The refactoring mainly concentrates on making sure the inner and outer
query agree about the split position. The split position is where the
multijoin happens, and thus the split position also determines the
columns used in the "WHERE col1 IN (SELECT col2 from ...)" condition.

This commit fixes a regression caused by #10790 and commit
69597e5bcc89aadafd1b76abf7efab30ee0b8b1a. The regression was caused
by wrong cols in the split position.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.