Opened 4 years ago
Closed 4 years ago
#31922 closed Bug (duplicate)
QuerySet.filter() against Q() with Subquery() and __in produces wrong results.
Reported by: | Chris Bell | Owned by: | Jacob Walls |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | orm, subquery, q, order |
Cc: | Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
This issue has been found when combining two Q statements that filter on the same models in subqueries, but with slightly different where clauses.
By combining the two Q in different orders the SQL resulting from the queries is different, one producing the correct result and one not. The SQL that is not correct has had an additional where clause added to the subquery.
Here is an example:
Two simple models (in module = utils)
class Book(BaseModel): name = models.CharField(max_length=100, null=True, blank=True) class BookStatus(BaseModel): book = models.ForeignKey(Book, null=False, blank=False, on_delete=models.CASCADE, related_name='statuses') status = models.CharField(max_length=100, null=True, blank=True) valid_from = models.DateTimeField(db_index=True) valid_to = models.DateTimeField(db_index=True, blank=True, null=True, default=None)
We will add some data:
from django.utils import timezone from utils.models import * from django.db.models import Q, Subquery b1 = Book.objects.create(name='Green is the new Blue') b2 = Book.objects.create(name='Orange is the new Red') BookStatus.objects.create(book=b1, status='DRAFT', valid_from=timezone.now()) BookStatus.objects.create(book=b1, status='PUBLISHED', valid_from=timezone.now()) BookStatus.objects.create(book=b2, status='DRAFT', valid_from=timezone.now())
Now we want to find all books that are in draft but not published.
q1 = Q(statuses__id__in=Subquery(BookStatus.objects.only('id').filter(status='DRAFT'))) q2 = ~Q(statuses__id__in=Subquery(BookStatus.objects.only('id').filter(status='PUBLISHED'))) qs = Book.objects.only('name').filter(q1 & q2) # NOTICE Q1 BEFORE Q2 str(qs.query)
The resulting query is:
SELECT "utils_book"."id", "utils_book"."name" FROM "utils_book" INNER JOIN "utils_bookstatus" ON ("utils_book"."id" = "utils_bookstatus"."book_id") WHERE ( "utils_bookstatus"."id" IN ( SELECT U0."id" FROM "utils_bookstatus" U0 WHERE U0."status" = 'DRAFT' ) AND NOT ( "utils_book"."id" IN ( SELECT V1."book_id" FROM "utils_bookstatus" V1 WHERE ( V1."id" IN ( SELECT U0."id" FROM "utils_bookstatus" U0 WHERE U0."status" = 'PUBLISHED' ) AND V1."id" = ("utils_bookstatus"."id") ) ) ) )
You can see the last where clause which joins the inner book status to the outer one.
The result being that both books are returned:
qs <QuerySet [<Book: Book object (1)>, <Book: Book object (2)>]>
Now, if we run the same query again, but reverse the order of the Q I get a different result:
qs = Book.objects.only('name').filter(q2 & q1) # NOTICE Q2 BEFORE Q1 str(qs.query) SELECT "utils_book"."id", "utils_book"."name" FROM "utils_book" INNER JOIN "utils_bookstatus" ON ("utils_book"."id" = "utils_bookstatus"."book_id") WHERE ( NOT ( "utils_book"."id" IN ( SELECT V1."book_id" FROM "utils_bookstatus" V1 WHERE V1."id" IN ( SELECT U0."id" FROM "utils_bookstatus" U0 WHERE U0."status" = 'PUBLISHED' ) ) ) AND "utils_bookstatus"."id" IN ( SELECT U0."id" FROM "utils_bookstatus" U0 WHERE U0."status" = 'DRAFT' ) )
The result of this query is correct:
qs <QuerySet [<Book: Book object (2)>]>
There are better ways of trying to get the answer this query is looking for, but this is a simplification of a more complex scenario and is for demoing the issue.
Unfortunately, I have not been able to work out why this is happening or where in the code the subquery is being amended.
Regards
Chris
Change History (9)
comment:1 by , 4 years ago
Description: | modified (diff) |
---|
comment:2 by , 4 years ago
Summary: | Changing ordering of Q statements in filter produces different SQL and results. → QuerySet.filter() against Q() with Subquery() and __in produces wrong results. |
---|---|
Triage Stage: | Unreviewed → Accepted |
Version: | 2.2 → master |
comment:3 by , 4 years ago
Hi @felixmm et al.
Thanks for confirming the issue and suggestion for using Exists.
The concern for us is how wide reaching this problem is. Ideally our tests would catch erroneous SQL but that is not guaranteed and we are concerned as to the possibility of security related issues arising from a similar scenario.
We are looking into ways of trying to identify examples of our Q usages that may be susceptible to this problem. In the meantime, if anyone in the community has any tips/thoughts on where this issue may originate, we would love to hear from you. We may be able to help investigate it with a bit of assistance, specifically from those with experience of the inner workings of the ORM.
Many thanks in advance
Chris
follow-up: 5 comment:4 by , 4 years ago
I'm experiencing this on a few very complicated querysets - reverting back to 3.0.6 fixes it, I'm thinking some of the queryset changes introduced in 3.0.7 are the cause of this.
comment:5 by , 4 years ago
Replying to Christian Klus:
I'm experiencing this on a few very complicated querysets - reverting back to 3.0.6 fixes it, I'm thinking some of the queryset changes introduced in 3.0.7 are the cause of this.
I'm able to reproduce the issue in Django < 3.0.6, so it seems that you've encountered a different problem.
comment:6 by , 4 years ago
Dropping a failing test for tests/queries/tests.py:
def test_ticket_31922(self): from django.db.models import Subquery a1 = Order.objects.create(id=1, name='a1') a2 = Order.objects.create(id=2, name='a2') OrderItem.objects.create(order=a1, status=1) OrderItem.objects.create(order=a1, status=2) OrderItem.objects.create(order=a2, status=1) q1 = Q(items__id__in=Subquery(OrderItem.objects.only('id').filter(status=1))) q2 = ~Q(items__id__in=Subquery(OrderItem.objects.only('id').filter(status=2))) qs = Order.objects.filter(q1 & q2) self.assertQuerysetEqual(qs, [repr(a2)])
This patch skips putting the join on the subquery, but it breaks the test for #14511, where the join is desired on exclude()
, so although it doesn't seem like the right patch it at least shows Chris where the changes are happening. If we need this join for exclude()
but not for filter()
I'm all ears for how we're going to tell that apart upstream.
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py index 4648daf395..fa707d6b04 100644 --- a/django/db/models/sql/query.py +++ b/django/db/models/sql/query.py @@ -1792,10 +1792,12 @@ class Query(BaseExpression): lookup_class = select_field.get_lookup('exact') # Note that the query.select[0].alias is different from alias # due to bump_prefix above. - lookup = lookup_class(pk.get_col(query.select[0].alias), - pk.get_col(alias)) - query.where.add(lookup, AND) - query.external_aliases[alias] = True + col1 = pk.get_col(query.select[0].alias) + col2 = pk.get_col(alias) + if col1.target != col2.target: + lookup = lookup_class(col1, col2) + query.where.add(lookup, AND) + query.external_aliases[alias] = True condition, needed_inner = self.build_filter( ('%s__in' % trimmed_prefix, query),
comment:7 by , 4 years ago
Has patch: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Patch involves detecting the nested Subquery
and then promoting any new INNER JOIN, such as the one I pointed out in my earlier comment.
comment:8 by , 4 years ago
Cc: | added |
---|
I spent a bit of time investigating the issue and I think that it's due a more general exclude
problem that has little to do with the usage of Subquery
e.g. the following test fails as well
q1 = Q(statuses__id__in=list(BookStatus.objects.filter(status='DRAFT').values_list('id', flat=True))) q2 = ~Q(statuses__id__in=list(BookStatus.objects.filter(status='PUBLISHED').values_list('id', flat=True))) self.assertSequenceEqual(Book.objects.only('name').filter(q1 & q2), b2)
For the same reason Jacob mentioned above due to the branch added in bec0b2a8c637ba2dc3745ad0c1e68ccc6c2f78bd to address #14511.
IMO it relates to the class of bugs described in #14645 and #25991 where composite filters passed to exclude should be pre-walked to combine subqueries into a single NOT EXISTS
that doesn't reuse any JOIN
s in any case
I believe we should close this ticket as duplicate of #14645 and focus our efforts there or reword this ticket description to stop referring to Subquery
as it's misleading.
comment:9 by , 4 years ago
Resolution: | → duplicate |
---|---|
Status: | assigned → closed |
Thanks Simon for finding these tickets.
Duplicate of #14645.
Thanks for the detailed report, I was able to reproduce this issue.
Everything works properly with
Exists()
:Reproduced at 41725602afebe2ddb018b99afe134384cc3bf69e.