#35099 closed Bug (fixed)
Combining QuerySets with "|" or "&" mutates right-hand side.
Reported by: | Alan | Owned by: | Hisham Mahmood |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.0 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Hello everyone.
Combining some queries with "|" or "&" somehow affects queries involved in the operation, leading to malformed SQL and unexpected results.
Here are details and steps to reproduce. Apologise, for maybe a bit confusing model names, I copied them from production.
class SiteUser(models.Model): pass class Notification(models.Model): user = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE) class PayoutRequest(models.Model): requester = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)
Test:
from django.test import TestCase from django.db.models import OuterRef, Exists from reproduce.models import Notification, SiteUser, PayoutRequest class Reproduce(TestCase): def test(self): u01 = SiteUser.objects.create() u02 = SiteUser.objects.create() u03 = SiteUser.objects.create() Notification.objects.create(user=u01) PayoutRequest.objects.create(requester=u01) Notification.objects.create(user=u02) PayoutRequest.objects.create(requester=u03) are_active = SiteUser.objects.all().distinct() got_money = SiteUser.objects.filter( Exists(PayoutRequest.objects.filter(requester=OuterRef('pk'))) ).distinct() whatever_query = SiteUser.objects.all().distinct() # Execute queries first time need_help = are_active.exclude(pk__in=got_money) notified = Notification.objects.filter(user__in=need_help).values_list('user_id', flat=True) query_before = str(notified.query) self.assertEqual(len(notified), 1) # correct whatever_query | got_money # Touch "got_money" with any other query # Execute same queries second time need_help = are_active.exclude(pk__in=got_money) notified = Notification.objects.filter(user__in=need_help).values_list('user_id', flat=True) query_after = str(notified.query) print(query_before) print(query_after) self.assertEqual(len(notified), 1) # expected 1, got 0 self.assertEqual(query_before, query_after) # false
As you can see, merely touching the got_money
query with any other query leads to modifying the results of the same queries executed after that.
This test case probably may be simplified even further, but unfortunately, I have no more time resources to dig much deeper.
I had another queries built using simple .filter() and .exclude(). Those were not affected by combining.
I found only this query got_money
using Exists()
and OuterRef()
to be affected. There might be more of which I am not aware of.
The reason for this I don't know, but query_before
and query_after
differs.
query_before
correctly separates subqueries using W0, U0, V0 aliases, while the query_after
uses a single U0 alias for all subqueries, leading to incorrect results.
Before
SELECT "reproduce_notification"."user_id" FROM "reproduce_notification" WHERE "reproduce_notification"."user_id" IN ( SELECT DISTINCT W0."id" FROM "reproduce_siteuser" W0 WHERE NOT ( W0."id" IN ( SELECT DISTINCT V0."id" FROM "reproduce_siteuser" V0 WHERE EXISTS( SELECT 1 AS "a" FROM "reproduce_payoutrequest" U0 WHERE U0."requester_id" = (V0."id") LIMIT 1 ) ) ) )
After
SELECT "reproduce_notification"."user_id" FROM "reproduce_notification" WHERE "reproduce_notification"."user_id" IN ( SELECT DISTINCT U0."id" FROM "reproduce_siteuser" U0 WHERE NOT ( U0."id" IN ( SELECT DISTINCT U0."id" FROM "reproduce_siteuser" U0 WHERE EXISTS( SELECT 1 AS "a" FROM "reproduce_payoutrequest" U0 WHERE U0."requester_id" = (U0."id") LIMIT 1 ) ) ) )
Found bug in version 4.2.7, but reproduced it in 5.0.1 the same way.
Feel free to request any additional information you might need for this.
Change History (8)
comment:1 by , 10 months ago
Description: | modified (diff) |
---|
follow-up: 3 comment:2 by , 10 months ago
Summary: | Combining QuerySets with "|" or "&" produce side effects affecting further queries → Combining QuerySets with "|" or "&" mutates right-hand side. |
---|---|
Triage Stage: | Unreviewed → Accepted |
follow-up: 4 comment:3 by , 10 months ago
Replying to Mariusz Felisiak:
Does it work for you? Would you like to prepare a patch? (a regression test is required).
It does work for me, the test now passes correctly after applying cloning the rhs.
Regarding the patch, I've never contributed to Django, so it may take quite a while for me to get familiar with what's required. If you could point me to the key resources on how to do that, I may take a shot on a weekend.
I found a "Submitting a patches" in Django doc. I need more guidance on how to cover this case with regression tests. I would highly appreciate some references to take a look at.
comment:4 by , 10 months ago
Regarding the patch, I've never contributed to Django, so it may take quite a while for me to get familiar with what's required. If you could point me to the key resources on how to do that, I may take a shot on a weekend.
Great, thanks! I'd recommend joining our Discord server and asking on the #contributing-getting-started
channel, folks are very helpful there.
comment:5 by , 10 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:6 by , 9 months ago
Has patch: | set |
---|---|
Owner: | changed from | to
Triage Stage: | Accepted → Ready for checkin |
comment:8 by , 9 months ago
Thank you, Hisham Mahmood and Mariusz, for picking this up.
Sorry for leaving this hanging. I got overwhelmed on my main job, so wasn't able to find a spare hour for this.
Thanks for the report.This is caused by bumping prefixes, we should probably clone
rhs
before doing this, e.g.django/db/models/sql/query.py
Does it work for you? Would you like to prepare a patch? (a regression test is required).