#35099 closed Bug (fixed)
Combining QuerySets with "|" or "&" mutates right-hand side.
| Reported by: | Alan | Owned by: | Hisham Mahmood | 
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 5.0 | 
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Ready for checkin | |
| Has patch: | yes | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | no | 
| Easy pickings: | no | UI/UX: | no | 
Description (last modified by )
Hello everyone.
Combining some queries with  "|" or "&" somehow affects queries involved in the operation, leading to malformed SQL and unexpected results.
Here are details and steps to reproduce. Apologise, for maybe a bit confusing model names, I copied them from production.
class SiteUser(models.Model):
    pass
class Notification(models.Model):
    user = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)
class PayoutRequest(models.Model):
    requester = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)
Test:
from django.test import TestCase
from django.db.models import OuterRef, Exists
from reproduce.models import Notification, SiteUser, PayoutRequest
class Reproduce(TestCase):
    def test(self):
        u01 = SiteUser.objects.create()
        u02 = SiteUser.objects.create()
        u03 = SiteUser.objects.create()
        Notification.objects.create(user=u01)
        PayoutRequest.objects.create(requester=u01)
        Notification.objects.create(user=u02)
        PayoutRequest.objects.create(requester=u03)
        are_active = SiteUser.objects.all().distinct()
        got_money = SiteUser.objects.filter(
            Exists(PayoutRequest.objects.filter(requester=OuterRef('pk')))
        ).distinct()
        whatever_query = SiteUser.objects.all().distinct()
        # Execute queries first time
        need_help = are_active.exclude(pk__in=got_money)
        notified = Notification.objects.filter(user__in=need_help).values_list('user_id', flat=True)
        query_before = str(notified.query)
        self.assertEqual(len(notified), 1)  # correct
        whatever_query | got_money  # Touch "got_money" with any other query
        # Execute same queries second time
        need_help = are_active.exclude(pk__in=got_money)
        notified = Notification.objects.filter(user__in=need_help).values_list('user_id', flat=True)
        query_after = str(notified.query)
        print(query_before)
        print(query_after)
        self.assertEqual(len(notified), 1)  # expected 1, got 0
        self.assertEqual(query_before, query_after)  # false
As you can see, merely touching the got_money query with any other query leads to modifying the results of the same queries executed after that.
This test case probably may be simplified even further, but unfortunately, I have no more time resources to dig much deeper.
I had another queries built using simple .filter() and .exclude(). Those were not affected by combining.
I found only this query got_money using Exists() and OuterRef() to be affected. There might be more of which I am not aware of.
The reason for this I don't know, but query_before and query_after differs.
query_before correctly separates subqueries using W0, U0, V0 aliases, while the query_after uses a single U0 alias for all subqueries, leading to incorrect results.
Before
SELECT 
  "reproduce_notification"."user_id" 
FROM 
  "reproduce_notification" 
WHERE 
  "reproduce_notification"."user_id" IN (
    SELECT 
      DISTINCT W0."id" 
    FROM 
      "reproduce_siteuser" W0 
    WHERE 
      NOT (
        W0."id" IN (
          SELECT 
            DISTINCT V0."id" 
          FROM 
            "reproduce_siteuser" V0 
          WHERE 
            EXISTS(
              SELECT 
                1 AS "a" 
              FROM 
                "reproduce_payoutrequest" U0 
              WHERE 
                U0."requester_id" = (V0."id") 
              LIMIT 
                1
            )
        )
      )
  ) 
After
SELECT 
  "reproduce_notification"."user_id" 
FROM 
  "reproduce_notification" 
WHERE 
  "reproduce_notification"."user_id" IN (
    SELECT 
      DISTINCT U0."id" 
    FROM 
      "reproduce_siteuser" U0 
    WHERE 
      NOT (
        U0."id" IN (
          SELECT 
            DISTINCT U0."id" 
          FROM 
            "reproduce_siteuser" U0 
          WHERE 
            EXISTS(
              SELECT 
                1 AS "a" 
              FROM 
                "reproduce_payoutrequest" U0 
              WHERE 
                U0."requester_id" = (U0."id") 
              LIMIT 
                1
            )
        )
      )
  )
Found bug in version 4.2.7, but reproduced it in 5.0.1 the same way.
Feel free to request any additional information you might need for this.
Change History (8)
comment:1 by , 22 months ago
| Description: | modified (diff) | 
|---|
follow-up: 3 comment:2 by , 22 months ago
| Summary: | Combining QuerySets with "|" or "&" produce side effects affecting further queries → Combining QuerySets with "|" or "&" mutates right-hand side. | 
|---|---|
| Triage Stage: | Unreviewed → Accepted | 
follow-up: 4 comment:3 by , 22 months ago
Replying to Mariusz Felisiak:
Does it work for you? Would you like to prepare a patch? (a regression test is required).
It does work for me, the test now passes correctly after applying cloning the rhs.
Regarding the patch, I've never contributed to Django, so it may take quite a while for me to get familiar with what's required. If you could point me to the key resources on how to do that, I may take a shot on a weekend.
I found a "Submitting a patches" in Django doc. I need more guidance on how to cover this case with regression tests. I would highly appreciate some references to take a look at.
comment:4 by , 22 months ago
Regarding the patch, I've never contributed to Django, so it may take quite a while for me to get familiar with what's required. If you could point me to the key resources on how to do that, I may take a shot on a weekend.
Great, thanks! I'd recommend joining our Discord server and asking on the #contributing-getting-started channel, folks are very helpful there.
comment:5 by , 22 months ago
| Owner: | changed from to | 
|---|---|
| Status: | new → assigned | 
comment:6 by , 21 months ago
| Has patch: | set | 
|---|---|
| Owner: | changed from to | 
| Triage Stage: | Accepted → Ready for checkin | 
comment:8 by , 21 months ago
Thank you, Hisham Mahmood and Mariusz, for picking this up.
Sorry for leaving this hanging. I got overwhelmed on my main job, so wasn't able to find a spare hour for this.
Thanks for the report.This is caused by bumping prefixes, we should probably clone
rhsbefore doing this, e.g.django/db/models/sql/query.py
Does it work for you? Would you like to prepare a patch? (a regression test is required).