Opened 4 months ago

Closed 3 months ago

Last modified 3 months ago

#35099 closed Bug (fixed)

Combining QuerySets with "|" or "&" mutates right-hand side.

Reported by: Alan Owned by: Hisham Mahmood
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Alan)

Hello everyone.

Combining some queries with "|" or "&" somehow affects queries involved in the operation, leading to malformed SQL and unexpected results.

Here are details and steps to reproduce. Apologise, for maybe a bit confusing model names, I copied them from production.

class SiteUser(models.Model):
    pass

class Notification(models.Model):
    user = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)


class PayoutRequest(models.Model):
    requester = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)

Test:

from django.test import TestCase
from django.db.models import OuterRef, Exists
from reproduce.models import Notification, SiteUser, PayoutRequest


class Reproduce(TestCase):

    def test(self):
        u01 = SiteUser.objects.create()
        u02 = SiteUser.objects.create()
        u03 = SiteUser.objects.create()

        Notification.objects.create(user=u01)
        PayoutRequest.objects.create(requester=u01)
        Notification.objects.create(user=u02)
        PayoutRequest.objects.create(requester=u03)

        are_active = SiteUser.objects.all().distinct()
        got_money = SiteUser.objects.filter(
            Exists(PayoutRequest.objects.filter(requester=OuterRef('pk')))
        ).distinct()
        whatever_query = SiteUser.objects.all().distinct()

        # Execute queries first time
        need_help = are_active.exclude(pk__in=got_money)
        notified = Notification.objects.filter(user__in=need_help).values_list('user_id', flat=True)
        query_before = str(notified.query)

        self.assertEqual(len(notified), 1)  # correct

        whatever_query | got_money  # Touch "got_money" with any other query

        # Execute same queries second time
        need_help = are_active.exclude(pk__in=got_money)
        notified = Notification.objects.filter(user__in=need_help).values_list('user_id', flat=True)
        query_after = str(notified.query)

        print(query_before)
        print(query_after)
        self.assertEqual(len(notified), 1)  # expected 1, got 0
        self.assertEqual(query_before, query_after)  # false

As you can see, merely touching the got_money query with any other query leads to modifying the results of the same queries executed after that.
This test case probably may be simplified even further, but unfortunately, I have no more time resources to dig much deeper.

I had another queries built using simple .filter() and .exclude(). Those were not affected by combining.
I found only this query got_money using Exists() and OuterRef() to be affected. There might be more of which I am not aware of.

The reason for this I don't know, but query_before and query_after differs.
query_before correctly separates subqueries using W0, U0, V0 aliases, while the query_after uses a single U0 alias for all subqueries, leading to incorrect results.

Before

SELECT 
  "reproduce_notification"."user_id" 
FROM 
  "reproduce_notification" 
WHERE 
  "reproduce_notification"."user_id" IN (
    SELECT 
      DISTINCT W0."id" 
    FROM 
      "reproduce_siteuser" W0 
    WHERE 
      NOT (
        W0."id" IN (
          SELECT 
            DISTINCT V0."id" 
          FROM 
            "reproduce_siteuser" V0 
          WHERE 
            EXISTS(
              SELECT 
                1 AS "a" 
              FROM 
                "reproduce_payoutrequest" U0 
              WHERE 
                U0."requester_id" = (V0."id") 
              LIMIT 
                1
            )
        )
      )
  ) 

After

SELECT 
  "reproduce_notification"."user_id" 
FROM 
  "reproduce_notification" 
WHERE 
  "reproduce_notification"."user_id" IN (
    SELECT 
      DISTINCT U0."id" 
    FROM 
      "reproduce_siteuser" U0 
    WHERE 
      NOT (
        U0."id" IN (
          SELECT 
            DISTINCT U0."id" 
          FROM 
            "reproduce_siteuser" U0 
          WHERE 
            EXISTS(
              SELECT 
                1 AS "a" 
              FROM 
                "reproduce_payoutrequest" U0 
              WHERE 
                U0."requester_id" = (U0."id") 
              LIMIT 
                1
            )
        )
      )
  )

Found bug in version 4.2.7, but reproduced it in 5.0.1 the same way.
Feel free to request any additional information you might need for this.

Change History (8)

comment:1 by Alan, 4 months ago

Description: modified (diff)

comment:2 by Mariusz Felisiak, 4 months ago

Summary: Combining QuerySets with "|" or "&" produce side effects affecting further queriesCombining QuerySets with "|" or "&" mutates right-hand side.
Triage Stage: UnreviewedAccepted

Thanks for the report.This is caused by bumping prefixes, we should probably clone rhs before doing this, e.g.

  • django/db/models/sql/query.py

    diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
    index a79d66eb21..5539f35b1c 100644
    a b class Query(BaseExpression):  
    685685        # except if the alias is the base table since it must be present in the
    686686        # query on both sides.
    687687        initial_alias = self.get_initial_alias()
     688        rhs = rhs.clone()
    688689        rhs.bump_prefix(self, exclude={initial_alias})
    689690
    690691        # Work out how to relabel the rhs aliases, if necessary.

Does it work for you? Would you like to prepare a patch? (a regression test is required).

in reply to:  2 ; comment:3 by Alan, 4 months ago

Replying to Mariusz Felisiak:

Does it work for you? Would you like to prepare a patch? (a regression test is required).

It does work for me, the test now passes correctly after applying cloning the rhs.

Regarding the patch, I've never contributed to Django, so it may take quite a while for me to get familiar with what's required. If you could point me to the key resources on how to do that, I may take a shot on a weekend.

I found a "Submitting a patches" in Django doc. I need more guidance on how to cover this case with regression tests. I would highly appreciate some references to take a look at.

in reply to:  3 comment:4 by Mariusz Felisiak, 4 months ago

Regarding the patch, I've never contributed to Django, so it may take quite a while for me to get familiar with what's required. If you could point me to the key resources on how to do that, I may take a shot on a weekend.

Great, thanks! I'd recommend joining our Discord server and asking on the #contributing-getting-started channel, folks are very helpful there.

comment:5 by Mariusz Felisiak, 4 months ago

Owner: changed from nobody to Alan
Status: newassigned

comment:6 by Mariusz Felisiak, 3 months ago

Has patch: set
Owner: changed from Alan to Hisham Mahmood
Triage Stage: AcceptedReady for checkin

comment:7 by Mariusz Felisiak <felisiak.mariusz@…>, 3 months ago

Resolution: fixed
Status: assignedclosed

In d79fba7d:

Fixed #35099 -- Prevented mutating queryset when combining with & and | operators.

Thanks Alan for the report.

Co-authored-by: Mariusz Felisiak <felisiak.mariusz@…>

comment:8 by Alan, 3 months ago

Thank you, Hisham Mahmood and Mariusz, for picking this up.

Sorry for leaving this hanging. I got overwhelmed on my main job, so wasn't able to find a spare hour for this.

Note: See TracTickets for help on using tickets.
Back to Top