Opened 2 years ago
Last modified 23 months ago
#35099 closed Bug
Combining QuerySets with "|" or "&" produces side effects affecting further queries
| Reported by: | Alan | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 5.0 |
| Severity: | Normal | Keywords: | |
| Cc: | | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Hello everyone.
Combining some queries with "|" or "&" somehow affects queries involved in the operation, leading to malformed SQL and unexpected results.
Here are the details and steps to reproduce. Apologies for the somewhat confusing model names; I copied them from production.
from django.db import models

class SiteUser(models.Model):
    pass

class Notification(models.Model):
    user = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)

class PayoutRequest(models.Model):
    requester = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)
Test:
from django.test import TestCase
from django.db.models import OuterRef, Exists
from reproduce.models import Notification, SiteUser, PayoutRequest

class Reproduce(TestCase):
    def test(self):
        u01 = SiteUser.objects.create()
        u02 = SiteUser.objects.create()
        u03 = SiteUser.objects.create()
        Notification.objects.create(user=u01)
        PayoutRequest.objects.create(requester=u01)
        Notification.objects.create(user=u02)
        PayoutRequest.objects.create(requester=u03)
        are_active = SiteUser.objects.all().distinct()
        got_money = SiteUser.objects.filter(
            Exists(PayoutRequest.objects.filter(requester=OuterRef('pk')))
        ).distinct()
        whatever_query = SiteUser.objects.all().distinct()
        # Execute queries first time
        need_help = are_active.exclude(pk__in=got_money)
        notified = Notification.objects.filter(user__in=need_help).values_list('user_id', flat=True)
        query_before = str(notified.query)
        self.assertEqual(len(notified), 1)  # correct
        whatever_query | got_money  # Touch "got_money" with any other query
        # Execute same queries second time
        need_help = are_active.exclude(pk__in=got_money)
        notified = Notification.objects.filter(user__in=need_help).values_list('user_id', flat=True)
        query_after = str(notified.query)
        print(query_before)
        print(query_after)
        self.assertEqual(len(notified), 1)  # expected 1, got 0
        self.assertEqual(query_before, query_after)  # false
As you can see, merely combining got_money with any other query changes the results of the same queries when they are executed afterwards.
This test case can probably be simplified further, but unfortunately I have no more time to dig deeper.
I had other queries built with plain .filter() and .exclude(); those were not affected by combining.
The only affected query I found is got_money, which uses Exists() and OuterRef(). There may be others I am not aware of.
I don't know the reason for this, but query_before and query_after differ.
query_before correctly separates the subqueries using the W0, V0 and U0 aliases, while query_after uses a single U0 alias for all subqueries, leading to incorrect results.
Before
SELECT
  "reproduce_notification"."user_id"
FROM
  "reproduce_notification"
WHERE
  "reproduce_notification"."user_id" IN (
    SELECT
      DISTINCT W0."id"
    FROM
      "reproduce_siteuser" W0
    WHERE
      NOT (
        W0."id" IN (
          SELECT
            DISTINCT V0."id"
          FROM
            "reproduce_siteuser" V0
          WHERE
            EXISTS(
              SELECT
                1 AS "a"
              FROM
                "reproduce_payoutrequest" U0
              WHERE
                U0."requester_id" = (V0."id")
              LIMIT
                1
            )
        )
      )
  )
After
SELECT
  "reproduce_notification"."user_id"
FROM
  "reproduce_notification"
WHERE
  "reproduce_notification"."user_id" IN (
    SELECT
      DISTINCT U0."id"
    FROM
      "reproduce_siteuser" U0
    WHERE
      NOT (
        U0."id" IN (
          SELECT
            DISTINCT U0."id"
          FROM
            "reproduce_siteuser" U0
          WHERE
            EXISTS(
              SELECT
                1 AS "a"
              FROM
                "reproduce_payoutrequest" U0
              WHERE
                U0."requester_id" = (U0."id")
              LIMIT
                1
            )
        )
      )
  )
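For anyone who wants to compare the two statements without a Django project, here is a minimal sketch that runs both against an in-memory SQLite database using only the standard library. The table definitions are my approximation of what Django would create for the three models (an assumption, not copied from a migration), and TEMPLATE, before_rows and after_rows are names made up for this illustration. The rows mirror the test above, assuming the users get ids 1, 2 and 3.

```python
import sqlite3

# Approximate schema for the three models (assumed, not taken from a real migration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "reproduce_siteuser" ("id" INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE "reproduce_notification" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "user_id" INTEGER NOT NULL REFERENCES "reproduce_siteuser" ("id")
);
CREATE TABLE "reproduce_payoutrequest" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "requester_id" INTEGER NOT NULL REFERENCES "reproduce_siteuser" ("id")
);
-- Same data as the test: u01 has both rows, u02 only a notification, u03 only a payout.
INSERT INTO "reproduce_siteuser" ("id") VALUES (1), (2), (3);
INSERT INTO "reproduce_notification" ("user_id") VALUES (1), (2);
INSERT INTO "reproduce_payoutrequest" ("requester_id") VALUES (1), (3);
""")

# The statement from the ticket, with the three table aliases left as placeholders.
TEMPLATE = """
SELECT "reproduce_notification"."user_id"
FROM "reproduce_notification"
WHERE "reproduce_notification"."user_id" IN (
    SELECT DISTINCT {outer}."id"
    FROM "reproduce_siteuser" {outer}
    WHERE NOT (
        {outer}."id" IN (
            SELECT DISTINCT {middle}."id"
            FROM "reproduce_siteuser" {middle}
            WHERE EXISTS(
                SELECT 1 AS "a"
                FROM "reproduce_payoutrequest" {inner}
                WHERE {inner}."requester_id" = ({middle}."id")
                LIMIT 1
            )
        )
    )
)
"""

# "Before": distinct aliases W0 / V0 / U0, as in query_before.
before_rows = conn.execute(
    TEMPLATE.format(outer="W0", middle="V0", inner="U0")
).fetchall()

# "After": every subquery uses U0, as in query_after.
after_rows = conn.execute(
    TEMPLATE.format(outer="U0", middle="U0", inner="U0")
).fetchall()

print(before_rows)  # [(2,)]
print(after_rows)   # []
```

With distinct aliases the statement returns the single expected user. With U0 everywhere, the innermost U0."id" appears to resolve to the payout request table itself rather than to the enclosing site user subquery, so the EXISTS is no longer correlated with the user row. That would be consistent with the 0 rows seen in the failing assertion.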
I found the bug in version 4.2.7 and reproduced it the same way in 5.0.1.
Feel free to request any additional information you might need for this.