Opened 22 months ago

Closed 22 months ago

Last modified 22 months ago

#34254 closed Bug (fixed)

Exists annotations can return non-boolean results (i.e. None) if used with an empty QuerySet.

Reported by: Keryn Knight Owned by: rajdesai24
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: Exists EmptyQuerySet sqlite postgres
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I suspect this is following on from, but potentially separate to #33018 -- because that ticket starts out using Django 3.2 to observe that an empty queryset (EmptyQuerySet or whatever, via none()) can short circuit evaluation to be 0 as ... in the SQL query, which is exactly the same problem I observed.

However, as far as I can tell, the result of an Exists(queryset.none()) can still return values outside of True/False, namely, None.

Using Django main as of 4593bc5da115f2e808a803a4ec24104b6c7a6152 (from Wed Jan 11 ... 2023), here's the behaviour on both postgres and sqlite. In both scenarios I'm using 3.10.2 with psycopg2==2.9.3 and sqlite3.sqlite_version is 3.37.0 and sqlite3.version is 2.6.0.
IPython outputs 8 and 11 are the problems.

class A(models.Model):
    pass

class B(models.Model):
    pass
In [1]: from app.models import A, B

In [2]: A.objects.using("pg").create()
Out[2]: <A: A object (1)>

In [3]: B.objects.using("pg").create()
Out[3]: <B: B object (1)>

In [4]: A.objects.using("sqlite").create()
Out[4]: <A: A object (1)>

In [4]: B.objects.using("sqlite").create()
Out[4]: <B: B object (1)>

In [5]: from django.db.models import Exists

In [6]: A.objects.using("sqlite").annotate(should_be_bool=Exists(B.objects.all())).first().should_be_bool
Out[6]: True

In [7]: A.objects.using("sqlite").annotate(should_be_bool=Exists(B.objects.filter(pk=99999999))).first().should_be_bool
Out[7]: False

In [8]: A.objects.using("sqlite").annotate(should_be_bool=Exists(B.objects.none())).first().should_be_bool
# This is the problem, it returned neither True nor False

In [9]: A.objects.using("pg").annotate(should_be_bool=Exists(B.objects.all())).first().should_be_bool
Out[9]: True

In [10]: A.objects.using("pg").annotate(should_be_bool=Exists(B.objects.filter(pk=99999999))).first().should_be_bool
Out[10]: False

In [11]: A.objects.using("pg").annotate(should_be_bool=Exists(B.objects.none())).first().should_be_bool
# This is the problem, it returned neither True nor False

And the queries, which are the same for postgres & sqlite:

# ...
{'sql': 'SELECT "app_a"."id", EXISTS(SELECT 1 AS "a" FROM "app_b" LIMIT 1) AS "should_be_bool" FROM "app_a" ORDER BY "app_a"."id" ASC LIMIT 1',
  'time': '0.001'},
 {'sql': 'SELECT "app_a"."id", EXISTS(SELECT 1 AS "a" FROM "app_b" U0 WHERE U0."id" = 99999999 LIMIT 1) AS "should_be_bool" FROM "app_a" ORDER BY "app_a"."id" ASC LIMIT 1',
  'time': '0.001'},
 {'sql': 'SELECT "app_a"."id", NULL AS "should_be_bool" FROM "app_a" ORDER BY "app_a"."id" ASC LIMIT 1',
  'time': '0.001'}

Given Exists has an output_field of BooleanField and that definition doesn't have null=True as an argument, it seems incongruent from both an expectations ("exists sounds boolean") and implementation ("it doesn't say it could be null") standpoint.


Whilst the problem exists in main, it has also changed behaviour (presumably or potentially unexpectedly) since 3.2, where postgres and sqlite actually do different things, hence we tested both above. So main is now consistent, but I'd personally argue it's consistently wrong (for a given value thereof, no judgement made!)

In 3.2.16, under sqlite, using annotate(x=Exists(y.none())) returns False but on main it now returns None (see above) -- the 3.2 behaviour is correct for my expectations

In [4]: A.objects.using("sqlite").annotate(should_be_bool=Exists(B.objects.none())).first().should_be_bool
Out[4]: False
In [5]: connections['sqlite'].queries
Out[5]:
 {'sql': 'SELECT "app_a"."id", 0 AS "should_be_bool" FROM "app_a" ORDER BY "app_a"."id" ASC LIMIT 1',
  'time': '0.000'}

In 3.2.16 with postgres we get neither None nor False but the integer 0 instead:

In [4]: A.objects.using("pg").annotate(should_be_bool=Exists(B.objects.none())).first().should_be_bool
Out[4]: 0
In [5]: connections['pg'].queries
Out[5]:
{'sql': 'SELECT "app_a"."id", 0 AS "should_be_bool" FROM "app_a" ORDER BY "app_a"."id" ASC LIMIT 1',
  'time': '0.001'}

So we can observe that under 3.2 using the 0 AS ... behaviour

  • sqlite appears to behave correctly (returning False)
  • postgres appears to behave incorrectly (failing to cast a raw integer to a boolean)

And under main using the NULL AS ... behaviour

  • sqlite no longer behaves the same, returning None where I'd expect False (or even True given the way SQL EXISTS works...)
  • postgres behaves differently, 0 is now None but I'd still expect True or False as the output.

Change History (15)

comment:1 by Simon Charette, 22 months ago

Triage Stage: UnreviewedAccepted

Thanks for the detailed report Keryn.

I suspect the solution is as simple as setting Exists.empty_result_set_value = False which was missed in dd1fa3a31b4680c0d3712e6ae122b878138580c7 and since Exists sublasses Subquery it inherited its .empty_result_set_value = None.

diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
index c270ef16c7..fc1d94fefb 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -1548,6 +1548,7 @@ def get_group_by_cols(self):
 class Exists(Subquery):
     template = "EXISTS(%(subquery)s)"
     output_field = fields.BooleanField()
+    empty_result_set_value = False

     def __init__(self, queryset, **kwargs):
         super().__init__(queryset, **kwargs)

If you can confirm this addresses the problem you reported could you submit a PR with a regression test?

comment:2 by rajdesai24, 22 months ago

Owner: changed from nobody to rajdesai24
Status: newassigned

comment:3 by rajdesai24, 22 months ago

Hey Simon would love to submit a PR

in reply to:  1 comment:4 by Keryn Knight, 22 months ago

Replying to Simon Charette:

I suspect the solution is as simple as setting Exists.empty_result_set_value = False [...]
[...]
If you can confirm this addresses the problem you reported [...]

Confirming, it does seem to be as simple as that (a pleasant rarity in ORM edge cases, I'm sure!) for the 2 backends I was testing (postgres and sqlite -- I've notably not checked mysql or oracle etc):

In [3]: A.objects.using("sqlite").annotate(should_be_bool=Exists(B.objects.none())).first().should_be_bool
Out[3]: False

In [4]: A.objects.using("pg").annotate(should_be_bool=Exists(B.objects.none())).first().should_be_bool
Out[4]: False

The queries are still recorded as different, but I expect that's normal coercion differences between sqlite & postgres:

In [6]: connections['sqlite'].queries
Out[6]:
[{'sql': 'SELECT "app_a"."id", 0 AS "should_be_bool" FROM "app_a" ORDER BY "app_a"."id" ASC LIMIT 1',
  'time': '0.001'}]

In [7]: connections['pg'].queries
Out[7]:
[{'sql': 'SELECT "app_a"."id", false AS "should_be_bool" FROM "app_a" ORDER BY "app_a"."id" ASC LIMIT 1',
  'time': '0.001'}]

WRT to fixing it, as rajdesai24 has expressed an interest in doing so, I'll let them take the lead and get the contribution under their belt.

Replying to rajdesai24:

Hey Simon would love to submit a PR

comment:5 by Simon Charette, 22 months ago

Thanks for confirming Keryn, the difference in query generation is effectively expected on depending on whether the backend has a native boolean type.


rajdesai24, sure thing please go ahead and submit a PR. Make sure to include a regression test demonstrating the problem that fails without the patch applied and passes with it applied.

I suggest adding one in tests/annotations/tests.py similarly to the one added in dd1fa3a31b4680c0d3712e6ae122b878138580c7 but using Exists instead of Subquery.

Last edited 22 months ago by Simon Charette (previous) (diff)

comment:6 by rajdesai24, 22 months ago

comment:7 by rajdesai24, 22 months ago

Hey Simon the changes are done. The only thing is my test is pretty simple, which is testing the exact query that Keryn reported. Do tell me if any issues with the PR or any changes to the tests.

comment:8 by Simon Charette, 22 months ago

Has patch: set
Patch needs improvement: set

Thanks for the patch rajdesai24. I left some comments on the PR.

comment:9 by rajdesai24, 22 months ago

hey, Simon please review the changes and If everything seems right, merge them if possible.

comment:10 by Simon Charette, 22 months ago

Left some more comments on the PR, the test is not testing the right thing and passes even without the changes applied.

comment:11 by Mariusz Felisiak, 22 months ago

Needs tests: set

comment:12 by Simon Charette, 22 months ago

Needs tests: unset
Patch needs improvement: unset

Proposed MR now has an appropriate regression test.

comment:13 by Mariusz Felisiak, 22 months ago

Triage Stage: AcceptedReady for checkin

comment:14 by Mariusz Felisiak <felisiak.mariusz@…>, 22 months ago

Resolution: fixed
Status: assignedclosed

In 246eb483:

Fixed #34254 -- Fixed return value of Exists() with empty queryset.

Thanks Simon Charette for reviews.

comment:15 by Mariusz Felisiak <felisiak.mariusz@…>, 22 months ago

In f210ad1b:

[4.2.x] Fixed #34254 -- Fixed return value of Exists() with empty queryset.

Thanks Simon Charette for reviews.

Backport of 246eb4836a6fb967880f838aa0d22ecfdca8b6f1 from main

Note: See TracTickets for help on using tickets.
Back to Top