#33018 closed Bug (fixed)
Incorrect annotation value when doing a subquery with empty queryset
Reported by: | decomorreno | Owned by: | David Wobrock |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.2 |
Severity: | Normal | Keywords: | orm, annotate, EmptyQuerySet, empty, count |
Cc: | David Wobrock | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
ORM seems to generate annotation/subqueries incorrectly if empty queryset is used.
Models:
class Article(models.Model): author_name = models.CharField(max_length=100) content = models.TextField() is_public = models.BooleanField() class Comment(models.Model): article = models.ForeignKey(Article, related_name="comments", on_delete=models.CASCADE) author_name = models.CharField(max_length=100) content = models.TextField()
test data:
article = Article.objects.create(author_name="Jack", content="Example content", is_public=True) comment = Comment.objects.create(article=article, author_name="John", content="Example comment")
queries:
qs = Article.objects.all() # keep one list_x uncommented to see the difference: list_x = ["random_thing_that_is_not_equal_to_any_authors_name"] # list not empty, bug doesnt occur #list_x = [] # if this list is empty, then the bug occurs comment_qs = Comment.objects.filter(author_name__in=list_x) qs = qs.annotate( A=Coalesce(Subquery( comment_qs.annotate(x=Count('content')).values('x')[:1], output_field=IntegerField(), ), 101) # if list_x == [], Coalesce wont work and A will be 0 instead of 101 ) # please note that above annotation doesnt make much logical sense, its just for testing purposes qs = qs.annotate( B=Value(99, output_field=IntegerField()) ) qs = qs.annotate( C=F("A") + F("B") # if list_x == [], C will result in 0 sic! instead of 101 + 99 = 200 ) data = { "A": qs.last().A, "B": qs.last().B, "C": qs.last().C, } print(data) print(format_sql(qs.query))
console output for list_x=["random_thing_that_is_not_equal_to_any_authors_name"]
(expected, correct):
{'A': 101, 'B': 99, 'C': 200} SELECT "articles_article"."id", "articles_article"."author_name", "articles_article"."content", "articles_article"."is_public", COALESCE( (SELECT COUNT(U0."content") AS "x" FROM "articles_comment" U0 WHERE U0."author_name" IN (random_thing_that_is_not_equal_to_any_authors_name) GROUP BY U0."id", U0."article_id", U0."author_name", U0."content" LIMIT 1), 101) AS "A", 99 AS "B", (COALESCE( (SELECT COUNT(U0."content") AS "x" FROM "articles_comment" U0 WHERE U0."author_name" IN (random_thing_that_is_not_equal_to_any_authors_name) GROUP BY U0."id", U0."article_id", U0."author_name", U0."content" LIMIT 1), 101) + 99) AS "C" FROM "articles_article"
console output for list_x=[]
(incorrect):
{'A': 0, 'B': 99, 'C': 0} SELECT "articles_article"."id", "articles_article"."author_name", "articles_article"."content", "articles_article"."is_public", 0 AS "A", 99 AS "B", 0 AS "C" FROM "articles_article"
Background story: Above queries are made up (simplified), but based on some parts of logic that I had in my code. list_x was generated dynamically, and it was very hard to detect what is causing unexpected results. This behavior is very strange, I believe its a bug and needs to be fixed, because it is totally unintuitive that:
SomeModel.objects.filter(x__in=["something_that_causes_this_qs_lenth_to_be_0"])
and
SomeModel.objects.filter(x__in=[])
may yield different results when used in queries later, even though results of this querysets are logically equivalent
I will attach a minimal repro project (with code from above)
Attachments (1)
Change History (11)
by , 3 years ago
Attachment: | bug_repro.zip added |
---|
comment:1 by , 3 years ago
Summary: | Incorrect annotation value when doing a subquery on empty queryset → Incorrect annotation value when doing a subquery with empty queryset |
---|
comment:2 by , 3 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 3 years ago
Cc: | added |
---|---|
Has patch: | set |
Owner: | changed from | to
Status: | new → assigned |
Hi!
Thanks for the hints Simon, I tried a first patch where we catch empty values here https://github.com/django/django/pull/14770
Should we expand the solution a bit further and rename empty_aggregate_value
as you suggested, and use it in the SQLCompiler
too?
comment:5 by , 3 years ago
Hi, the patch is waiting on some review I believe. So once a maintainer has a bit of time available, we'll be able to move forward :)
comment:6 by , 3 years ago
Needs tests: | set |
---|---|
Patch needs improvement: | set |
comment:7 by , 3 years ago
Needs tests: | unset |
---|---|
Patch needs improvement: | unset |
comment:8 by , 3 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
The
0
assignment on empty result set comes from this line. I assume we could adjust the logic to rely ongetattr(col, 'empty_aggregate_value', NotImplemented)
and fallback to'0'
if it's missing.Makes me wonder if we want to rename
empty_aggregate_value
toempty_result_set_value
instead since it's not entirely bound to aggregation anymore.e.g. the following should exhibit the same behavior
It also seems weird that we default to
0
as opposed toNULL
which would be a more correct value.Alternatively we could adjust
Coalesce.as_sql
to catchEmptyResultSet
when it's compiling its source expressions but that's more involved as most of the logic for that currently lives inFunc.as_sql
. We could also use both of these approaches.