#31097 closed Bug (fixed)
StringAgg And ArrayAgg with filtering in subquery generates invalid string_agg() SQL function call
Reported by: | Laurent Tramoy | Owned by: | David Wobrock |
---|---|---|---|
Component: | contrib.postgres | Version: | 3.0 |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This issue is similar to issue #30315 but using the filter
keyword instead of ordering
, so I'll reuse the same structure. I tested it on Django 2.2.8
Consider the following models (in the people app):
from django.db import models from django.contrib.postgres.fields import ArrayField class Person(models.Model): """Person model.""" first_name = models.TextField() last_name = models.TextField() country = models.TextField(null=True, blank=True) class Book(models.Model): """Book model.""" category = ArrayField(models.CharField(max_length=32), null=True, default=list) people = models.ManyToManyField(Person)
with the following objects:
p1 = Person.objects.create(first_name="John", last_name="Doe") p2 = Person.objects.create(first_name="Jane", last_name="Doe") b1 = Book.objects.create() b1.people.add(p1) b2 = Book.objects.create() b2.people.add(p1, p2) b3 = Book.objects.create()
This fails:
from django.contrib.postgres.aggregates import StringAgg from django.db.models import Subquery, OuterRef, Q from people.models import Book subquery = ( Book.objects.annotate( _annotated_value=StringAgg( "people__first_name", ",", filter=Q(people__first_name__startswith="Ja") ), ) .filter(pk=OuterRef("pk"),) .exclude(_annotated_value="") .values("id") ) Book.objects.filter(id__in=Subquery(subquery))
The generated SQL is
SELECT "people_book"."id", "people_book"."category" FROM "people_book" WHERE "people_book"."id" IN ( SELECT U0."id" FROM "people_book" U0 LEFT OUTER JOIN "people_book_people" U1 ON (U0."id" = U1."book_id") LEFT OUTER JOIN "people_person" U2 ON (U1."person_id" = U2."id") WHERE U0."id" = ("people_book"."id") GROUP BY U0."id" HAVING NOT (STRING_AGG(, ',') FILTER (WHERE U2."first_name") =))
as we can see, the `STRING_AGG argument is wrong.
The same query without the filter
works fine:
subquery = ( Book.objects.annotate( _annotated_value=StringAgg( "people__first_name", "," ), ) .filter(pk=OuterRef("pk"),) .exclude(_annotated_value="") .values("id") ) Book.objects.filter(id__in=Subquery(subquery))
SQL query:
SELECT "people_book"."id", "people_book"."category" FROM "people_book" WHERE "people_book"."id" IN ( SELECT U0."id" FROM "people_book" U0 LEFT OUTER JOIN "people_book_people" U1 ON (U0. "id" = U1."book_id") LEFT OUTER JOIN "people_person" U2 ON (U1."person_id" = U2."id") WHERE U0."id" = ("people_book"."id") GROUP BY U0."id" HAVING NOT (STRING_AGG(U2."first_name", ',') =))
as well as the same query without using Subquery
:
query = ( Book.objects.annotate( _annotated_value=StringAgg( "people__first_name", ",", filter=Q(people__first_name__startswith="Ja") ), ) .exclude(_annotated_value="") )
SQL query:
SELECT "people_book"."id", "people_book"."category", STRING_AGG("people_person"."first_name", ',') FILTER (WHERE "people_person"."first_name"::text LIKE Ja %) AS "_annotated_value" FROM "people_book" LEFT OUTER JOIN "people_book_people" ON ("people_book"."id" = "people_book_people"."book_id") LEFT OUTER JOIN "people_person" ON ("people_book_people"."person_id" = "people_person"."id") GROUP BY "people_book"."id" HAVING NOT (STRING_AGG("people_person"."first_name", ',') FILTER (WHERE ("people_person"."first_name"::text LIKE Ja %)) =)
Just to make sure I wasn't using an old version, I tried the query from #30315, which works fine.
NB: I originally noticed that bug using ArrayAgg instead of StringAgg, but I encountered another bug using ArrayAgg (or at least what I think is a bug) while writing the example code, I'll report it later if needed
Change History (29)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Version: | 2.2 → 3.0 |
Reproduced on current master (ff00a053478fee06bdfb4206c6d4e079e98640ff) using the code in the ticket.
Here's the prettified query reported by --debug-sql
after the testcase failure:
Line | |
---|---|
1 | SELECT |
2 | "aaaaaaticket31097_book"."id", |
3 | "aaaaaaticket31097_book"."category" |
4 | FROM |
5 | "aaaaaaticket31097_book" |
6 | WHERE |
7 | "aaaaaaticket31097_book"."id" IN ( |
8 | SELECT |
9 | U0."id" |
10 | FROM |
11 | "aaaaaaticket31097_book" U0 |
12 | LEFT OUTER JOIN "aaaaaaticket31097_book_people" U1 ON (U0."id" = U1."book_id") |
13 | LEFT OUTER JOIN "aaaaaaticket31097_person" U2 ON (U1."person_id" = U2."id") |
14 | WHERE |
15 | U0."id" = "aaaaaaticket31097_book"."id" |
16 | GROUP BY |
17 | U0."id" |
18 | HAVING |
19 | NOT ( |
20 | STRING_AGG(, ',') FILTER ( |
21 | WHERE |
22 | U2."first_name" |
23 | ) = '' |
24 | ) |
25 | ) |
comment:3 by , 5 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
Hi, I'm claiming the ticket if nobody bothers.
I already wrote a testcase that reproduced the bug and started to pin down the source of the issue.
The first (very) technical info I found:
It seems to be linked to the logic of relabeling aliases of the the filter clause in django.db.models.sql.query.Query.change_aliases
, in the self.where.relabel_aliases(change_map)
logic.
The first node itself is a WhereNode
containing (AND: (NOT (AND: <django.db.models.lookups.Exact object at 0x7f9ea9bfc9b0>)))
When going down the filter tree, we first hit again a WhereNode
with the negation: (NOT (AND: <django.db.models.lookups.Exact object at 0x7f9ea9bfc9b0>))
for our exclude clause.
Finally coming to the Exact
object, we relabel the left hand-side of the condition, being the StringAgg
.
This calls the django.db.models.expressions.BaseExpression.relabeled_clone
where we depend on the fields found by self.get_source_expressions()
Either, those source expressions are incomplete because of the way the OrderableAggMixin
plays with the self.get_source_expressions()
(not calling the super), or the change_map
which is passed down is missing some potential aliases.
I'll try to keep investigating and propose a patch in the coming weeks :)
comment:4 by , 5 years ago
Cc: | added |
---|
Did you manage to reproduce against the latest master
as well? It looks like it might have been fixed by commits related to #31094.
If it's still the case you seem to be heading in the right direction; there's likely an issue with OrderableAggMixin
and the use of filter
given how both require some special source expressions handling.
comment:5 by , 5 years ago
Yes, I do reproduce on the latest master
. I'll keep digging in that direction.
comment:7 by , 5 years ago
Patch needs improvement: | set |
---|
Left comments for improvement on the tests but the corrective seems good to me.
comment:8 by , 5 years ago
Patch needs improvement: | unset |
---|
Adjusted patch seems ready for a final review.
follow-up: 11 comment:10 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
In 2f565f84:
follow-up: 12 comment:11 by , 5 years ago
Replying to Mariusz Felisiak <felisiak.mariusz@…>:
In 2f565f84:
Can this fix be backported to 3.0.x and 2.2.x also, please?
follow-up: 13 comment:12 by , 5 years ago
Replying to John Fieldman:
Can this fix be backported to 3.0.x and 2.2.x also, please?
This patch doesn't qualify for a backport.
comment:13 by , 5 years ago
Replying to felixxm:
This patch doesn't qualify for a backport.
The issue is happening for 3.0.x and 2.2.x also, should it be solved differently for these versions, or won't be solved at all?
comment:14 by , 5 years ago
John, based on the investigation that lead to this patch the issue has been present since the feature was introduced.
Per our backporting policy this means it doesn't qualify for a backport to 3.0.x or 2.2.x anymore.
See https://docs.djangoproject.com/en/3.0/internals/release-process/ for more details.
comment:15 by , 5 years ago
It does seem that there was a regression in Django 2.2 to me.
The following query (using the models from above) works in Django 2.1.15 and master, but not 2.2.9 and 3.0.x:
from django.contrib.postgres.aggregates import StringAgg from django.db.models import CharField, OuterRef, Q, Subquery subquery = Book.objects.annotate( _annotated_value=StringAgg( 'people__first_name', ', ', filter=Q(people__first_name='Jane'), output_field=CharField(), ), ).filter( pk=OuterRef('pk'), ).values( '_annotated_value', ) queryset = Book.objects.annotate(names=Subquery(subquery)) print([book.names for book in queryset])
(I can file a separate bug for that if you want, though.)
comment:16 by , 5 years ago
Reupen, even if this was a regression in 2.2 this release series is only receiving security and data loss backport at this point.
Can you confirm this is the case Mariusz?
comment:17 by , 5 years ago
Yes it's a regression in 96199e562dcc409ab4bdc2b2146fa7cf73c7c5fe (Django 2.2) so when we added support for ordering
we also introduced a regression in filter
. Nevertheless Django 2.2 is already in extended support so this doesn't qualify for a backport.
comment:19 by , 5 years ago
Normally we backport only regression from the latest version (in the mainstream support), e.g. we'll not backport regressions introduced in Django < 3.0 to the Django 3.0, unless it's "critical".
comment:20 by , 5 years ago
Our documentation is not clear about that. As for me, a regression is a regression and as such should be backported to supported versions.
comment:21 by , 5 years ago
Do you really think that we should backport fixes for regressions created few versions ago? and treat them as release blockers. IMO backporting regressions from the latest version is a reasonable policy, i.e. if no one reported a regression in 9 months then it's not really crucial.
comment:22 by , 5 years ago
Tim beat me with a stick plenty of times when I was starting :) — Data loss bugs and security issues to all supported versions. Regressions and bugs in new features to current main version. I did think we should be more liberal but backporting itself isn't risk free, so the more I see it the more I think this is correct.
A change would be a change in policy though so we'd need to go via the discussion group and technical board I think.
comment:23 by , 5 years ago
Carlton, I'm not sure about your comment. Does "Regressions and bugs in new features to current main version." mean you also are in favor of a 3.0.x backport?
Mariusz is suggesting treating differently the case where a regression is found after a release cycle passed. I'm not in favor of this distinction.
comment:24 by , 5 years ago
Hi Claude. Good. I think "Regressions" here and in the supported versions docs isn't quite clear: it should say "Regressions introduced since the last release" — that's how it's been applied that I've seen, and I think it's also the historical intention. (Guess but, I suspect widening that might lead us to the land of several hundred release blockers.) I shall propose a small edit to the docs.
Having discussed with Mariusz, we're happy to backport this fix to 3.0. It's similar enough to #30315, which is in 3.0, and is a quite central use-case.
comment:25 by , 5 years ago
Carlton,
Guess but, I suspect widening that might lead us to the land of several hundred release blockers
I don't think so. Thanks to its extensive test suite and its wide usage, I think regressions from older versions are very rare in Django.
Good if this is backported to 3.0.x.
The fact the grouping conditions are surfaced to the outer query makes me believe it could be related to another issue that was addressed in 3.0. Could you try reproducing with Django 3.0 as well.