#35339 closed Bug (fixed)
Ordering and filtering a Postgres ArrayAgg with parameters inverts SQL param order
Reported by: | Chris M | Owned by: | Chris M |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | postgres arrayagg ordering |
Cc: | Mariusz Felisiak, Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When trying to build an ArrayAgg annotation that has a filter
with parameters and a ordering
with parameters, the SQL that is built will invert the parameters, putting the filter parameters before the ordering parameters. I was not able to find an existing ticket for this issue.
I have verified this against the current dev branch as well as version 3.2.
I was able to reproduce this in a test in the test_aggregates.py file.
def test_array_agg_filter_and_ordering_params(self): values = AggregateTestModel.objects.aggregate( arrayagg=ArrayAgg( "char_field", filter=Q(json_field__has_key="lang"), ordering=LPad(Cast("integer_field", CharField()), 2, Value("0")), ) ) self.assertEqual(values, {"arrayagg": ["Foo2", "Foo4"]})
The resulting error is something like
Traceback (most recent call last): File "/Users/camuthig/projects/oss/django/django/db/backends/utils.py", line 105, in _execute return self.cursor.execute(sql, params) psycopg2.errors.UndefinedFunction: function lpad(character varying, integer, integer) does not exist LINE 1: ...s_tests_aggregatetestmodel"."char_field" ORDER BY LPAD(("pos... ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts.
The issue is that the result of the OrderableAggMixin.as_sql
function is
SQL:
ARRAY_AGG("postgres_tests_aggregatetestmodel"."char_field" ORDER BY LPAD(("postgres_tests_aggregatetestmodel"."integer_field")::varchar, %s, %s)) FILTER (WHERE "postgres_tests_aggregatetestmodel"."json_field" ? %s)
Parameters
[ "lang", 2, "0"]
So we are trying to use "lang" and 2 as the values for the ordering function, and "0" as the parameter for the filtering function. This is made a bit more confusing if the expression you are aggregating also has a parameter, because that should be before the ordering parameters. It should be
- Expression parameters
- Ordering parameters
- Filtering parameters
This happens because both the expression and filtering parameters come from the standard Aggregate parent class, and are then put in front of the ordering parameters in the Postgres-specific orderable mixin.
I have been able to resolve this issue locally by altering the OrderableAggMixin.as_sql
function to retrieve the parameters from the parent and then split them manually.
class OrderableAggMixin: # ... other functions def as_sql(self, compiler, connection): if self.order_by is not None: order_by_sql, order_by_params = compiler.compile(self.order_by) else: order_by_sql, order_by_params = "", () sql, expression_params = super().as_sql(compiler, connection, ordering=order_by_sql) filter_params = () if self.filter: try: _, filter_params = self.filter.as_sql(compiler, connection) expression_params = expression_params[:-len(filter_params)] except FullResultSet: pass return sql, (*expression_params, *order_by_params, *filter_params)
This solution technically works, but it feels a bit clunky, so I am open to suggestions on how to improve it. I can also create a pull request with the change if you would like to see it, though my changes are all captured here already.
Change History (15)
comment:1 by , 8 months ago
Cc: | added |
---|---|
Keywords: | postgres arrayagg ordering added |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
Version: | 5.0 → dev |
comment:2 by , 8 months ago
Thank you for the detailed report Chris and for the regression test Natalia this is very useful!
Since this a long standing issue and we thus don't have to deal with a backport I would suggest we bite the bullet and refactor things up to have OrderableAggMixin
merged in Aggregate
and have a flag similar to allow_distinct
.
The crux of the issue here is that we try be clever and have the get_source_expressions
/ set_source_expressions
signature change depending on whether or not a filter
and and order_by
are assigned (the len
of expressions will change) so we can satisfy list[Expression]
but with a variadic length. We tried doing that for a while with Window
but it became clear that using None
as placeholders for unspecified expressions was much easier to reason about. Embracing this approach made sure pretty much everything not expects get_expressions
to be list[Expression | None]
so it's a safe path to take now and doing so would avoids the conditional shenanigans that are causing out of order due to expressions.pop
and mixin in types.
Another big plus to having all the logic lives in Aggregate
is that it makes implementing things like STRING_AGG
(AKA GROUP_CONCAT
) and other expressions that gain support for ordering overnight much easier and tested in a generic way.
If that's something you're interesting in working on Chris I'd be happy to provide you reviewing support.
comment:3 by , 8 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I think I see what you are going for, Simon. I'll give that a go and send up a pull request once I have something to share.
comment:4 by , 8 months ago
Sounds good! I suggest you start by making both Aggregate
and OrderableAggMixin
unconditionally return None
in their source_expressions
getters and setters when they respectively don't have any filter
and ordering
.
This should address the immediate issue reported in this ticket and from there, if you're feeling confident, merging OrderableAggMixin
and dealing with its deprecation could be tackled in a follow up ticket.
comment:5 by , 8 months ago
Has patch: | set |
---|
comment:6 by , 8 months ago
Patch needs improvement: | set |
---|
Left some comments on the PR for some subtle adjustments to get the tests passing but it's looking promising to me.
comment:7 by , 8 months ago
Patch needs improvement: | unset |
---|
Patch is looking good to me albeit some commit message massaging that mergers should be able to handle.
comment:8 by , 7 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:12 by , 7 months ago
Simon, I am interested in working on the second phase of merging in OrderableAggMixin
. I started playing around with this code, and I think it makes sense. My initial question is around how moving this up looks with regards to things like ArrayAgg
. That function directly exists in other databases, like Sqlite, so would we want to put it into the core aggregate module instead of the postgres module? To build on that, MySQL implements basically the same behavior as group_concat
, which Sqlite also supports, so is there some nuance to how that should be handled as well?
Also, should we create a new ticket for this second phase of the work? What would that look like?
comment:13 by , 7 months ago
Hello Chris, glad to hear you're interested in moving this forward! We should definitely create a new ticket for that.
I'd suggest targeting StringAgg(Aggregate)
and default to using STRING_AGG
(Postgres, SQLite), GROUP_CONCAT
as fallback on MySQL, and LIST_AGG WITHIN GROUP (ORDER BY)
on Oracle.
As for whether StringAgg
should live in models.aggregate
(or only in tests at first) I think it could even if that means documenting it. Another potential candidate that could be added is JSONArrayAgg
but I think we should focus on a single one for this first ticket.
A few things I believe we should focus on along the way
- We should make the argument
Aggregate(order_by)
instead ofordering
to be consistent withWindow
- Just like we do with
allow_distinct
we should use aallow_order_by
attribute for aggregates that support this feature - We should make
OrderableAggMixin
a shim that setsallow_order_by=True
, redirect__init__(ordering)
toorder_by
, and emit a deprecation warning when doing so to allow a proper transition forcontrib.postgres.ArrayAgg
and friends - We should reuse
OrderByList
as much as possible
If that can be of any help I hacked a bit on it a few days ago to make sure I didn't send through a rabbit hole by requesting these.
comment:14 by , 6 months ago
Thanks, Simon. You are right, I mentioned ArrayAgg
but I should have been referencing the StringAgg
function. I appreciate the input and gut-check commit. The commit seems to answer a many of the early questions I had cropping up internally as I started playing around with the concept around the best way to structure the code and handle some of the feature checks.
I take a crack at creating a new ticket based on your input here and start working on it.
Hello Chris, thank you for your detailed report.
I can confirm that the provided test fails in current
main
as shown below. Adding Simon and Mariusz as cc to see if they can provide advice on thetests/postgres_tests/test_aggregates.py
SubstrFull trace with
--debug-sql
: