Opened 7 months ago

Closed 6 months ago

Last modified 4 months ago

#35339 closed Bug (fixed)

Ordering and filtering a Postgres ArrayAgg with parameters inverts SQL param order

Reported by: Chris M Owned by: Chris M
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: postgres arrayagg ordering
Cc: Mariusz Felisiak, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When trying to build an ArrayAgg annotation that has a filter with parameters and a ordering with parameters, the SQL that is built will invert the parameters, putting the filter parameters before the ordering parameters. I was not able to find an existing ticket for this issue.

I have verified this against the current dev branch as well as version 3.2.

I was able to reproduce this in a test in the test_aggregates.py file.

    def test_array_agg_filter_and_ordering_params(self):
        values = AggregateTestModel.objects.aggregate(
            arrayagg=ArrayAgg(
                "char_field",
                filter=Q(json_field__has_key="lang"),
                ordering=LPad(Cast("integer_field", CharField()), 2, Value("0")),
            )
        )
        self.assertEqual(values, {"arrayagg": ["Foo2", "Foo4"]})

The resulting error is something like

Traceback (most recent call last):
  File "/Users/camuthig/projects/oss/django/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedFunction: function lpad(character varying, integer, integer) does not exist
LINE 1: ...s_tests_aggregatetestmodel"."char_field" ORDER BY LPAD(("pos...
                                                             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

The issue is that the result of the OrderableAggMixin.as_sql function is

SQL:

ARRAY_AGG("postgres_tests_aggregatetestmodel"."char_field" ORDER BY LPAD(("postgres_tests_aggregatetestmodel"."integer_field")::varchar, %s, %s)) FILTER (WHERE "postgres_tests_aggregatetestmodel"."json_field" ? %s)

Parameters

[ "lang", 2, "0"]

So we are trying to use "lang" and 2 as the values for the ordering function, and "0" as the parameter for the filtering function. This is made a bit more confusing if the expression you are aggregating also has a parameter, because that should be before the ordering parameters. It should be

  1. Expression parameters
  2. Ordering parameters
  3. Filtering parameters

This happens because both the expression and filtering parameters come from the standard Aggregate parent class, and are then put in front of the ordering parameters in the Postgres-specific orderable mixin.

I have been able to resolve this issue locally by altering the OrderableAggMixin.as_sql function to retrieve the parameters from the parent and then split them manually.

class OrderableAggMixin:
    # ... other functions

    def as_sql(self, compiler, connection):
        if self.order_by is not None:
            order_by_sql, order_by_params = compiler.compile(self.order_by)
        else:
            order_by_sql, order_by_params = "", ()

        sql, expression_params = super().as_sql(compiler, connection, ordering=order_by_sql)

        filter_params = ()
        if self.filter:
            try:
                _, filter_params = self.filter.as_sql(compiler, connection)
                expression_params = expression_params[:-len(filter_params)]
            except FullResultSet:
                pass

        return sql, (*expression_params, *order_by_params, *filter_params)

This solution technically works, but it feels a bit clunky, so I am open to suggestions on how to improve it. I can also create a pull request with the change if you would like to see it, though my changes are all captured here already.

Change History (15)

comment:1 by Natalia Bidart, 7 months ago

Cc: Mariusz Felisiak Simon Charette added
Keywords: postgres arrayagg ordering added
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug
Version: 5.0dev

Hello Chris, thank you for your detailed report.

I can confirm that the provided test fails in current main as shown below. Adding Simon and Mariusz as cc to see if they can provide advice on the

  • tests/postgres_tests/test_aggregates.py

    diff --git a/tests/postgres_tests/test_aggregates.py b/tests/postgres_tests/test_aggregates.py
    index 386c55da25..5ab27752d1 100644
    a b from django.db.models import (  
    1212    Window,
    1313)
    1414from django.db.models.fields.json import KeyTextTransform, KeyTransform
    15 from django.db.models.functions import Cast, Concat, Substr
     15from django.db.models.functions import Cast, Concat, LPad, Substr
    1616from django.test import skipUnlessDBFeature
    1717from django.test.utils import Approximate
    1818from django.utils import timezone
    class TestGeneralAggregate(PostgreSQLTestCase):  
    188188                )
    189189                self.assertEqual(values, {"arrayagg": expected_output})
    190190
     191    def test_array_agg_filter_and_ordering_params(self):
     192        values = AggregateTestModel.objects.aggregate(
     193            arrayagg=ArrayAgg(
     194                "char_field",
     195                filter=Q(json_field__has_key="lang"),
     196                ordering=LPad(Cast("integer_field", CharField()), 2, Value("0")),
     197            )
     198        )
     199        self.assertEqual(values, {"arrayagg": ["Foo2", "Foo4"]})
     200
    191201    def test_array_agg_integerfield(self):
    192202        values = AggregateTestModel.objects.aggregate(
    193203            arrayagg=ArrayAgg("integer_field")

Full trace with --debug-sql:

======================================================================
ERROR: test_array_agg_filter_and_ordering_params (postgres_tests.test_aggregates.TestGeneralAggregate.test_array_agg_filter_and_ordering_params)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/nessita/fellowship/django/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nessita/.virtualenvs/djangodev/lib/python3.11/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)
psycopg.errors.UndefinedFunction: function lpad(character varying, unknown, integer) does not exist
LINE 1: ...s_tests_aggregatetestmodel"."char_field" ORDER BY LPAD(("pos...
                                                             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
  File "/home/nessita/.virtualenvs/djangodev/lib/python3.11/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)
django.db.utils.ProgrammingError: function lpad(character varying, unknown, integer) does not exist
LINE 1: ...s_tests_aggregatetestmodel"."char_field" ORDER BY LPAD(("pos...
                                                             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

----------------------------------------------------------------------
(0.000)
SELECT ARRAY_AGG("postgres_tests_aggregatetestmodel"."char_field"
                 ORDER BY LPAD(("postgres_tests_aggregatetestmodel"."integer_field")::varchar, 'lang', 2)) FILTER (
                                                                                                                   WHERE "postgres_tests_aggregatetestmodel"."json_field" ? '0') AS "arrayagg"
FROM "postgres_tests_aggregatetestmodel";

args=('lang',
      Int4(2),
      '0');

ALIAS=DEFAULT

comment:2 by Simon Charette, 7 months ago

Thank you for the detailed report Chris and for the regression test Natalia this is very useful!

Since this a long standing issue and we thus don't have to deal with a backport I would suggest we bite the bullet and refactor things up to have OrderableAggMixin merged in Aggregate and have a flag similar to allow_distinct.

The crux of the issue here is that we try be clever and have the get_source_expressions / set_source_expressions signature change depending on whether or not a filter and and order_by are assigned (the len of expressions will change) so we can satisfy list[Expression] but with a variadic length. We tried doing that for a while with Window but it became clear that using None as placeholders for unspecified expressions was much easier to reason about. Embracing this approach made sure pretty much everything not expects get_expressions to be list[Expression | None] so it's a safe path to take now and doing so would avoids the conditional shenanigans that are causing out of order due to expressions.pop and mixin in types.

Another big plus to having all the logic lives in Aggregate is that it makes implementing things like STRING_AGG (AKA GROUP_CONCAT) and other expressions that gain support for ordering overnight much easier and tested in a generic way.

If that's something you're interesting in working on Chris I'd be happy to provide you reviewing support.

Last edited 7 months ago by Simon Charette (previous) (diff)

comment:3 by Chris M, 7 months ago

Owner: changed from nobody to Chris M
Status: newassigned

I think I see what you are going for, Simon. I'll give that a go and send up a pull request once I have something to share.

comment:4 by Simon Charette, 7 months ago

Sounds good! I suggest you start by making both Aggregate and OrderableAggMixin unconditionally return None in their source_expressions getters and setters when they respectively don't have any filter and ordering.

This should address the immediate issue reported in this ticket and from there, if you're feeling confident, merging OrderableAggMixin and dealing with its deprecation could be tackled in a follow up ticket.

comment:5 by Chris M, 7 months ago

Has patch: set

comment:6 by Simon Charette, 7 months ago

Patch needs improvement: set

Left some comments on the PR for some subtle adjustments to get the tests passing but it's looking promising to me.

comment:7 by Simon Charette, 7 months ago

Patch needs improvement: unset

Patch is looking good to me albeit some commit message massaging that mergers should be able to handle.

comment:8 by Natalia Bidart, 6 months ago

Triage Stage: AcceptedReady for checkin

comment:9 by nessita <124304+nessita@…>, 6 months ago

In 42b567ab:

Refs #35339 -- Updated Aggregate class to return consistent source expressions.

Refactored the filter and order_by expressions in the Aggregate class to
return a list of Expression (or None) values, ensuring that the list
item is always available and represents the filter expression.
For the PostgreSQL OrderableAggMixin, the returned list will always
include the filter and the order_by value as the last two elements.

Lastly, emtpy Q objects passed directly into aggregate objects using
Aggregate.filter in admin facets are filtered out when resolving the
expression to avoid errors in get_refs().

Thanks Simon Charette for the review.

comment:10 by nessita <124304+nessita@…>, 6 months ago

Resolution: fixed
Status: assignedclosed

In c8df2f99:

Fixed #35339 -- Fixed PostgreSQL aggregate's filter and order_by params order.

Updated OrderableAggMixin.as_sql() to separate the order_by parameters
from the filter parameters. Previously, the parameters and SQL were
calculated by the Aggregate parent class, resulting in a mixture of
order_by and filter parameters.

Thanks Simon Charette for the review.

comment:11 by GitHub <noreply@…>, 6 months ago

In 8c257cec:

Refs #35339 -- Fixed source expressions in GeoAggregate on Oracle.

Regression in 42b567ab4c5bfb1bbd3e629b1079271c5ae44ea0.

comment:12 by Chris M, 6 months ago

Simon, I am interested in working on the second phase of merging in OrderableAggMixin . I started playing around with this code, and I think it makes sense. My initial question is around how moving this up looks with regards to things like ArrayAgg. That function directly exists in other databases, like Sqlite, so would we want to put it into the core aggregate module instead of the postgres module? To build on that, MySQL implements basically the same behavior as group_concat, which Sqlite also supports, so is there some nuance to how that should be handled as well?

Also, should we create a new ticket for this second phase of the work? What would that look like?

comment:13 by Simon Charette, 6 months ago

Hello Chris, glad to hear you're interested in moving this forward! We should definitely create a new ticket for that.

I'd suggest targeting StringAgg(Aggregate) and default to using STRING_AGG (Postgres, SQLite), GROUP_CONCAT as fallback on MySQL, and LIST_AGG WITHIN GROUP (ORDER BY) on Oracle.

As for whether StringAgg should live in models.aggregate (or only in tests at first) I think it could even if that means documenting it. Another potential candidate that could be added is JSONArrayAgg but I think we should focus on a single one for this first ticket.

A few things I believe we should focus on along the way

  1. We should make the argument Aggregate(order_by) instead of ordering to be consistent with Window
  2. Just like we do with allow_distinct we should use a allow_order_by attribute for aggregates that support this feature
  3. We should make OrderableAggMixin a shim that sets allow_order_by=True, redirect __init__(ordering) to order_by, and emit a deprecation warning when doing so to allow a proper transition for contrib.postgres.ArrayAgg and friends
  4. We should reuse OrderByList as much as possible

If that can be of any help I hacked a bit on it a few days ago to make sure I didn't send through a rabbit hole by requesting these.

Last edited 6 months ago by Simon Charette (previous) (diff)

comment:14 by Chris M, 6 months ago

Thanks, Simon. You are right, I mentioned ArrayAgg but I should have been referencing the StringAgg function. I appreciate the input and gut-check commit. The commit seems to answer a many of the early questions I had cropping up internally as I started playing around with the concept around the best way to structure the code and handle some of the feature checks.

I take a crack at creating a new ticket based on your input here and start working on it.

comment:15 by Natalia Bidart, 4 months ago

#35613 was a duplicate.

Note: See TracTickets for help on using tickets.
Back to Top