Opened 4 weeks ago

Closed 2 days ago

Last modified 29 hours ago

#35339 closed Bug (fixed)

Ordering and filtering a Postgres ArrayAgg with parameters inverts SQL param order

Reported by: Chris M Owned by: Chris M
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: postgres arrayagg ordering
Cc: Mariusz Felisiak, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When trying to build an ArrayAgg annotation that has a filter with parameters and a ordering with parameters, the SQL that is built will invert the parameters, putting the filter parameters before the ordering parameters. I was not able to find an existing ticket for this issue.

I have verified this against the current dev branch as well as version 3.2.

I was able to reproduce this in a test in the test_aggregates.py file.

    def test_array_agg_filter_and_ordering_params(self):
        values = AggregateTestModel.objects.aggregate(
            arrayagg=ArrayAgg(
                "char_field",
                filter=Q(json_field__has_key="lang"),
                ordering=LPad(Cast("integer_field", CharField()), 2, Value("0")),
            )
        )
        self.assertEqual(values, {"arrayagg": ["Foo2", "Foo4"]})

The resulting error is something like

Traceback (most recent call last):
  File "/Users/camuthig/projects/oss/django/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedFunction: function lpad(character varying, integer, integer) does not exist
LINE 1: ...s_tests_aggregatetestmodel"."char_field" ORDER BY LPAD(("pos...
                                                             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

The issue is that the result of the OrderableAggMixin.as_sql function is

SQL:

ARRAY_AGG("postgres_tests_aggregatetestmodel"."char_field" ORDER BY LPAD(("postgres_tests_aggregatetestmodel"."integer_field")::varchar, %s, %s)) FILTER (WHERE "postgres_tests_aggregatetestmodel"."json_field" ? %s)

Parameters

[ "lang", 2, "0"]

So we are trying to use "lang" and 2 as the values for the ordering function, and "0" as the parameter for the filtering function. This is made a bit more confusing if the expression you are aggregating also has a parameter, because that should be before the ordering parameters. It should be

  1. Expression parameters
  2. Ordering parameters
  3. Filtering parameters

This happens because both the expression and filtering parameters come from the standard Aggregate parent class, and are then put in front of the ordering parameters in the Postgres-specific orderable mixin.

I have been able to resolve this issue locally by altering the OrderableAggMixin.as_sql function to retrieve the parameters from the parent and then split them manually.

class OrderableAggMixin:
    # ... other functions

    def as_sql(self, compiler, connection):
        if self.order_by is not None:
            order_by_sql, order_by_params = compiler.compile(self.order_by)
        else:
            order_by_sql, order_by_params = "", ()

        sql, expression_params = super().as_sql(compiler, connection, ordering=order_by_sql)

        filter_params = ()
        if self.filter:
            try:
                _, filter_params = self.filter.as_sql(compiler, connection)
                expression_params = expression_params[:-len(filter_params)]
            except FullResultSet:
                pass

        return sql, (*expression_params, *order_by_params, *filter_params)

This solution technically works, but it feels a bit clunky, so I am open to suggestions on how to improve it. I can also create a pull request with the change if you would like to see it, though my changes are all captured here already.

Change History (11)

comment:1 by Natalia Bidart, 4 weeks ago

Cc: Mariusz Felisiak Simon Charette added
Keywords: postgres arrayagg ordering added
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug
Version: 5.0dev

Hello Chris, thank you for your detailed report.

I can confirm that the provided test fails in current main as shown below. Adding Simon and Mariusz as cc to see if they can provide advice on the

  • tests/postgres_tests/test_aggregates.py

    diff --git a/tests/postgres_tests/test_aggregates.py b/tests/postgres_tests/test_aggregates.py
    index 386c55da25..5ab27752d1 100644
    a b from django.db.models import (  
    1212    Window,
    1313)
    1414from django.db.models.fields.json import KeyTextTransform, KeyTransform
    15 from django.db.models.functions import Cast, Concat, Substr
     15from django.db.models.functions import Cast, Concat, LPad, Substr
    1616from django.test import skipUnlessDBFeature
    1717from django.test.utils import Approximate
    1818from django.utils import timezone
    class TestGeneralAggregate(PostgreSQLTestCase):  
    188188                )
    189189                self.assertEqual(values, {"arrayagg": expected_output})
    190190
     191    def test_array_agg_filter_and_ordering_params(self):
     192        values = AggregateTestModel.objects.aggregate(
     193            arrayagg=ArrayAgg(
     194                "char_field",
     195                filter=Q(json_field__has_key="lang"),
     196                ordering=LPad(Cast("integer_field", CharField()), 2, Value("0")),
     197            )
     198        )
     199        self.assertEqual(values, {"arrayagg": ["Foo2", "Foo4"]})
     200
    191201    def test_array_agg_integerfield(self):
    192202        values = AggregateTestModel.objects.aggregate(
    193203            arrayagg=ArrayAgg("integer_field")

Full trace with --debug-sql:

======================================================================
ERROR: test_array_agg_filter_and_ordering_params (postgres_tests.test_aggregates.TestGeneralAggregate.test_array_agg_filter_and_ordering_params)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/nessita/fellowship/django/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nessita/.virtualenvs/djangodev/lib/python3.11/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)
psycopg.errors.UndefinedFunction: function lpad(character varying, unknown, integer) does not exist
LINE 1: ...s_tests_aggregatetestmodel"."char_field" ORDER BY LPAD(("pos...
                                                             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
  File "/home/nessita/.virtualenvs/djangodev/lib/python3.11/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)
django.db.utils.ProgrammingError: function lpad(character varying, unknown, integer) does not exist
LINE 1: ...s_tests_aggregatetestmodel"."char_field" ORDER BY LPAD(("pos...
                                                             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

----------------------------------------------------------------------
(0.000)
SELECT ARRAY_AGG("postgres_tests_aggregatetestmodel"."char_field"
                 ORDER BY LPAD(("postgres_tests_aggregatetestmodel"."integer_field")::varchar, 'lang', 2)) FILTER (
                                                                                                                   WHERE "postgres_tests_aggregatetestmodel"."json_field" ? '0') AS "arrayagg"
FROM "postgres_tests_aggregatetestmodel";

args=('lang',
      Int4(2),
      '0');

ALIAS=DEFAULT

comment:2 by Simon Charette, 4 weeks ago

Thank you for the detailed report Chris and for the regression test Natalia this is very useful!

Since this a long standing issue and we thus don't have to deal with a backport I would suggest we bite the bullet and refactor things up to have OrderableAggMixin merged in Aggregate and have a flag similar to allow_distinct.

The crux of the issue here is that we try be clever and have the get_source_expressions / set_source_expressions signature change depending on whether or not a filter and and order_by are assigned (the len of expressions will change) so we can satisfy list[Expression] but with a variadic length. We tried doing that for a while with Window but it became clear that using None as placeholders for unspecified expressions was much easier to reason about. Embracing this approach made sure pretty much everything not expects get_expressions to be list[Expression | None] so it's a safe path to take now and doing so would avoids the conditional shenanigans that are causing out of order due to expressions.pop and mixin in types.

Another big plus to having all the logic lives in Aggregate is that it makes implementing things like STRING_AGG (AKA GROUP_CONCAT) and other expressions that gain support for ordering overnight much easier and tested in a generic way.

If that's something you're interesting in working on Chris I'd be happy to provide you reviewing support.

Last edited 4 weeks ago by Simon Charette (previous) (diff)

comment:3 by Chris M, 4 weeks ago

Owner: changed from nobody to Chris M
Status: newassigned

I think I see what you are going for, Simon. I'll give that a go and send up a pull request once I have something to share.

comment:4 by Simon Charette, 4 weeks ago

Sounds good! I suggest you start by making both Aggregate and OrderableAggMixin unconditionally return None in their source_expressions getters and setters when they respectively don't have any filter and ordering.

This should address the immediate issue reported in this ticket and from there, if you're feeling confident, merging OrderableAggMixin and dealing with its deprecation could be tackled in a follow up ticket.

comment:5 by Chris M, 3 weeks ago

Has patch: set

comment:6 by Simon Charette, 3 weeks ago

Patch needs improvement: set

Left some comments on the PR for some subtle adjustments to get the tests passing but it's looking promising to me.

comment:7 by Simon Charette, 3 weeks ago

Patch needs improvement: unset

Patch is looking good to me albeit some commit message massaging that mergers should be able to handle.

comment:8 by Natalia Bidart, 2 days ago

Triage Stage: AcceptedReady for checkin

comment:9 by nessita <124304+nessita@…>, 2 days ago

In 42b567ab:

Refs #35339 -- Updated Aggregate class to return consistent source expressions.

Refactored the filter and order_by expressions in the Aggregate class to
return a list of Expression (or None) values, ensuring that the list
item is always available and represents the filter expression.
For the PostgreSQL OrderableAggMixin, the returned list will always
include the filter and the order_by value as the last two elements.

Lastly, emtpy Q objects passed directly into aggregate objects using
Aggregate.filter in admin facets are filtered out when resolving the
expression to avoid errors in get_refs().

Thanks Simon Charette for the review.

comment:10 by nessita <124304+nessita@…>, 2 days ago

Resolution: fixed
Status: assignedclosed

In c8df2f99:

Fixed #35339 -- Fixed PostgreSQL aggregate's filter and order_by params order.

Updated OrderableAggMixin.as_sql() to separate the order_by parameters
from the filter parameters. Previously, the parameters and SQL were
calculated by the Aggregate parent class, resulting in a mixture of
order_by and filter parameters.

Thanks Simon Charette for the review.

comment:11 by GitHub <noreply@…>, 29 hours ago

In 8c257cec:

Refs #35339 -- Fixed source expressions in GeoAggregate on Oracle.

Regression in 42b567ab4c5bfb1bbd3e629b1079271c5ae44ea0.

Note: See TracTickets for help on using tickets.
Back to Top