Opened 3 weeks ago

Closed 8 days ago

#36890 closed New feature (fixed)

Allow providing distinct=True to StringAgg on SQLite by specifying the default delimiter

Reported by: AJ Slater Owned by: Varun Kasyap Pentamaraju
Component: Database layer (models, ORM) Version: 6.0
Severity: Normal Keywords: Aggregate, sqlite3, StringAgg
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Attempting to use the new generic StringAgg with distinct=True raises exceptions because any Aggregate with multiple parameters cannot be distinct with an sqlite3 connection. Running StringAgg without distinct produces many duplicate entries aggregated together.

However this Aggregate runs fine under sqlite3, allows distinct and does not produce duplicates.

class GroupConcat(Aggregate):
    """Sqlite3 GROUP_CONCAT."""

    # Defaults to " " separator which is all I need for now.

    allow_distinct = True
    allow_order_by = True
    function = "GROUP_CONCAT"
    name = "GroupConcat"

    def __init__(self, *args, **kwargs):
        """output_field is set in the constructor."""
        super().__init__(*args, output_field=CharField(), **kwargs)

It occurs to me that this may happen because delimiter can be an expression. If delimiter is submitted as some invariant like Value() it would be nice to escape this limitation.

Change History (10)

comment:1 by Kundan Yadav, 3 weeks ago

can i it assigned to me if it get's reviewed

comment:2 by Simon Charette, 3 weeks ago

I'm not sure I understand your report here.

The code you provided states that " " is the default separator for GROUP_CONCAT but the documentation says otherwise.

The group_concat() function returns a string which is the concatenation of all non-NULL values of X. If parameter Y is present then it is used as the separator between instances of X. A comma (",") is used as the separator if Y is omitted.

which I was able to confirm with

SQLite version 3.37.0 2021-12-09 01:34:53
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE ticket_36890 (value text);
sqlite> INSERT INTO ticket_36890 (value) VALUES ('foo'), ('bar'), ('foo');
sqlite> SELECT GROUP_CONCAT(value), GROUP_CONCAT(DISTINCT value) FROM ticket_36890;
foo,bar,foo|foo,bar

which means that we could support StringAgg("field", Value(","), distinct=True) on SQLite but not StringAgg("field", Value(" "), distinct=True)

Version 1, edited 3 weeks ago by Simon Charette (previous) (next) (diff)

comment:3 by Jacob Walls, 3 weeks ago

Keywords: StringAgg added
Summary: StringAgg doesn't allow DISTINCT in sqlite3 while native GROUP_CONCAT doesAllow providing distinct=True to StringAgg on SQLite by specifying the default delimiter
Triage Stage: UnreviewedAccepted
Type: UncategorizedNew feature

Seems reasonable, we should be able to drop the delimiter if self.delimiter.value == models.Value(",") similar to what the MySQL code path does:

            # Drop the delimiter from the source expressions.
            c.source_expressions = c.source_expressions[:-1]

That should evade the check:

        if (
            self.distinct
            and not connection.features.supports_aggregate_distinct_multiple_argument
            and len(super().get_source_expressions()) > 1
        ):

comment:4 by Varun Kasyap Pentamaraju, 3 weeks ago

Owner: set to Varun Kasyap Pentamaraju
Status: newassigned

I'm interested to submit a patch

Last edited 3 weeks ago by Varun Kasyap Pentamaraju (previous) (diff)

comment:5 by AJ Slater, 3 weeks ago

Thanks Simon, for catching my mistake. The default is not " ", it's actually ",".

Jacob's proposed solution would work with the default case.

comment:6 by Varun Kasyap Pentamaraju, 2 weeks ago

Has patch: set
Last edited 2 weeks ago by Varun Kasyap Pentamaraju (previous) (diff)

comment:7 by Simon Charette, 2 weeks ago

Patch needs improvement: set

The added test is failing on everything but SQLite as it relies on the dataabase string representation of floats which differs between backends.

It also lacks support for ORDER BY and FILTER and coudl simplify it's eligibility heuristic logic.

comment:8 by Jacob Walls, 13 days ago

Needs documentation: set
Patch needs improvement: unset

comment:9 by Jacob Walls, 9 days ago

Needs documentation: unset
Triage Stage: AcceptedReady for checkin

comment:10 by Jacob Walls <jacobtylerwalls@…>, 8 days ago

Resolution: fixed
Status: assignedclosed

In 3282d9f:

Fixed #36890 -- Supported StringAgg(distinct=True) on SQLite with the default delimiter.

Note: See TracTickets for help on using tickets.
Back to Top