Opened 10 years ago

Last modified 2 weeks ago

#26434 new Bug

Inconsistent results of QuerySet.count() when ordering is not a subset of explicit grouping.

Reported by: kamandol Owned by: Michal Mládek
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: postgresql queryset count annotate aggreagate order_by
Cc: github@…, Simon Charette Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using a PostgreSQL backend database, in some cases with QuerySets involving annotations, values or values_list and ordering(order_by) clauses the resulting QuerySet count() method fails to predict the real row result unless that QuerySet is evaluated.

For instance:

q = BundleVersion.objects.order_by('-id').values_list("port__id", "asset_bundle__id").annotate(max=models.Max("id"))

This QuerySet using PostgreSQL would, in fact, group the results by BundleVersion.id instead of the tuple ("port__id", "asset_bundle__id") due to the order_by clause using a column not in the values_list. This is a documented behavior.

The problem is that if q is not evaluated yet, calling q.count() will return an amount of rows as if the grouping was done on the tuple ("port__id", "asset_bundle__id"). After evaluating q, or calling len(q), the result of q.count() will be the expected, as if the grouping was done using 'BundleVersion.id'.

Attachments (2)

main_logic.patch (1.4 KB ) - added by Michal Mládek 6 months ago.
main_logic.2.patch (2.4 KB ) - added by Michal Mládek 6 months ago.

Download all attachments as: .zip

Change History (39)

comment:1 by Tim Graham, 10 years ago

Could you provide a test for Django's test suite in tests/queries/tests.py that demonstrates the issue?

in reply to:  1 comment:2 by kamandol, 10 years ago

Replying to timgraham:

Could you provide a test for Django's test suite in tests/queries/tests.py that demonstrates the issue?

Here is the commit with the demonstration:

https://github.com/sp-ricard-valverde/django/commit/c50afafdb02880e4c941c2a72a215bee80de3aed

comment:3 by Iacopo Spalletti, 10 years ago

Tests looks legitimate, but I don't get this one https://github.com/sp-ricard-valverde/django/commit/c50afafdb02880e4c941c2a72a215bee80de3aed#diff-ce9b52a66d03e851a9828377263dc04bR3864 ; or rather I don't get why is labeled as 'OK' as it the concatenation of the previous ones

comment:4 by Iacopo Spalletti, 10 years ago

Cc: github@… added

in reply to:  3 comment:5 by kamandol, 10 years ago

Replying to yakky:

Tests looks legitimate, but I don't get this one https://github.com/sp-ricard-valverde/django/commit/c50afafdb02880e4c941c2a72a215bee80de3aed#diff-ce9b52a66d03e851a9828377263dc04bR3864 ; or rather I don't get why is labeled as 'OK' as it the concatenation of the previous ones

Sorry, my bad, it's a bad choice for a test title. What it demonstrates is that the second assert in that test is successful even though a new QuerySet is built and, according to the other failed tests it should fail too.

comment:7 by Tim Graham, 10 years ago

Triage Stage: UnreviewedAccepted

If you could put the tests on a branch of you fork other than "master" that seems a bit safer. There is probably no need for skipUnlessDBEngine - is there a reason the queries shouldn't also work on other databases? Anyway, the usual way to skip tests is based on database features.

comment:8 by kamandol, 10 years ago

I will change the branch as requested.

You're right, the faulty behavior was detected on PostgreSQL, but on MySQL worked correctly so the test should pass. Could not test other db engines though. I will remove that extra piece of code.

comment:9 by Can Sarıgöl, 6 years ago

Has patch: set
Last edited 6 years ago by Can Sarıgöl (previous) (diff)

comment:10 by Mariusz Felisiak, 6 years ago

Has patch: unset
Version: 1.9master

comment:11 by Can Sarıgöl, 6 years ago

Last edited 6 years ago by Can Sarıgöl (previous) (diff)

comment:12 by Can Sarıgöl, 6 years ago

Has patch: set

comment:13 by Mariusz Felisiak, 6 years ago

Patch needs improvement: set

comment:14 by Michal Mládek, 6 months ago

Owner: changed from nobody to Michal Mládek
Status: newassigned

comment:15 by Simon Charette, 6 months ago

Summary: Inconsistent results of QuerySet count() method using PostgreSQL backend prior and post the QuerySet evaluationInconsistent results of QuerySet.count() when ordering is not a subset of explicit grouping.

FWIW the issue is not Postgres specific (I reproduced against SQLite).

The problem appears to be caused by sql.Query.get_aggregation's call to clear_ordering(force=False). It seems like the later should skip clearing when isinstance(self.group_by, tuple) and self.order_by + self.extra_order_by is not a subset of self.group_by as force=False must only clear if doing so preserves the semantic of the query which clearly doesn't if the ordering includes members missing from the group by (as they would normally be added at a later stage).

I adjusted the ticket summary accordingly.

Last edited 6 months ago by Simon Charette (previous) (diff)

in reply to:  15 ; comment:16 by Michal Mládek, 6 months ago

Hello Simon,

Thank you very much for clarifying the bug — especially for pointing out that it’s not Postgres-specific, and also for the hint about a subset. I think everything will go much more smoothly and quickly now.

Replying to Simon Charette:

FWIW the issue is not Postgres specific (I reproduced against SQLite).

The problem appears to be caused by sql.Query.get_aggregation's call to clear_ordering(force=False). It seems like the later should skip clearing when isinstance(self.group_by, tuple) and self.order_by + self.extra_order_by is not a subset of self.group_by as force=False must only clear if doing so preserves the semantic of the query which clearly doesn't if the ordering includes members missing from the group by (as they would normally be added at a later stage).

I adjusted the ticket summary accordingly.

in reply to:  16 comment:17 by Michal Mládek, 6 months ago

Replying to Simon Charette:

Using order_by('...') on a column that is also used implicitly for grouping in an aggregation queryset is a misuse of the ORM AFAIK. So what should we do about it?

Should we implement a patch that triggers a warning if order_by('...') includes a column that ends up in the annotate(...), I mean GROUP BY ... clause? Or should we instead modify the behavior of queryset.count() to preserve ordering in such edge cases?

In case of implementing a warning, this doesn't look like a bug - it would be better treated as a feature request, I believe.

Also, the documentation already warns about this interaction quite clearly:
https://docs.djangoproject.com/en/5.2/topics/db/aggregation/#interaction-with-order-by — and that warning has been present since the LTS version Django 3.2.

Hello Simon,

Thank you very much for clarifying the bug — especially for pointing out that it’s not Postgres-specific, and also for the hint about a subset. I think everything will go much more smoothly and quickly now.

Replying to Simon Charette:

FWIW the issue is not Postgres specific (I reproduced against SQLite).

The problem appears to be caused by sql.Query.get_aggregation's call to clear_ordering(force=False). It seems like the later should skip clearing when isinstance(self.group_by, tuple) and self.order_by + self.extra_order_by is not a subset of self.group_by as force=False must only clear if doing so preserves the semantic of the query which clearly doesn't if the ordering includes members missing from the group by (as they would normally be added at a later stage).

I adjusted the ticket summary accordingly.

comment:18 by Simon Charette, 6 months ago

Using order_by('...') on a column that is also used implicitly for grouping in an aggregation queryset is a misuse of the ORM AFAIK. So what should we do about it?

I share your sentiment that a warning should be emitted but that likely warrant a separate ticket inspired by #14357, #32546 (there might be one already I just can't find it now).

I think we should keep this one focused on making sure the right results are returned in the mean time as it's caused by an over-eager optimization (pruning an ORDER BY for performance reasons).

The way things are currently setup the not quite correct example is wrong as the order_by("name") is elided when count() is called but not when the full query is materialized.

The example you linked is about aggregate function annotations but you'll notice that the gotcha is not honored when aggregation is used on top of the query.

In other words, we should make the gotcha more explicit but we should first make sure it's coherent if aggregation is used over a queryset making use of aggregate annotations.

Last edited 6 months ago by Simon Charette (previous) (diff)

comment:19 by Simon Charette, 6 months ago

Cc: Simon Charette added

comment:20 by Michal Mládek, 6 months ago

I added patch in attachment, I am still working on in it. I think I am one step closer to the solution.

by Michal Mládek, 6 months ago

Attachment: main_logic.patch added

by Michal Mládek, 6 months ago

Attachment: main_logic.2.patch added

comment:21 by Michal Mládek, 6 months ago

Now the patch is done. I am continue with regression tests...

in reply to:  21 comment:22 by Michal Mládek, 6 months ago

Patch needs improvement: unset

comment:23 by Simon Charette, 6 months ago

Patch needs improvement: set

Left a few comments on the PR, the patch is quite invasive and could be reduced to a subtle changes in how sql.Query.get_aggregation calls clear_ordering as described in comment:15.

comment:24 by Michal Mládek, 5 months ago

Patch needs improvement: unset

comment:25 by Michal Mládek, 5 months ago

I left there 2 PRs because I reached 2 similar solutions, one of them seems more detailed but more power/time consuming and the other is simple and straightforward.

comment:26 by Jacob Walls, 4 weeks ago

Needs tests: set

comment:27 by Jacob Walls, 2 weeks ago

Needs tests: unset
Triage Stage: AcceptedReady for checkin

comment:28 by Jacob Walls <jacobtylerwalls@…>, 2 weeks ago

Resolution: fixed
Status: assignedclosed

In ea3a71c2:

Fixed #26434 -- Removed faulty clearing of ordering field when missing from explicit grouping.

Co-authored-by: Simon Charette <charette.s@…>

comment:29 by Mariusz Felisiak, 2 weeks ago

This caused a regression on Oracle but not only. group_by is a tuple of Col expressions, order_by is a list of strings, so {*self.order_by, *self.extra_order_by} will never be a subset of group_by.

Previously:

SELECT
   ...
FROM "AGGREGATION_REGRESS_BOOK"
WHERE "AGGREGATION_REGRESS_BOOK"."ID" IN (
   SELECT MAX(U0."ID") AS "ID__MAX"
   FROM "AGGREGATION_REGRESS_BOOK" U0
   GROUP BY U0."CONTACT_ID"
)
ORDER BY "AGGREGATION_REGRESS_BOOK"."ID" ASC

with this patch

SELECT
   ...
FROM "AGGREGATION_REGRESS_BOOK"
WHERE "AGGREGATION_REGRESS_BOOK"."ID" IN (
   SELECT MAX(U0."ID") AS "ID__MAX"
   FROM "AGGREGATION_REGRESS_BOOK" U0
   GROUP BY U0."CONTACT_ID"
   ORDER BY U0."CONTACT_ID" ASC
)
ORDER BY "AGGREGATION_REGRESS_BOOK"."ID" ASC

As far as I'm aware, we would need to translate string aliases (order_by and extra_order_by) to Col expressions but this may be expensive here (or at least non-trivial).

comment:30 by Simon Charette, 2 weeks ago

we would need to translate string aliases (order_by and extra_order_by) to Col expressions but this may be expensive here (or at least non-trivial).

It's not only a matter of being expensive and non trivial it simply cannot be done in the way the ORM currently resolves order_by members at the last minute. Trying to eagerly resolve order_by, instead of lazily at compilation time, could break tons of queries as it would change the JOIN reuse logic and produce JOIN left overs when order_by is called to explicitly clear (the ORM doesn't have a way to elide JOINs when they are added see ticket:35865#comment:6).

It might be worth reverting and working on a solution more localized to get_aggregation as it's terminal (it's the method called by aggregate and thus count) so we know the query cannot be altered further from there and thus we could eagerly resolve order_by from there.

comment:31 by Jacob Walls, 2 weeks ago

Very helpful. I'll prepare a reversion.

comment:32 by Jacob Walls, 2 weeks ago

Resolution: fixed
Status: closednew
Triage Stage: Ready for checkinAccepted

comment:33 by Jacob Walls <jacobtylerwalls@…>, 2 weeks ago

Resolution: fixed
Status: newclosed

In 43933a1d:

Reverted "Fixed #26434 -- Removed faulty clearing of ordering field when missing from explicit grouping."

This reverts commit ea3a71c2d09f8281d8a50ed20e40e1fb13db5cd9.

The implementation was flawed, as self.group_by contains Cols, not aliases.

comment:34 by Jacob Walls, 2 weeks ago

Has patch: unset
Resolution: fixed
Status: closednew

comment:35 by Michal Mládek, 2 weeks ago

I'm not sure what is expected from me at this point - there’s no single comment addressing the solution in PR #19524
or the question(s) I raised there. That PR was rejected without a clear explanation what is wrong, even though the issue has now been reopened and is being discussed again.

A previously functional but incorrect implementation was first accepted and then reverted. The proposed fix in PR #19524
, which resolved the issue by performing an early order_by, was dismissed in favor of the flawed implementation from PR #19519
.

Could someone please clarify what exactly was wrong with the approach in #19524, and what I should avoid when preparing a new attempt?

Last edited 2 weeks ago by Michal Mládek (previous) (diff)

comment:36 by Jacob Walls, 2 weeks ago

comment:30 points out what will not work in that PR: we cannot eagerly resolve order_by() aliases while the query is still being built. comment:30 suggests to move the implementation to get_aggregation() where we know the query is terminal (will not be chained to anything else afterward). Thanks for staying on top of it.

comment:37 by Michal Mládek, 2 weeks ago

Ah, I see - so even though my implementation with eagerly resolving order_by() aliases could technically work (since it was operating on a cloned query), it’s not ideal because it also affects other ORM components that rely on clear_ordering() being called in specific contexts.
That makes sense - since the underlying issue is within get_aggregation(), moving the logic there should be straightforward. Thanks for clarifying!

Last edited 2 weeks ago by Michal Mládek (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top