Opened 20 months ago

Last modified 20 months ago

#26430 new Bug

Coalesce in Aggregations ignored when EmptyResultSet returned

Reported by: Ryan Prater Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords: aggregation coalesce in queryset
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using an empty list when using the __in= returns an EmptyResultSet and prevents an Aggregate Coalesce from working properly. See below:

# Test with matched Queryset. Sum will return 50
OrderItem.objects.filter(pk__in=[1]).aggregate(test=Coalesce(Sum('quantity'), Value(0)))
>>>{'test': 50}

# Test with unmatched Queryset. Sum will return 0
OrderItem.objects.filter(pk__in=[-1]).aggregate(test=Coalesce(Sum('quantity'), Value(0)))
>>> {'test':0}

# Test with unmatched Queryset (using empty list). EmptyResultSet returned because of empty list. Sum will return NONE
OrderItem.objects.filter(pk__in=[]).aggregate(test=Coalesce(Sum('quantity'), Value(0)))
>>> {'test': None}

Simon Charette on django-users suggested the following:

From what I understand the ORM simply doesn't perform any query in this case
as the pk__in lookup cannot match any OrderItem and result in an
EmptyResultSet exception[1].

This exception is caught in the Query.get_aggregation() method where all
aggregates are converted to None[2].

I suppose we should alter the except EmptyResultSet clause to account for
outer_query.annotation_select items that are Coalesce() instances used with
Value() but I'm unsure about how it should be done.

[1] https://github.com/django/django/blob/2e0cd26ffb29189add1e0435913fd1490f52b20d/django/db/models/lookups.py#L221-L223
[2] https://github.com/django/django/blob/2e0cd26ffb29189add1e0435913fd1490f52b20d/django/db/models/sql/query.py#L439-L445

See full discussion here:
https://groups.google.com/forum/#!topic/django-users/HGD3Vv3IerA

Change History (6)

comment:1 Changed 20 months ago by Josh Smeaton

I do agree that this is a bug, but I don't think the correct fix is to detect specific expressions and apply custom logic for each one. Simon mentioned nested expressions which would be very difficult to resolve the correct behaviour for.

For what it's worth, a Count will also show None in an EmptyResultSet even though its converters should coerce None to 0. The reason it doesn't is because the expressions and aggregates aren't actually used. Everything is Noned.

If I was writing this from scratch I'd raise an error on receiving an EmptyResultSet. An empty list in a filter clause seems like broken behaviour. We can't do that though, because it's backwards incompatible, and there's probably valid use cases for this behaviour now.

A possible fix would be to implement something like on_empty_result for Expressions, to let each subclass decide what it should return if it's not executed. It would have to recursively call this method, and then decide which children should be allowed to propagate their values. But honestly, this seems like a lot of work and complexity for very little gain.

Short of documentation changes or a deprecation, I'd be tempted to wontfix this. I'm certainly open to ideas I haven't considered though.

comment:2 Changed 20 months ago by Ryan Prater

I can agree that detecting specific expressions is overkill and for that matter, not scalable. However, I don't think an "empty list in a filter clause seems like broken behavior". As you mentioned about valid use cases: I'm using this with a SelectMultiple which obviously allows users to select no options. Writing custom validation to prevent an empty query should be the responsibility of the developer, but not enforced at the framework level.

It seems like this could be avoided by modifying the logic of In.rhs (and/or not throwing the EmptyResultSet at all?), but this is my first bug and I'm not versed enough to discern.

comment:3 Changed 20 months ago by Simon Charette

I agree with Josh that the on_empty_result approach seems like a lot of work for very little gain but I can't see any other way to fix this issue appropriately.

@ryanprater not throwing EmptyResultSet in the In lookup would solve your particular issue but the underlying one would remain, using Coalesce with aggregation can result in an unexpected return value as EmptyResultSet is thrown by other part of Django (queryset.none() relies on it).

comment:4 Changed 20 months ago by Anssi Kääriäinen

Maybe we could somehow skip throwing EmptyResultSet when doing an .aggregate() call, but throw it when doing normal queries. One idea is to use query.add_context('in_aggregate_query', True), and then check compiler.query.context before throwing the EmptyResultSet.

Another possible idea is to add as_python() methods to pretty much every part of the ORM. The as_python() method would run the expressions as Python code, so you could run whole querysets without actually touching the database. For the case where the query for .aggregate() generates an EmptyResultSet we would run the queryset in Python against one row having all None values, thus producing the wanted answer. This could be extremely convenient for testing, too. OTOH doing this properly will require at least a couple of months of work, so this is for sure overkill for this issue.

comment:5 Changed 20 months ago by Simon Charette

FWIW I gave a naive try at implementing the get_empty_result() approach suggested by Josh:

https://github.com/django/django/pull/6361

The full suite pass but coalesced results relying on database functions (e.g. NOW()) are not supported.

comment:6 Changed 20 months ago by Anssi Kääriäinen

Triage Stage: UnreviewedAccepted

See #26433 for related issue and a new idea for solving this.

Marking as accepted, I think we can solve this without unreasonable effort.

Note: See TracTickets for help on using tickets.
Back to Top