Opened 9 years ago

Closed 3 years ago

Last modified 3 years ago

#26430 closed Bug (fixed)

Coalesce in Aggregations ignored when EmptyResultSet returned

Reported by: Ryan Prater Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords: aggregation coalesce in queryset
Cc: Simon Charette, Hannes Ljungberg, Anton Agestam, pope1ni Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using an empty list when using the __in= returns an EmptyResultSet and prevents an Aggregate Coalesce from working properly. See below:

# Test with matched Queryset. Sum will return 50
OrderItem.objects.filter(pk__in=[1]).aggregate(test=Coalesce(Sum('quantity'), Value(0)))
>>>{'test': 50}

# Test with unmatched Queryset. Sum will return 0
OrderItem.objects.filter(pk__in=[-1]).aggregate(test=Coalesce(Sum('quantity'), Value(0)))
>>> {'test':0}

# Test with unmatched Queryset (using empty list). EmptyResultSet returned because of empty list. Sum will return NONE
OrderItem.objects.filter(pk__in=[]).aggregate(test=Coalesce(Sum('quantity'), Value(0)))
>>> {'test': None}

Simon Charette on django-users suggested the following:

From what I understand the ORM simply doesn't perform any query in this case
as the pk__in lookup cannot match any OrderItem and result in an
EmptyResultSet exception[1].

This exception is caught in the Query.get_aggregation() method where all
aggregates are converted to None[2].

I suppose we should alter the except EmptyResultSet clause to account for
outer_query.annotation_select items that are Coalesce() instances used with
Value() but I'm unsure about how it should be done.

[1] https://github.com/django/django/blob/2e0cd26ffb29189add1e0435913fd1490f52b20d/django/db/models/lookups.py#L221-L223
[2] https://github.com/django/django/blob/2e0cd26ffb29189add1e0435913fd1490f52b20d/django/db/models/sql/query.py#L439-L445

See full discussion here:
https://groups.google.com/forum/#!topic/django-users/HGD3Vv3IerA

Change History (21)

comment:1 by Josh Smeaton, 9 years ago

I do agree that this is a bug, but I don't think the correct fix is to detect specific expressions and apply custom logic for each one. Simon mentioned nested expressions which would be very difficult to resolve the correct behaviour for.

For what it's worth, a Count will also show None in an EmptyResultSet even though its converters should coerce None to 0. The reason it doesn't is because the expressions and aggregates aren't actually used. Everything is Noned.

If I was writing this from scratch I'd raise an error on receiving an EmptyResultSet. An empty list in a filter clause seems like broken behaviour. We can't do that though, because it's backwards incompatible, and there's probably valid use cases for this behaviour now.

A possible fix would be to implement something like on_empty_result for Expressions, to let each subclass decide what it should return if it's not executed. It would have to recursively call this method, and then decide which children should be allowed to propagate their values. But honestly, this seems like a lot of work and complexity for very little gain.

Short of documentation changes or a deprecation, I'd be tempted to wontfix this. I'm certainly open to ideas I haven't considered though.

comment:2 by Ryan Prater, 9 years ago

I can agree that detecting specific expressions is overkill and for that matter, not scalable. However, I don't think an "empty list in a filter clause seems like broken behavior". As you mentioned about valid use cases: I'm using this with a SelectMultiple which obviously allows users to select no options. Writing custom validation to prevent an empty query should be the responsibility of the developer, but not enforced at the framework level.

It seems like this could be avoided by modifying the logic of In.rhs (and/or not throwing the EmptyResultSet at all?), but this is my first bug and I'm not versed enough to discern.

comment:3 by Simon Charette, 9 years ago

I agree with Josh that the on_empty_result approach seems like a lot of work for very little gain but I can't see any other way to fix this issue appropriately.

@ryanprater not throwing EmptyResultSet in the In lookup would solve your particular issue but the underlying one would remain, using Coalesce with aggregation can result in an unexpected return value as EmptyResultSet is thrown by other part of Django (queryset.none() relies on it).

comment:4 by Anssi Kääriäinen, 9 years ago

Maybe we could somehow skip throwing EmptyResultSet when doing an .aggregate() call, but throw it when doing normal queries. One idea is to use query.add_context('in_aggregate_query', True), and then check compiler.query.context before throwing the EmptyResultSet.

Another possible idea is to add as_python() methods to pretty much every part of the ORM. The as_python() method would run the expressions as Python code, so you could run whole querysets without actually touching the database. For the case where the query for .aggregate() generates an EmptyResultSet we would run the queryset in Python against one row having all None values, thus producing the wanted answer. This could be extremely convenient for testing, too. OTOH doing this properly will require at least a couple of months of work, so this is for sure overkill for this issue.

comment:5 by Simon Charette, 9 years ago

FWIW I gave a naive try at implementing the get_empty_result() approach suggested by Josh:

https://github.com/django/django/pull/6361

The full suite pass but coalesced results relying on database functions (e.g. NOW()) are not supported.

comment:6 by Anssi Kääriäinen, 9 years ago

Triage Stage: UnreviewedAccepted

See #26433 for related issue and a new idea for solving this.

Marking as accepted, I think we can solve this without unreasonable effort.

comment:7 by Simon Charette, 5 years ago

This happens to be addressed by https://github.com/django/django/pull/12602

Last edited 5 years ago by Claude Paroz (previous) (diff)

in reply to:  7 comment:8 by Nick Pope, 4 years ago

Replying to Simon Charette:

This happens to be addressed by https://github.com/django/django/pull/12602

I was looking into reviving https://github.com/django/django/pull/12602.

It seems that some parts of that were already addressed in c2d4926702045e342a668057f0a758eec9db9436.
Notably this solved the None instead of 0 issue for Count() in aggregation_regress.tests.AggregationTests.test_empty_filter_aggregate.

It doesn't seem to be true that this ticket will be addressed by that PR as implied by this comment.
Changing the test to use Coalesce() as below still fails with total_age being None. There is nothing to determine that the second argument of Coalesce() should be used.

             self.assertEqual(
-                Author.objects.filter(id__in=[]).annotate(Count('friends')).aggregate(Count('pk')),
-                {'pk__count': 0},
+                Author.objects.filter(pk__in=[]).annotate(Count('friends')).aggregate(total_age=Coalesce(Sum('age'), 0)),
+                {'total_age': 0},
             )

In the case of Count() we get 0 because of Count.convert_value() explicitly returning 0 if the value is None.

It seems that https://github.com/django/django/pull/6361 would be the way forward.
Was there a reason you abandoned that approach, Simon? Would you be happy for me to pick that up again?

comment:9 by Simon Charette, 4 years ago

Cc: Simon Charette added

comment:10 by Hannes Ljungberg, 4 years ago

Cc: Hannes Ljungberg added

comment:11 by Anton Agestam, 4 years ago

Cc: Anton Agestam added

comment:12 by Simon Charette, 4 years ago

Sorry for the delayed answer Nick.

It seems that ​https://github.com/django/django/pull/6361 would be the way forward. Was there a reason you abandoned that approach, Simon? Would you be happy for me to pick that up again?

I think it would be the most straightforward way to get there but the main issue with the get_empty_result approach is coalescence to database expressions e.g. Coalesce('col', Now()).

It's tempting to mock our way there and cheat for provided expressions in implementing Python equivalent (e.g. Now.get_empty_result -> timezone.now) but it has its limits (e.g. imagine coalescing to calling a database procedure). In the end EmptyResultSet is meant to be an optimization so I believe we should turn it off in circumstances where it affects the correctness of the returned results.

A simpler approach, inspired by Anssi's comment:4, would be to add a SQLCompiler.supports_empty_result_set = False property and set it to False for SQLAggregateCompiler. Parts of the code that raise this exception could then be adjusted not to do so when the current compiler doesn't support it.

(completely untested)

  • django/db/models/lookups.py

    diff --git a/django/db/models/lookups.py b/django/db/models/lookups.py
    index 8d3648b393..0b02287d2a 100644
    a b def process_rhs(self, compiler, connection):  
    392392            except TypeError:  # Unhashable items in self.rhs
    393393                rhs = [r for r in self.rhs if r is not None]
    394394
    395             if not rhs:
     395            if not rhs and compiler.supports_empty_result_set:
    396396                raise EmptyResultSet
    397397
    398398            # rhs should be an iterable; use batch_process_rhs() to
  • django/db/models/sql/compiler.py

    diff --git a/django/db/models/sql/compiler.py b/django/db/models/sql/compiler.py
    index 7264929da8..6cfcbf7f9f 100644
    a b class SQLCompiler:  
    2525        r'^(.*)\s(?:ASC|DESC).*',
    2626        re.MULTILINE | re.DOTALL,
    2727    )
     28    supports_empty_result_set = True
    2829
    2930    def __init__(self, query, connection, using):
    3031        self.query = query
    def pre_sql_setup(self):  
    16361637
    16371638
    16381639class SQLAggregateCompiler(SQLCompiler):
     1640    # Due to the nature of aggregation coalescence some expressions might
     1641    # need to be interpreted on the database to return the correct value
     1642    # when dealing with an empty result set.
     1643    supports_empty_result_set = False
     1644
    16391645    def as_sql(self):
    16401646        """
    16411647        Create the SQL for this query. Return the SQL string and list of

I guess we could also use an hybrid approach where SQLAggregateCompiler only sets supports_empty_result_set = False if any of outer_query.annotation_select.items() is not a literal value prior to proceeding with the compilation/as_sql phase. That would keep the optimization for most cases but fallback to the database when it cannot be computed on the Python side.

Version 0, edited 4 years ago by Simon Charette (next)

comment:13 by Simon Charette, 4 years ago

Has patch: set

Tried implementing the above in https://github.com/django/django/pull/14430

comment:14 by Mariusz Felisiak, 4 years ago

Cc: pope1ni added

comment:15 by Mariusz Felisiak, 3 years ago

Needs documentation: set
Owner: changed from nobody to Simon Charette
Patch needs improvement: set
Status: newassigned

comment:16 by GitHub <noreply@…>, 3 years ago

In 5371ad1d:

Refs #26430 -- Added tests for PostgreSQL-specific aggregates on EmptyQuerySets and used subTest().

comment:17 by Simon Charette, 3 years ago

Needs documentation: unset
Patch needs improvement: unset

comment:18 by Mariusz Felisiak, 3 years ago

Triage Stage: AcceptedReady for checkin

comment:19 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Resolution: fixed
Status: assignedclosed

In f3112fde:

Fixed #26430 -- Fixed coalesced aggregation of empty result sets.

Disable the EmptyResultSet optimization when performing aggregation as
it might interfere with coalescence.

comment:20 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

In 9f3cce17:

Refs #26430 -- Re-introduced empty aggregation optimization.

The introduction of the Expression.empty_aggregate_value interface
allows the compilation stage to enable the EmptyResultSet optimization
if all the aggregates expressions implement it.

This also removes unnecessary RegrCount/Count.convert_value() methods.
Disabling the empty result set aggregation optimization when it wasn't
appropriate prevented None returned for a Count aggregation value.

Thanks Nick Pope for the review.

comment:21 by GitHub <noreply@…>, 3 years ago

In 0f3e1a54:

Refs #26430 -- Removed unused branch in sql.Query.get_count().

Now that sql.Query.get_aggregation() properly deals with empty result
sets summary Count() annotations cannot result in None.

Unused since 9f3cce172f6913c5ac74272fa5fc07f847b4e112.

Note: See TracTickets for help on using tickets.
Back to Top