Opened 2 months ago

Last modified 3 weeks ago

#35235 assigned Bug

ArrayAgg() doesn't return default when filter contains __in=[].

Reported by: Per Carlsen Owned by: Sharon Woo
Component: contrib.postgres Version: 5.0
Severity: Normal Keywords: ArrayAgg
Cc: Simon Charette, Nick Pope Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

ArrayAgg changed the default return value in Django 5.0 to None instead of []. The documentation recommends adding default=Value([]) to make it return an empty list instead of None. My app requires empty lists to be returned, so I updated the code and noticed that the return type is inconsistent with the documentation when using the filter functionality of ArrayAgg.

Consider the following query. filter_value and default are variables, and given combinations seem to give inconsistent return values.

MyModel.objects.annotate(
    annotated_ids=ArrayAgg(
        "my_other_model_set__id",
        filter=Q(
            my_other_model_set__id__in=filter_value,
        ),
        default=default,
    )
).first().annotated_ids

MyOtherModel has a fk relation to MyModel, which we referenced via my_other_model_set. Consider the following input values for filter_value and default:

Example 1:
filter_value = [-1] # Or any other ID that does not exist
default = Value([])
This returns an empty list ([]), as expected because there are no results for ArrayAgg

Example 2:
filter_value = []
default = Value([])
BUG: This returns the string '{}' instead of an empty list. This also happens if we don't specify a default value. But if we give default=[] (without the Value()), the return value is consistent with the documentation.

Change History (20)

comment:1 by Sharon Woo, 2 months ago

New to django core and attempting to triage. I couldn't replicate this issue with some simple tests in a draft PR: https://github.com/django/django/pull/17890

I'd like to work on this, will set myself as the owner while I attempt a few more things.

Last edited 2 months ago by Sharon Woo (previous) (diff)

comment:2 by Sharon Woo, 2 months ago

Owner: set to Sharon Woo
Status: newassigned

comment:3 by Mariusz Felisiak, 2 months ago

Cc: Simon Charette Nick Pope added
Summary: ArrayAgg inconsistent return value when no resultsArrayAgg() doesn't return default when filter contains __in=[].
Triage Stage: UnreviewedAccepted

I was able to reproduce the issue with a small models:

class MyModel(models.Model):
    pass


class MyOtherModel(models.Model):
    mymodel = models.ForeignKey(MyModel, models.CASCADE)

for filter_value in ([], [-1]):
    for default in ([], Value([])):
        result = MyModel.objects.annotate(
            annotated_ids=ArrayAgg(
                "myothermodel__id",
                filter=Q(
                    myothermodel__id__in=filter_value,
                ),
                default=Value([]),
            )
        ).first().annotated_ids
        print(result, filter_value, default)

results

{} [] []
{} [] Value([])
[] [-1] []
[] [-1] Value([])

comment:4 by David Sanders, 2 months ago

If you change felix' example to add output_field=ArrayField(IntegerField()) to the default, it works 👍 seems related to #35149 where db_default required output_field set? 🤔

comment:5 by David Sanders, 2 months ago

Looks like this patch fixes the issue:

--- a/django/db/models/aggregates.py
+++ b/django/db/models/aggregates.py
@@ -92,7 +92,7 @@ class Aggregate(Func):
             return c
         if hasattr(default, "resolve_expression"):
             default = default.resolve_expression(query, allow_joins, reuse, summarize)
-            if default._output_field_or_none is None:
+            if default._resolve_output_field() is None:
                 default.output_field = c._output_field_or_none
         else:
             default = Value(default, c._output_field_or_none)

The problem is Aggregate.resolve_expression() calls default._output_field_or_none which then caches the None because it's a @cached_property.... which makes setting the output_field essentially a no-op.

Edit: Sharon feel free to use this ^ as a starting point for your patch 👍 Simon Charette will have some feedback on whether this is "correct" or whether there's a larger problem at play here, but I'd wager this is definitely something that needs fixing.

Last edited 2 months ago by David Sanders (previous) (diff)

comment:6 by Sharon Woo, 2 months ago

Thanks for being patient with me. Let me get through everything and come back with an improved PR, I have some time this week.

in reply to:  6 comment:7 by David Sanders, 2 months ago

Replying to Sharon Woo:

Thanks for being patient with me. Let me get through everything and come back with an improved PR, I have some time this week.

No worries, everyone is on Discord (chat) or the forum if you need help with anything 👍 Details here: https://www.djangoproject.com/community/

comment:8 by Sharon Woo, 2 months ago

Sorry for the lack of progress, before I pushed more code into remote I wanted to get a debugger and the postgres tests running locally... and neither has led to much:

  1. I spent some time this morning setting up a simple app (branching off main at 561e16d6a7 and an empty postgres db, if it matters) to reproduce Felix's example before I make any further changes. However, instead of the mythical '{}', I always get None for result instead, no matter what I pass into default. I am going to step through a bit more here.
  1. I also spent some time looking into how to run the postgres tests locally (they are always skipped now using tox / runtests.py) -- better yet if I can step through the tests in a debugger, but haven't had any luck with resources. What I've done is searched in our favourite search engines and Discord, but lots of posts on general postgres/testing to sift through. Anyone?

Edit: I'll also post on Discord and cross reference this so the two threads are linked after dinner.

Last edited 2 months ago by Sharon Woo (previous) (diff)

comment:9 by David Sanders, 2 months ago

I may have misled you there… on GH I mentioned not needing test data but you do actually need at least one row present in the table to get the wrong result. (The reason why my simplified test worked is because there's data in setupTestData())

To explain:

The bug happens when the aggregate filter is testing for a "contradiction" (ie something that's guaranteed to be false), eg:

ArrayAgg(…, filter=Q(whatever__in=[]))

here it's guaranteed that the filter will always be false. Django has a chance to do some optimisations when this occurs.

Normally when you do

Foo.objects.filter(whatever__in=[])

Django will raise an EmptyResultSet and the catching code will skip calling the db altogether so as not to run an unnecessary query & save on time.

However, when it's within an annotation, Django still needs to run the query. The optimisation that occurs in this case is that Django will simplify the expression instead.

If you observe the query (by doing print(queryset.query)) you'll see that the annotation has been optimised to:

SELECT …, COALESCE(NULL, '{}') …

where Django has deliberately replaced the input to COALESCE with NULL.

This reveals the source of the problem: the expression COALESCE(NULL, '{}') is of type string. It _should_ be something along the lines of COALESCE(NULL, '{}'::integer[]) to force the expression to be of type integer array.

Hope that helps? Sorry for the long-winded explanation but it took me yonks to realise what was happening under the hood with Django so thought you could use the primer 👍😊

Edit: Oh and I forgot to mention why you need at least 1 row, even though it doesn't matter what the data is: It's so that we can see the value of the COALESCE() expression. If there are no rows then then the expression never gets evaluated.

Last edited 2 months ago by David Sanders (previous) (diff)

comment:10 by Simon Charette, 2 months ago

Thanks for taking a shot at fixing the issue Sharon and the mentoring David.

Just dropping a message to say that I believe your solution in comment:5 and your assessment about similarity with #35149 is right David. I guess an alternative could be to wrap the default in ExpressionWrapper instead to avoid the .output_field assignment foot gun. Foot gun that we could probably disarm by making BaseExpression.output_field a @property that clears the _output_field_or_none cache on assignment as I wouldn't be surprised other well intended .output_field assignments are not working as expected.

We should definitely address this issue and I wonder if we should also adjust the documentation to denote that the Value wrapping is not necessary anymore (in other words default=[] should just work).

in reply to:  10 comment:11 by David Sanders, 2 months ago

Replying to Simon Charette:

Foot gun that we could probably disarm by making BaseExpression.output_field a @property that clears the _output_field_or_none cache on assignment as I wouldn't be surprised other well intended .output_field assignments are not working as expected.

Would there be any harm in simply changing @cached_property on BaseExpression._output_field_or_none to @property? 🤔 It's body is simply:

    @cached_property
    def _output_field_or_none(self):
        try:
            return self.output_field
        except FieldError:
            if not self._output_field_resolved_to_none:
                raise

and output_field is already @cached_property as you mentioned, so it seems redundant. These 2 methods were both introduced as cached props when originally added by Josh in 2013. I just wonder whether Josh was being extra safe around the FieldError handling.

Last edited 2 months ago by David Sanders (previous) (diff)

comment:12 by Simon Charette, 2 months ago

Would there be any harm in simply changing @cached_property on BaseExpression._output_field_or_none to @property?

I would definitely support that change! The cache invalidation challenges that the current implementation impose don't seem worth the mere caching benefits that caching _output_field_or_none over the already existing output_field caching provide.

Last edited 2 months ago by Simon Charette (previous) (diff)

comment:13 by David Sanders, 2 months ago

❤️ awesome

@Sharon so that seems to be the solution if you'd like to prepare a patch 😊

ps: Many folks on Discord will happily answer your test setup q's if you still had them.

comment:14 by Sharon Woo, 2 months ago

Hi David and Simon, thank you! This is extremely helpful. I've posted on Discord and contributed a cat pic in the pets channel for good measure.

I'm now able to reproduce exactly Felix's result locally, the little test app for reference (again definitely not best practice): https://github.com/sharonwoo/django/tree/test-app-for-pairing, test_app/test_app/testagg.py is where I put the example.

Now that I have this up, to make changes and push to remote with slightly more confidence.

Last edited 2 months ago by Sharon Woo (previous) (diff)

comment:15 by Sharon Woo, 2 months ago

Test app validates

@property
    def _output_field_or_none(self):

fixes the issue. Now to write a test and commit to remote.

comment:16 by David Sanders, 2 months ago

@Mariusz, I noticed you didn't categorise this as a release blocker but just wanted to check whether it should be since next month is approaching - technically it's not a regression from 5.0 but it was "uncovered" by the change in default behaviour 😅

comment:17 by Sharon Woo, 2 months ago

Linking to a related ticket: #35257.

Last edited 2 months ago by Tim Graham (previous) (diff)

comment:18 by Per Carlsen, 3 weeks ago

Hi! Just wondering what the status of this ticket is and when you expect to have a fix ready? :-)

comment:19 by Natalia Bidart, 3 weeks ago

Hi Per, the current status of the ticket is reflected in the ticket history, where all conversations are documented openly.

We greatly rely on the community for contributing both fixes and new features. If you have the time and desire to help, there is an ongoing effort from Sharon Woo in this PR:

https://github.com/django/django/pull/17890

One option would be to reach out to Sharon Woo and confirm whether they have time to continue working in the PR and see if/where you could be of help. If they don't, perhaps pick up where they left of? From a quick read, it seems that the bulk of the work was done, only some extra tests are needed.

Thank you!

comment:20 by Natalia Bidart, 3 weeks ago

Has patch: set
Needs tests: set
Patch needs improvement: set
Note: See TracTickets for help on using tickets.
Back to Top