Opened 13 months ago

Closed 11 months ago

Last modified 11 months ago

#34808 closed Cleanup/optimization (fixed)

Some aggregation functions may return None; this isn't well documented

Reported by: Eric Baumgartner Owned by: Lufafa Joshua
Component: Documentation Version: 4.2
Severity: Normal Keywords:
Cc: Nick Pope Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Calls like Book.objects.filter(...).aggregate(Sum("pages")) may return None if the queryset is empty.

This isn't well documented and I think many of the examples in the documentation can lead to folks writing unsafe code that works most of the time, but can fail for empty queries.

For example (using models from https://docs.djangoproject.com/en/4.2/topics/db/aggregation/):

obscure_publisher = ...
q = Book.objects.filter(publisher=obscure_publisher).Aggregate(Sum("pages"))
total_pages = q["pages"]  # None if the publisher hasn't published any books!

# Silly example, but will raise a TypeError
kilopages = total_pages / 1000

From what I've seen online, I think the safer approach is to use Coalesce.

obscure_publisher = ...
q = Book.objects.filter(publisher=obscure_publisher).Aggregate(Coalesce(Sum("pages"), 0)
total_pages = q["pages"]  # Safe; using coalesce guarantees an int.

The only reference I've found to using Coalesce in this way appears to be a usage example at https://docs.djangoproject.com/en/4.2/ref/models/database-functions/#coalesce. Coalesce is not mentioned at all on the main aggregation page.

I think the documentation could be improved in a few ways.

On https://docs.djangoproject.com/en/4.2/topics/db/aggregation/:

  • The code examples could consider adding Coalesce, or at least adding a comment at the top of the cheat sheet section mentioning that the examples do not include error checking and that using Coalesce is best practice.

For aggregation functions mentioned in https://docs.djangoproject.com/en/4.2/ref/models/querysets/:

  • Add something to the documentation of the return type for functions used by aggregate() that mentions the empty queryset case if the result type differs for empty queries.
  • For example, Count() safely returns an int even with empty querysets, but Sum, Avg, Min, and Max do not.
  • Might also be worth mentioning that one can use Coalesce to avoid having to deal with None values.

Change History (20)

comment:1 by Simon Charette, 13 months ago

Cc: Nick Pope added
Triage Stage: UnreviewedAccepted

I guess we should augment the aggregation topic to more clearly highlight the usage of the Aggregate.default option added in 4.0 (see #10929) which was meant to be address this exact issue by providing a shorthand that avoids wrapping every aggregate in Coalesce (Aggregate(default=42) -> Coalesce(Aggregate(), 42).

Would you be interested in providing documentation adjustments Eric?

comment:2 by Eric Baumgartner, 13 months ago

Ah, thanks, I completely missed that default was an option. (In 4.0, at least -- I"m still working with some 3.2 projects.)

I'm happy to take a shot at some documentation changes. I haven't contributed to Django before, so if you have any pointers for newbies I'd appreciate it.

Now that I've reviewed #10929 (and I know what to look for) I think coverage is better than I'd originally thought. I think the main things I'd address include:

  • Reviewing all example code across pages that uses .aggregate(...) and adding an explicit default param or None handling when appropriate. I think this is important because many people grab the sample code and don't read additional documentation.
  • On the aggregation topic page specifically, add a new section ("Handling empty querysets"?) that discusses default and explains that it uses Coalesce under the hood.
  • Add something to the documentation of Aggregation.default on the querysets page mentioning that it uses Coalesce under the hood.

Does that scope make sense?

I think mentioning Coalesce is helpful because it provides a hint for pre-4 users who can't use default. Is that sort of thing OK? I don't know what the project's position is on mentioning stuff that only applies to older versions.

Somewhat related, should the documentation for default have a "New in 4.0" flag? Or are those flags reserved for just changed in the last release or two?

comment:3 by David Sanders, 13 months ago

Hi Eric,

Thanks for putting your hand up to take a shot! Writing docs isn't too hard, basically need to check out the repo and update the relevant file under docs/. It's all in Sphinx and each file corresponds to a page in the docs (the file path corresponds to the URL path, conveniently enough). You then run make html and point your browser to the generated html.

See https://docs.djangoproject.com/en/dev/internals/contributing/writing-documentation/ Carlton also mentioned that there was some effort to try aligning the docs to: https://diataxis.fr/

Once your happy with how it looks it's simply a matter of creating a pull request and referencing this ticket. (see commit log for message format)

Re your points:

  • It's not just empty querysets that produce None. Empty groups in a non-empty queryset will do the same thing.
  • A section on handling None using default is a good idea 👍 Explaining why it returns None by default is also a good idea.
  • I'm not sure whether it's worth going through every example to explicitly handle defaults, the dedicated section may be enough, Nessita & Felix may make a call on this.

Good luck!

comment:4 by Lufafa Joshua, 13 months ago

Hey Simon, i would love to help with this ticket. some guidance along the way will be helpful. Thanks

comment:5 by David Sanders, 13 months ago

Hi Lufafa,

It'd be best to check that Eric hasn't already started… (please read the comments above).

in reply to:  5 comment:6 by Eric Baumgartner, 13 months ago

Replying to David Sanders:

Hi Lufafa,

It'd be best to check that Eric hasn't already started… (please read the comments above).

I haven't started on this, Lufafa you're welcome to dig in if you want. Otherwise I'll try to take a pass on this over the weekend.

Based on David's feedback I'm backing off the idea of adding default params to every use of .aggregate() site-wide and just focusing on the examples on the aggregation topic page.

On the querysets page, I would mention that Aggregate.defaults uses Coalesce. And I'm also thinking that the "Return type" documentation for each individual function (Sum, Avg, etc) needs to mention the case where it may return None. While there is a note about empty querysets directly under the Aggregation functions header, I think the way documentation is used is that a lot of people scan the list of functions in the sidebar and click the one they're interested in. This scrolls them directly to the function documentation (e.g. for Sum) and they'll never see the empty queryset note.

comment:7 by Lufafa Joshua, 13 months ago

Owner: changed from nobody to Lufafa Joshua
Status: newassigned

comment:9 by Mariusz Felisiak, 12 months ago

Patch needs improvement: set

comment:10 by David Sanders, 12 months ago

Eric, there's a PR up if you'd like to provide some feedback – since you had a few points.

comment:12 by Lufafa Joshua, 12 months ago

Hey guys, would it be necessary to add a possible return of None on every aggregate function in https://docs.djangoproject.com/en/4.2/ref/models/querysets/. There is a notice under aggregation functions on the querysets page that already does so, perhaps adding a link on the querysets page about the default argument usage in https://docs.djangoproject.com/en/4.2/topics/db/aggregation/ would be helpful.

comment:13 by Eric Baumgartner, 12 months ago

Hey guys, would it be necessary to add a possible return of None on every aggregate function

I'm curious what others think, but I'm pretty firm that the documentation for specific aggregation functions like Sum should account for None as a possible return value.

I know this may seem redundant given that default is already documented under Aggregation functions generally.

However, the way some people use the documentation means that they won't see the general aggregation documentation, because that's not what they're looking for.

Example (which is pretty close to my experience, which is what led me to file this issue):

  • I want to calculate the total of some field across the records in a queryset.
  • I'm pretty sure this will be called Sum, so I hit the documentation to look up that function.
  • On the querysets page, I scan down the Contents sidebar on the right.
  • I find Sum in the sidebar. Great! I click it.

I'm now scrolled directly to the definition of the Sum function, which says:

Return type: same as input field, or output_field if supplied

Why would I not take that at face value? There's nothing in the definition of Sum that would lead me to scroll up the page to see the general documentation for aggregation functions.

If I'm calling Sum for an IntegerField, the return type is int. I interpret that line as equivalent to Sum(...) -> int. When in reality it is Sum(...) -> Optional[int].

I think a relatively simple fix would be to change this to:

Return type: same as input field, or output_field if supplied. Returns None for empty querysets.

I would suggest adding this for all aggregation functions except Count.

comment:14 by Natalia <124304+nessita@…>, 12 months ago

In 78b5c907:

Refs #34808 -- Doc'd that aggregation functions on empty groups can return None.

comment:15 by Natalia <124304+nessita@…>, 12 months ago

In fb5dd118:

[5.0.x] Refs #34808 -- Doc'd that aggregation functions on empty groups can return None.

Backport of 78b5c9075348aa12da2e024f6ece29d1d652dfdd from main

comment:16 by Natalia <124304+nessita@…>, 12 months ago

In b08f53f:

[4.2.x] Refs #34808 -- Doc'd that aggregation functions on empty groups can return None.

Backport of 78b5c9075348aa12da2e024f6ece29d1d652dfdd from main

comment:18 by Natalia <124304+nessita@…>, 11 months ago

Resolution: fixed
Status: assignedclosed

In 8adc7c86:

Fixed #34808 -- Doc'd aggregate function's default argument.

comment:19 by Natalia <124304+nessita@…>, 11 months ago

In d4bbdf53:

[5.0.x] Fixed #34808 -- Doc'd aggregate function's default argument.

Backport of 8adc7c86ab85ed91e512bc49056e301cbe1715d0 from main

comment:20 by Natalia <124304+nessita@…>, 11 months ago

In e8fe48d:

[4.2.x] Fixed #34808 -- Doc'd aggregate function's default argument.

Backport of 8adc7c86ab85ed91e512bc49056e301cbe1715d0 from main

Note: See TracTickets for help on using tickets.
Back to Top