#34808 closed Cleanup/optimization (fixed)
Some aggregation functions may return None; this isn't well documented
Reported by: | Eric Baumgartner | Owned by: | Lufafa Joshua |
---|---|---|---|
Component: | Documentation | Version: | 4.2 |
Severity: | Normal | Keywords: | |
Cc: | Nick Pope | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Calls like Book.objects.filter(...).aggregate(Sum("pages")) may return None if the queryset is empty.
This isn't well documented and I think many of the examples in the documentation can lead to folks writing unsafe code that works most of the time, but can fail for empty queries.
For example (using models from https://docs.djangoproject.com/en/4.2/topics/db/aggregation/):
obscure_publisher = ... q = Book.objects.filter(publisher=obscure_publisher).Aggregate(Sum("pages")) total_pages = q["pages"] # None if the publisher hasn't published any books! # Silly example, but will raise a TypeError kilopages = total_pages / 1000
From what I've seen online, I think the safer approach is to use Coalesce.
obscure_publisher = ... q = Book.objects.filter(publisher=obscure_publisher).Aggregate(Coalesce(Sum("pages"), 0) total_pages = q["pages"] # Safe; using coalesce guarantees an int.
The only reference I've found to using Coalesce in this way appears to be a usage example at https://docs.djangoproject.com/en/4.2/ref/models/database-functions/#coalesce. Coalesce is not mentioned at all on the main aggregation page.
I think the documentation could be improved in a few ways.
On https://docs.djangoproject.com/en/4.2/topics/db/aggregation/:
- The code examples could consider adding Coalesce, or at least adding a comment at the top of the cheat sheet section mentioning that the examples do not include error checking and that using Coalesce is best practice.
For aggregation functions mentioned in https://docs.djangoproject.com/en/4.2/ref/models/querysets/:
- Add something to the documentation of the return type for functions used by aggregate() that mentions the empty queryset case if the result type differs for empty queries.
- For example, Count() safely returns an int even with empty querysets, but Sum, Avg, Min, and Max do not.
- Might also be worth mentioning that one can use Coalesce to avoid having to deal with None values.
Change History (20)
comment:1 by , 16 months ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 16 months ago
Ah, thanks, I completely missed that default
was an option. (In 4.0, at least -- I"m still working with some 3.2 projects.)
I'm happy to take a shot at some documentation changes. I haven't contributed to Django before, so if you have any pointers for newbies I'd appreciate it.
Now that I've reviewed #10929 (and I know what to look for) I think coverage is better than I'd originally thought. I think the main things I'd address include:
- Reviewing all example code across pages that uses
.aggregate(...)
and adding an explicitdefault
param orNone
handling when appropriate. I think this is important because many people grab the sample code and don't read additional documentation. - On the aggregation topic page specifically, add a new section ("Handling empty querysets"?) that discusses
default
and explains that it uses Coalesce under the hood. - Add something to the documentation of
Aggregation.default
on the querysets page mentioning that it uses Coalesce under the hood.
Does that scope make sense?
I think mentioning Coalesce is helpful because it provides a hint for pre-4 users who can't use default
. Is that sort of thing OK? I don't know what the project's position is on mentioning stuff that only applies to older versions.
Somewhat related, should the documentation for default
have a "New in 4.0" flag? Or are those flags reserved for just changed in the last release or two?
comment:3 by , 16 months ago
Hi Eric,
Thanks for putting your hand up to take a shot! Writing docs isn't too hard, basically need to check out the repo and update the relevant file under docs/. It's all in Sphinx and each file corresponds to a page in the docs (the file path corresponds to the URL path, conveniently enough). You then run make html
and point your browser to the generated html.
See https://docs.djangoproject.com/en/dev/internals/contributing/writing-documentation/ Carlton also mentioned that there was some effort to try aligning the docs to: https://diataxis.fr/
Once your happy with how it looks it's simply a matter of creating a pull request and referencing this ticket. (see commit log for message format)
Re your points:
- It's not just empty querysets that produce
None
. Empty groups in a non-empty queryset will do the same thing. - A section on handling
None
usingdefault
is a good idea 👍 Explaining why it returnsNone
by default is also a good idea. - I'm not sure whether it's worth going through every example to explicitly handle defaults, the dedicated section may be enough, Nessita & Felix may make a call on this.
Good luck!
comment:4 by , 16 months ago
Hey Simon, i would love to help with this ticket. some guidance along the way will be helpful. Thanks
follow-up: 6 comment:5 by , 16 months ago
Hi Lufafa,
It'd be best to check that Eric hasn't already started… (please read the comments above).
comment:6 by , 16 months ago
Replying to David Sanders:
Hi Lufafa,
It'd be best to check that Eric hasn't already started… (please read the comments above).
I haven't started on this, Lufafa you're welcome to dig in if you want. Otherwise I'll try to take a pass on this over the weekend.
Based on David's feedback I'm backing off the idea of adding default params to every use of .aggregate() site-wide and just focusing on the examples on the aggregation topic page.
On the querysets page, I would mention that Aggregate.defaults uses Coalesce. And I'm also thinking that the "Return type" documentation for each individual function (Sum, Avg, etc) needs to mention the case where it may return None. While there is a note about empty querysets directly under the Aggregation functions header, I think the way documentation is used is that a lot of people scan the list of functions in the sidebar and click the one they're interested in. This scrolls them directly to the function documentation (e.g. for Sum) and they'll never see the empty queryset note.
comment:7 by , 16 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:9 by , 16 months ago
Patch needs improvement: | set |
---|
comment:10 by , 16 months ago
Eric, there's a PR up if you'd like to provide some feedback – since you had a few points.
comment:12 by , 15 months ago
Hey guys, would it be necessary to add a possible return of None on every aggregate function in https://docs.djangoproject.com/en/4.2/ref/models/querysets/. There is a notice under aggregation functions on the querysets page that already does so, perhaps adding a link on the querysets page about the default argument usage in https://docs.djangoproject.com/en/4.2/topics/db/aggregation/ would be helpful.
comment:13 by , 15 months ago
Hey guys, would it be necessary to add a possible return of None on every aggregate function
I'm curious what others think, but I'm pretty firm that the documentation for specific aggregation functions like Sum should account for None as a possible return value.
I know this may seem redundant given that default is already documented under Aggregation functions generally.
However, the way some people use the documentation means that they won't see the general aggregation documentation, because that's not what they're looking for.
Example (which is pretty close to my experience, which is what led me to file this issue):
- I want to calculate the total of some field across the records in a queryset.
- I'm pretty sure this will be called Sum, so I hit the documentation to look up that function.
- On the querysets page, I scan down the Contents sidebar on the right.
- I find Sum in the sidebar. Great! I click it.
I'm now scrolled directly to the definition of the Sum function, which says:
Return type: same as input field, or output_field if supplied
Why would I not take that at face value? There's nothing in the definition of Sum that would lead me to scroll up the page to see the general documentation for aggregation functions.
If I'm calling Sum for an IntegerField, the return type is int. I interpret that line as equivalent to Sum(...) -> int
. When in reality it is Sum(...) -> Optional[int]
.
I think a relatively simple fix would be to change this to:
Return type: same as input field, or output_field if supplied. Returns None for empty querysets.
I would suggest adding this for all aggregation functions except Count
.
I guess we should augment the aggregation topic to more clearly highlight the usage of the
Aggregate.default
option added in 4.0 (see #10929) which was meant to be address this exact issue by providing a shorthand that avoids wrapping every aggregate inCoalesce
(Aggregate(default=42)
->Coalesce(Aggregate(), 42)
.Would you be interested in providing documentation adjustments Eric?