#34564 closed Bug (invalid)
returning None instead of zero in Count annotation
| Reported by: | Amin Aminian | Owned by: | Amin Aminian |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 4.2 |
| Severity: | Normal | Keywords: | count, orm, annotate |
| Cc: | Simon Charette | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | yes | Patch needs improvement: | yes |
| Easy pickings: | no | UI/UX: | no |
Description
I am trying to upgrade Django from 3.2 to 4.2 in my project. After the upgrade, some of my tests broke.
The problem:
I am annotating a queryset with the Count function, and for some objects the count result should be zero.
This is my code:
Order.objects.annotate(count=Count("customer", distinct=True))
In Django==3.2, if some orders don't have any customers, the count field would be zero (as I want).
But in Django==4.2, if some orders don't have any customers, the count field would be None, which breaks my code.
I read the source code for both versions. In version 3.2, the Count class in the aggregates module has a convert_value method:
class Count(Aggregate):
...
def convert_value(self, value, expression, connection):
return 0 if value is None else value
As we can see, this method is responsible for returning 0 instead of None.
But this method was removed in version 4.2, and the conversion is now handled by the convert_value property on the BaseExpression class in the expressions module:
class BaseExpression:
...
@cached_property
def convert_value(self):
field = self.output_field
internal_type = field.get_internal_type()
if internal_type == "FloatField":
return (
lambda value, expression, connection: None
if value is None
else float(value)
)
elif internal_type.endswith("IntegerField"):
return (
lambda value, expression, connection: None
if value is None
else int(value)
)
elif internal_type == "DecimalField":
return (
lambda value, expression, connection: None
if value is None
else Decimal(value)
)
return self._convert_value_noop
This property returns a lambda to be called later, and when that lambda is called with a value of None, it returns None.
Shouldn't it return 0 when the value is None?
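For clarity, the behavioral difference between the two converters can be sketched in plain Python (the functions below mirror the snippets above for illustration; they are stand-ins, not Django's actual code):

```python
# Django 3.2 style: Count's own converter coerced NULL (None) to 0.
def count_convert_value_32(value, expression=None, connection=None):
    return 0 if value is None else value

# Django 4.2 style: the generic IntegerField converter on BaseExpression
# passes None through unchanged and only coerces non-NULL values to int.
int_convert_value_42 = (
    lambda value, expression=None, connection=None: None
    if value is None
    else int(value)
)

print(count_convert_value_32(None))  # 0 -- old behavior
print(int_convert_value_42(None))    # None -- new behavior
print(int_convert_value_42("3"))     # 3 -- non-NULL values are still coerced
```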
Change History (22)
comment:1 by , 2 years ago
| Resolution: | → invalid |
|---|---|
| Status: | new → closed |
follow-ups: 3 5 comment:2 by , 2 years ago
| Cc: | added |
|---|
I think this should be considered a release blocker as this is an unintended regression that will affect quite a few users given how common count annotations are over multiple valued relationships.
Unlike ArrayAgg and friends in fee87345967b3d917b618533585076cbfa43451b, there was no deprecation period to force the usage of default=0 for Count in 9f3cce172f6913c5ac74272fa5fc07f847b4e112.
A simple solution for a backport would be to adjust Count.__init__ to do extra.setdefault("default", 0) (or define it in its signature for enhanced introspection) so Coalesce is used.
Happy to submit a patch if you agree.
Amin, you can use Count(..., default=0) in the meantime.
comment:3 by , 2 years ago
| Resolution: | invalid |
|---|---|
| Status: | closed → new |
| Triage Stage: | Unreviewed → Accepted |
OK, let's accept this.
Replying to Simon Charette:
I think this should be considered a release blocker as this is an unintended regression that will affect quite a few users given how common count annotations are over multiple valued relationships.
This is a regression in Django 4.0, so it doesn't qualify for a backport anymore.
follow-up: 6 comment:4 by , 2 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
comment:5 by , 2 years ago
Replying to Simon Charette:
I think this should be considered a release blocker as this is an unintended regression that will affect quite a few users given how common count annotations are over multiple valued relationships.
Unlike ArrayAgg and friends in fee87345967b3d917b618533585076cbfa43451b, there was no deprecation period to force the usage of default=0 for Count in 9f3cce172f6913c5ac74272fa5fc07f847b4e112.
A simple solution for a backport would be to adjust Count.__init__ to do extra.setdefault("default", 0) (or define it in its signature for enhanced introspection) so Coalesce is used.
Happy to submit a patch if you agree.
Amin, you can use Count(..., default=0) in the meantime.
I have read 9f3cce172f6913c5ac74272fa5fc07f847b4e112. The thing I'm not understanding is why we want to return None instead of zero. What is the reason? How does this benefit us?
As far as I have read the changes, in Django==4.2 there is an attribute called empty_result_set_value, which equals NotImplemented on BaseExpression:
class BaseExpression:
empty_result_set_value = NotImplemented
...
And the aggregate classes override this. All of them have empty_result_set_value = None except the Count and RegrCount classes, where empty_result_set_value = 0. Because of this attribute, we can't use default=0 with Count:
class Aggregate(Func):
...
empty_result_set_value = None
def __init__(
self, *expressions, distinct=False, filter=None, default=None, **extra
):
...
if default is not None and self.empty_result_set_value is not None:
raise TypeError(f"{self.__class__.__name__} does not allow default.")
...
And in the Count class, empty_result_set_value = 0, so we can't use default because of that. As far as I understand, empty_result_set_value is treated as a kind of default value. So why don't we just return empty_result_set_value when the value is None in the convert_value property?
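The guard quoted above can be modeled in isolation. This is a simplified stand-in for Aggregate.__init__ (not the real Django classes) showing why Count(..., default=0) raises:

```python
# Simplified stand-ins mirroring the default / empty_result_set_value guard.
class Aggregate:
    empty_result_set_value = None

    def __init__(self, *expressions, default=None, **extra):
        # Mirrors the check in Django's Aggregate.__init__.
        if default is not None and self.empty_result_set_value is not None:
            raise TypeError(f"{self.__class__.__name__} does not allow default.")
        self.default = default


class Count(Aggregate):
    # Count overrides the empty-result value, so passing default trips the guard.
    empty_result_set_value = 0


try:
    Count("customer", default=0)
except TypeError as exc:
    print(exc)  # Count does not allow default.
```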
comment:6 by , 2 years ago
Replying to Mariusz Felisiak:
Can't it be assigned to me? I would really love to solve it, once we agree on the solution.
comment:7 by , 2 years ago
| Owner: | changed from to |
|---|
comment:8 by , 2 years ago
Thanks for your consideration Mariusz, I missed that it wasn't a recent regression, time flies by!
If it can help you Amin: the ORM compiler has some logic in place to avoid database queries when it knows they cannot match any rows (e.g. id__in=[]). When this happens (during aggregation, or when annotating subqueries that can't match any rows) we still need to provide values to the user that are coherent with what would have happened had the query actually been performed; that's when empty_result_set_value kicks in. You might notice in this patch that QuerySet.none() is used; that's a public way of making sure that evaluating the queryset results in no query.
Through a series of patches that tried to make the logic more coherent in this area, and the introduction of Aggregate(..., default=default), which is basically a shorthand for Coalesce(Aggregate(...), default), the long-standing empty().aggregate(cnt=Count()) issue where {"cnt": None} was returned was fixed (due to Count.empty_result_set_value = 0). But surprisingly (this is mind-blowing to me) we didn't have any tests for the case .annotate(cnt=Count("possibly_empty_multi_value")), so convert_value was wrongly removed on the assumption that it was covered by the adjusted logic.
As for the patch, I could see us going two ways:
1. Drop Count.empty_result_set_value and use extra["default"] = 0 so Coalesce(..., 0) is systematically used and we inherit the empty_result_set_value from the Coalesce and Value chain. This might require adjusting some tests that check the exact SQL generated or assert that Count(default) cannot be used.
2. Simply add back the convert_value that was removed in 9f3cce172f6913c5ac74272fa5fc07f847b4e112.
1. seems more desirable to me as it pushes the logic to the database and avoids Python-time conversions (convert_value comes at a cost as it's run for each row), while 2. is less invasive as it requires no test changes and keeps the generated SQL the same (no Coalesce wrapping for each usage of Count).
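The trade-off between the two options can be illustrated with stdlib SQLite, using SUM as a stand-in for an aggregate that can yield NULL (the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, val INTEGER)")
conn.execute("INSERT INTO t VALUES ('a', NULL)")  # a group whose aggregate is NULL

# Option 1 style: push the defaulting into SQL with COALESCE.
row = conn.execute(
    "SELECT grp, COALESCE(SUM(val), 0) FROM t GROUP BY grp"
).fetchone()
print(row)  # ('a', 0)

# Option 2 style: convert in Python, once per fetched row
# (the cost a convert_value-style hook pays on every row).
convert = lambda value: 0 if value is None else value
grp, total = conn.execute("SELECT grp, SUM(val) FROM t GROUP BY grp").fetchone()
print((grp, convert(total)))  # ('a', 0)
```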
Whatever solution we choose, we should make sure to also address the same issue we have with RegrCount.
comment:9 by , 2 years ago
Thanks Simon for explaining empty_result_set_value usage.
So, as far as I understand, if the query is going to return None, we return empty_result_set_value instead, without running the query. Right? If that's right, there is still something I'm missing.
My point is: assume we have a query where the compiler does not know it is going to return None (normal queries like objects.annotate(count=Count("..."))). In this case the query runs and the result is fetched. Why can't we use empty_result_set_value when the result is None?
What I am saying is a code like this:
class BaseExpression:
...
@cached_property
def convert_value(self):
field = self.output_field
internal_type = field.get_internal_type()
...
elif internal_type.endswith("IntegerField"):
return (
lambda value, expression, connection: self.empty_result_set_value # instead of None
if value is None
else int(value)
)
...
return self._convert_value_noop
And about Python-time conversions: we are already running this convert_value for each row.
comment:10 by , 2 years ago
So, as far as I understand, if the query is going to return None, we return empty_result_set_value instead, without running the query. Right? If that's right, there is still something I'm missing.
That's not exactly how it works; there's a difference between returning None and dealing with an empty result set. Some aggregate functions might return None even when not dealing with empty results (e.g. when aggregating over a set of values that contains NULL). That is not the case for Count, but it's the rationale for empty_result_set_value being a feature distinct from default, and for how convert_value deals with None.
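The point that some aggregates return NULL even over non-empty results is easy to check with stdlib SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")
conn.execute("INSERT INTO t VALUES (NULL)")  # non-empty table, NULL value

total, cnt = conn.execute("SELECT SUM(val), COUNT(val) FROM t").fetchone()
print(total)  # None -- SUM over a non-empty set containing only NULLs
print(cnt)    # 0    -- COUNT over the same set never returns NULL
```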
My point is: assume we have a query where the compiler does not know it is going to return None (normal queries like objects.annotate(count=Count("..."))). In this case the query runs and the result is fetched. Why can't we use empty_result_set_value when the result is None?
Because empty_result_set_value is used in a single, very peculiar case: when an EmptyResultSet exception is raised at query compilation time. The change you are proposing would work, but it blurs the line in terms of empty_result_set_value usage. Why should expressions with an IntegerField as their output_field default to empty_result_set_value when other types don't? It also impacts the locality of the change, since one can no longer see that Count is treated differently from the normal COUNT SQL function solely by looking at its definition.
And about Python-time conversions: we are already running this convert_value for each row.
Correct, but adding even more logic to it makes it harder to remove eventually, which is something option 1. would benefit from. In reality these converters would only need to run if an explicit output_field is provided; otherwise the ORM has the possibility to generate the proper SQL, and backend peculiarities can be dealt with via get_db_converters.
Since the regression is solely about Count, I believe 1. and 2. remain the two best options.
comment:11 by , 2 years ago
Alright then.
I will start working on the first solution, as it sounds better to me; if anything turns out to be complicated, I'll simply use the second solution for now.
Thanks
comment:12 by , 2 years ago
Hi again.
I implemented the first solution; I just want to double-check the output.
I removed the empty_result_set_value class attribute from both Count and RegrCount, so they now inherit it from the Aggregate class.
In the __init__ method of the Count and RegrCount classes, I call super().__init__() with default=0, so the default value is 0 for Count. But I still don't let the user pass a default value for Count; the default is 0 all the time.
So Coalesce is now appended to the Count query with this default value. I fixed all tests that had SELECT COUNT(...) and changed them to SELECT COALESCE(COUNT(...), 0).
Is everything right and as we wanted?
Thanks for your time Simon!
comment:13 by , 2 years ago
Yep, that sounds about right, Amin!
Submitting a PR with these changes should validate that everything works as expected.
Thanks for the report, the rubber ducking, and working on a patch!
comment:14 by , 2 years ago
| Has patch: | set |
|---|
comment:15 by , 2 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
| Triage Stage: | Accepted → Ready for checkin |
comment:16 by , 2 years ago
| Resolution: | fixed |
|---|---|
| Status: | closed → new |
| Triage Stage: | Ready for checkin → Accepted |
Ticket is not fixed until PR is merged.
comment:17 by , 2 years ago
| Needs tests: | set |
|---|---|
| Patch needs improvement: | set |
| Status: | new → assigned |
comment:18 by , 2 years ago
Hi again.
I was trying to write a regression test and ran into an issue.
Apparently, the Count function usually returns 0, but I was unlucky enough to hit some situations in which Count returns None.
One situation is in my tests, when using TimescaleDB (a Postgres extension). In that scenario I get None instead of 0 when using Count.
The other situation is on a production server, where some complicated queries are executed and I get None there too. I cannot figure out why those lead to None, as I get 0 with similar queries.
So I'm basically asking: do you know of simple scenarios in which the DB would return None? (I suppose the old convert_value function was there for those scenarios.)
And if you don't, is it valid to use TimescaleDB in a test? (Actually I have already written such a test for TDD during this PR, and my change fixes my issue with it.)
follow-up: 20 comment:19 by , 2 years ago
Django doesn't support TimescaleDB, so any behavior change against it is not considered a regression from our perspective. You might have run into a TimescaleDB bug for all we know.
The fact that you cannot reproduce against SQLite, Postgres, MySQL, or Oracle might explain why this was not caught by the test suite in the first place, which would make more sense given how common such annotations are.
Here are my attempts at reproducing
comment:20 by , 2 years ago
Replying to Simon Charette:
The fact you cannot reproduce against SQLite, Postgres, MySQL, or Oracle might explain why this was not caught by the test suite in the first place which would make more sense given how common such annotations are.
I'm sure about this problem on Timescale, and as it is an extension for Postgres, I think it is a corner case for Postgres (or even other DBs).
I should have tried this regression test earlier. My bad, sorry about that.
But still, I suggest simply keeping the old convert_value to guard against corner cases, now and in the future.
comment:21 by , 2 years ago
| Resolution: | → invalid |
|---|---|
| Status: | assigned → closed |
Unfortunately this was not demonstrated to even be a corner case for Postgres; if it were, we'd certainly accept a patch for it.
We can't keep code around that we can't test for regressions against, I'm afraid.
If someone is able to write a test demonstrating that Count can return NULL values under the modern SQL standard, or against any supported database backend that might diverge from the standard (Postgres, SQLite, MySQL, Oracle), we can re-open this ticket, but in the meantime it doesn't seem actionable.
comment:22 by , 2 years ago
The PG docs confirm that count(<expression>) must always return a result, as it counts the number of rows for which <expression> is not null:
count ( "any" ) → bigint
Computes the number of input rows in which the input value is not null.
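That documented behavior is easy to verify with stdlib SQLite as well (the table name here is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (None,), (None,)])

total, non_null = conn.execute(
    "SELECT COUNT(*), COUNT(customer) FROM orders"
).fetchone()
print(total)     # 3 -- all rows
print(non_null)  # 1 -- only rows where customer IS NOT NULL

# Even with no matching rows at all, COUNT yields 0, never NULL:
empty = conn.execute("SELECT COUNT(customer) FROM orders WHERE 0").fetchone()[0]
print(empty)  # 0
```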
My advice would be to confirm whether this is documented behaviour in Timescale; if it isn't, report the issue there ¯\_(ツ)_/¯.
This was an intended change in 9f3cce172f6913c5ac74272fa5fc07f847b4e112. You can use Coalesce() to return 0 instead, e.g. Coalesce(Count("customer", distinct=True), 0).