Opened 2 years ago

Last modified 2 years ago

#24662 assigned Bug

Sum() returns True/False when used with BooleanField & MySQL

Reported by: Chris Kief Owned by: Marco Santamaria
Component: Database layer (models, ORM) Version: 1.8
Severity: Normal Keywords:
Cc: josh.smeaton@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Chris Kief)

When using Sum() on a BooleanField when only 1 record is present, the ORM returns True/False rather than 1/0. I've tested with MySQL and SQLite, only MySQL exhibits the bug.

This is a change in behavior from previous versions of Django where a Decimal would be returned.

Environment:
MySQL - 5.6.24
MySQL-python==1.2.5

Code to reproduce (new project / app):

class Example(models.Model):
    foo = models.BooleanField(default=True)

Simple query to demonstrate the difference:

# add a single row
a = Example()
a.save()

# query
Example.objects.all().aggregate(count=Count('foo'), sum=Sum('foo'))

# results
# notice Django 1.8 + MySQL
Django 1.7.7 + MySQL
{'count': 1, 'sum': Decimal('1')}

Django 1.7.7 + SQLite
{'count': 1, 'sum': 1}

Django 1.8 + MySQL
{'count': 1, 'sum': True}

Django 1.8 + SQLite
{'count': 1, 'sum': 1}

# add a second row
a = Example()
a.save()

# query
Example.objects.all().aggregate(count=Count('foo'), sum=Sum('foo'))

# results
# notice Django 1.8 + MySQL now returns a decimal
Django 1.7.7 + MySQL
{'count': 2, 'sum': Decimal('2')}

Django 1.7.7 + SQLite
{'count': 2, 'sum': 2}

Django 1.8 + MySQL
{'count': 2, 'sum': Decimal('2')}

Django 1.8 + SQLite
{'count': 2, 'sum': 2}

Change History (10)

comment:1 Changed 2 years ago by Chris Kief

Description: modified (diff)

comment:2 Changed 2 years ago by Chris Kief

Type: UncategorizedBug

comment:3 Changed 2 years ago by Marco Santamaria

Owner: changed from nobody to Marco Santamaria
Status: newassigned

comment:4 Changed 2 years ago by Josh Smeaton

Cc: josh.smeaton@… added

I'm not sure what the correct behaviour should be, but I don't think True is correct. I'm surprised the previous behaviour returned a Decimal - that's about the most unlikely type I would think to be returned.

SUM(bool) only works on databases that use numbers to represent booleans (1, 0). You can get exactly the same behaviour by doing COUNT(bool) and that will be properly supported. You could also try .aggregate(Sum('bool_field', output_type=IntegerField())) if you really wanted to keep that behaviour.

I'm tempted to close this as wontfix, because SUM(bool) only ever worked by accident, and as a by-product of internal representation of some backends. The correct way to calculate the result is by using count. Can you convince me that this behaviour should be supported?

comment:5 Changed 2 years ago by Chris Kief

True doesn't seem like the correct behavior to me either, but I'm also surprised that this worked correctly in the past now that I understand what's going on.

My gut says to lean towards consistency in behavior and in that case, it's strange that True is returned when there's a single row in the database, and a Decimal is returned when there is more than one row. In addition, a numeric value is returned in all cases when using SQLite.

Should probably take a look at what Postgres and Oracle return as well.

comment:6 Changed 2 years ago by Chris Kief

Looks like Postgres throws an error:

Django 1.8 + PostgreSQL
ProgrammingError: function sum(boolean) does not exist

Django 1.7 + PostgreSQL
ProgrammingError: function sum(boolean) does not exist

comment:7 Changed 2 years ago by Josh Smeaton

Triage Stage: UnreviewedAccepted

SUM(bool_field) on Oracle will return 1 also, because bools in Oracle are just a bit (1 or 0). Postgres has a specific bool type, and you can't sum a True/False without an implicit conversion to int. This is why I say SUM(bool) is only accidentally supported for a subset of databases. The field type that is returned is based on the backend get_db_converters and the actual values that come back.

Now, something needs to change here. What I would propose to do is:

1) Document that SUM(bool) is not supported in some way. Perhaps by documenting that SUM can only be applied to numeric types.
2) Raise a warning if SUM(bool_field) is used with the suggestion to convert it to COUNT(bool_field)
3) Begin to deprecate the ability to SUM(bool_field) (an extension of 2) above).

comment:8 Changed 2 years ago by Anssi Kääriäinen

Count(bool_field) isn't the same as sum(bool_field), what you need is sum(case when true then 1 else 0 end). I think it is at least worth considering if we should support min, max and sum for boolean fields. Sum(bool_field) would be automatically converted to sum(case when...) expression, min would be min(case when true then 1 else 0 end) and the return value would be then converted to either True of False, likewise for max().

The reason why I think this might be worth supporting is that this is what you get in Python. For example sum([True, False, False, True]) == 2 and min and max work similarly to the above explanation, too.

comment:9 Changed 2 years ago by Josh Smeaton

Ooh you're right. I was incorrectly thinking that COUNT(0) would be 0, but it's 1. I don't mind the solution you've put forward because it's database agnostic and not at all dependent on the underlying storage mechanism. It will require the Sum aggregate to be a little more complicated though. Resolve Expression will need to inspect the output_type, and return a SumBoolean (or MinBoolean.. etc) which will have some small impact on non-boolean SUMs. Is this worth doing when users are able to create their own SumBoolean (or just use Count(Case(when..)) directly)?

comment:10 Changed 2 years ago by Anssi Kääriäinen

I'm not sure if this is actually worth doing. It would make Sum(bool_field) work like sum(bool_values) works in Python, and it wouldn't take that much code to do. On the other hand, there is something to be said for doing the Count explicitly.

We should also solve what to do in 1.8. Should 1.8 work like 1.7? We could also disallow Sum(bool_field), or perhaps we want to push the above idea to 1.8, too?

My initial feeling is that we should try what the solution actually looks like. The actual code portion of the changes might be surprisingly small.

Note: See TracTickets for help on using tickets.
Back to Top