Opened 4 years ago

Closed 4 years ago

Last modified 16 months ago

#23941 closed Bug (fixed)

Aggregating over decimal field regression

Reported by: Josh Smeaton Owned by: Josh Smeaton
Component: Database layer (models, ORM) Version: master
Severity: Release blocker Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

See https://code.djangoproject.com/ticket/14030#comment:54

Aggregating over a decimal field respects the fields decimal_places and max_digits, which causes issues when the sum of the field expands.

Eg: 999.99 + 999.99 = 1999.98

The result now has 6 digits rather than 5, and if the original field was defined with a max_digits=5, then the aggregation is going to fail.

Change History (9)

comment:1 Changed 4 years ago by Josh Smeaton

See https://github.com/django/django/pull/3655

The patch ignores decimal_places/max_digits for annotations, which don't make sense for aggregation. I'm concerned this may have an impact on regular annotations though, so I'd like to discuss the impact.

It should be easy enough to move the decimal convert_value path to the aggregate base class if annotations would be negatively affected.

comment:2 Changed 4 years ago by Josh Smeaton

Has patch: set
Owner: changed from nobody to Josh Smeaton
Patch needs improvement: set
Status: newassigned

comment:3 Changed 4 years ago by Josh Smeaton

As discussed on IRC - the other issue is that when a user declares the output_field to be DecimalField(digits=3, decimals=2), and we don't honour that by throwing an exception, it feels like a lie. We're ignoring a request from the user.

However, if the output_field is derived from the backing field F('the_price'), then the user hasn't explicitly asked for a return type, but it'll be enforced anyway. Also, DecimalField requires the max_digits and decimal_places arguments, and the user can't be expected to know the bounds of the data, in code, especially for the future.

I could see a couple of options.

  1. Ignore all model field "validation" for annotations. The data type and size is dynamic based on the (changing) dataset, and therefore doesn't require strict validation like a column would.
  2. Special case DecimalField(max_digits='?', decimal_places='?') for the purposes of annotations. This sounds really confusing though, and starts to get very complicated.

I'm open to suggestions.

comment:4 Changed 4 years ago by Tim Graham

Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

comment:5 Changed 4 years ago by Carl Meyer

It seems this issue is maybe being presented as broader than it really is? We don't ever do "validation" (in the Django sense) on annotation outputs, we just convert from db value to Python value. We have one special case with DecimalField where that conversion itself can actually raise an exception. The problem is not with validation, the problem is that max_digits is not just a validation, it is a hard limit on the value that the field knows how to represent (due to its use in setting the decimal context precision). There is no parallel problem with e.g. max_length on a CharField (or with any other field with validation constraints), because the max_length (or any other validation constraint) isn't checked anywhere in the conversion from DB value to Python value. So in a way, this is simply a bug in DecimalField (or perhaps in the way an internal-type of DecimalField is special-cased in ExpressionNode.convert_value), in that its "validation" constraints are applied (and result in hard errors) in data conversion, where normally field validation does not apply at all.

(As an aside, it also seems a little odd to me that ExpressionNode.convert_value assumes that any field with internal-type of DecimalField will have a format_number method, when DecimalField.format_number() has a docstring indicating that it only exists for legacy code.)

In the case where no explicit output_field is given for an annotation, we should definitely not apply the max_digits or decimal_places values from the source field to the result; we should simply convert to Decimal and return.

The trickier case is where an explicit output_field was given. I hear Anssi's argument that we should respect the max_digits and decimal_places in that case, otherwise we are "lying" to the user, but I'm not convinced. Why should DecimalField be special? If you specify a CharField as output_field, its max_length will not be respected in converting from db to Python. If you specify a PositiveIntegerField as output_field for an expression, you can get a negative result with no error, etc. Conversion is not the same thing as validation.

It seems to me that the idea of applying field validation to the output of an expression would be a major new feature addition that would need quite a lot of separate discussion to figure out how it would work, or if its even a good idea (there is currently no such thing as getting ValidationError from attempting to do a database query in Django). In the meantime, it's simply a bug (and a regression) that such validation is sort-of applied in the case of DecimalField, and the correct fix is to just stop doing that, as the current pull request does.

This does leave us in the odd situation where specifying output_field sometimes requires that you provide field parameters which are effectively ignored. I think the ideal API for this would be to allow (or require?) output_field to be a field class, not a field instance. But that would require making get_internal_type a classmethod rather than an instance method on Fields. Which perhaps it should be, but that's also a major invasive change for third-party field classes.

comment:6 Changed 4 years ago by Carl Meyer

(I guess on further review, we seem to have made a lot of previously-required arguments to field instantiation optional now, so the final concern there isn't as much of an issue.)

comment:7 Changed 4 years ago by Josh Smeaton

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:8 Changed 4 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In 267a1dcd9b7877d97917ba6222fd247da060f9b0:

Fixed #23941 -- Removed implicit decimal formatting from expressions.

comment:9 Changed 16 months ago by Tim Graham <timograham@…>

In ebc4ee33:

Refs #23941 -- Prevented incorrect rounding of DecimalField annotations on SQLite.

Note: See TracTickets for help on using tickets.
Back to Top