Code

Opened 4 years ago

Last modified 7 months ago

#14317 assigned Bug

numberformat.format produces wrong results

Reported by: akaariai Owned by: lukenio
Component: Internationalization Version: 1.5
Severity: Normal Keywords: Localization, number formatting
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

The problem:

>>> from django.utils.numberformat import format
>>> print format(0.000000001, ',', 2)
1e-10,2

It can be fixed with using '%.2f' % number (the format string can be constructed dynamically), and then replacing the decimal separator and doing the grouping if needed. As a bonus, it is faster that way.

Attachments (3)

ticket14317.diff (4.7 KB) - added by akaariai 4 years ago.
diff_14317_rvarshney.patch (2.9 KB) - added by lukenio 11 months ago.
copy of https://github.com/rvarshney/django/commit/fda8ffc71d69eeee1721e5f595fc48e8abaab84c
ticket14317.2.diff (5.4 KB) - added by apollo13 11 months ago.

Download all attachments as: .zip

Change History (17)

Changed 4 years ago by akaariai

comment:1 Changed 4 years ago by akaariai

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

After this patch, the working of number_format is as follows:

  • if given too large number, will return it in exponent format
  • if given too small a number and no decimal_pos, will return it in exponent format
  • if given decimal_pos, will return the number correctly rounded

Before this patch:

  • if given too large number, would return it in exponent format (possibly mangled with thousand separator), it the number was float, else in correct format.
  • if given too small number and number_pos, would return the number in 1e-10,2, which is a bug
  • if given too small number and no number_pos, would return the number in exponent format (possibly mangled with thousand separator)
  • never did any rounding.

The main disadvantage of this patch is that large int, decimal and string numbers will be returned in exponent format, whereas before they were returned in correct format. Also, small numbers passed as strings when not using number_pos would be correctly returned. Now they are returned in exponent format.

So, all in all, the behavior is more consistent now, as any number passed in, no matter in what type, will be returned in same format. On the other hand, this means loss of correct formatting for some number types. If correct formatting for large numbers passed as strings or decimals is wanted, it gets a bit messy to achieve that and correct decimal_pos handling at the same time. It would at least require different formatting paths for strings and other types.

A simple fix for the bug described in the original report is to test if 'e' is in unicode(number) and if it is, return the e format directly...

Oh, BTW after this patch number formatting should be a bit faster (0% to almost 50% depending on what is being formatted). 1/3 of time is used in testing settings. (It seems a bit strange to me that number_format is testing any settings, but that is another matter.)

comment:2 Changed 4 years ago by akaariai

  • Has patch set
  • Needs documentation set
  • Needs tests set
  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Design decision needed

The proposed patch above does not work correctly. Decimal(10100) is formatted in floating point precision.

Formatting every type passed in in a way that always results in correct formatting is messy, and will perform poorly. For example large float passed in needs to either be casted to decimal, and after that back to string, or the exponent form needs to be transformed directly to string, which is slow and error prone. To make things worse, the formatting will change depending on underlying C library (http://docs.python.org/library/functions.html#float).

So, to solve this ticket, one possibility is just check that if unicode(number) has e in it, return it directly. This way everything else will work, giving right result, except for large floats (will return 1.2e+100), small floats (will return 1.2e-100), or small decimals (again will return 1.2e-100). It is possible to fix the small numbers by using '%.(decimal_pos)f', but this will do rounding. This means that decimals and floats are rounded, while strings are not. Still, it is not wise to use '%.(decimal_pos)f' when handling large numbers (Decimal(10100) or large numbers passed in as strings) will return the number in floating point precision.

As said, it is possible to get almost any case to work if enough special casing is done. But it will require much code and will probably be much slower than the current code, at least for some cases.

I would say the best solution at least for now is to return the number in exponent format if that is what unicode(number) returns. Then document that this is how things work. This is backwards incompatible as the expectation is currently that any number is formatted in non-exponent format. However, this is not how things actually work. Still one possibility is to cast everything as float and if the result has e in it, return it directly. After that continue as now. This way number formatting is most consistent...

Some code to show the problems:

from decimal import Decimal as d
>>> str(10000000000000000000000000000000000000.0)
'1e+37' # Wrong result.
>>> str(10000000000000000000000000000000000000)
'10000000000000000000000000000000000000' # ints work correctly
>>> str(d('10000000000000000000000000000000000000.000000000001'))
'10000000000000000000000000000000000000.000000000001' # as do decimals
>>> '%.2f' % d('10000000000000000000000000000000000000.000000000001')
'9999999999999999538762658202121142272.00' # formatting large decimals using '%.nf' is bad idea.
>>> str(d('0.00000000000000000000000000001'))
'1E-29' # small decimals do not work correctly
>>> '%.2f' % d('0.00000000000000000000000000001')
'0.00' # for small decimals this is a nice way
>>> '%.2f' % d('0.666')
'0.67' # and we get rounding
>>> str(d('0.666'))
'0.666' # but if we want convert this string to 2 decimal_pos format, it will be messy with rounding
>>> float('inf')
'inf' # just a reminder that there are still more possibilities...

Still one more thing: when considering alternatives, it is good to remember that this is very performance sensitive area. After all the standard template rendering benchmark is to render large table of numbers, and for a reason too.

comment:3 Changed 3 years ago by _roman

I can’t duplicate this error. I've used Django vr.1.3 and vr.1.2 to achieve the same error, but in all cases I’ve drawn correct results.

>>> from django.utils.numberformat import format
>>> print format(0.000000001, ',', 2)
1e-10,00

comment:4 Changed 3 years ago by julien

  • Severity set to Normal
  • Type set to Bug

comment:5 Changed 2 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:6 Changed 2 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:7 Changed 2 years ago by akaariai

  • Easy pickings set
  • Triage Stage changed from Design decision needed to Accepted

I am marking this accepted as there is a definite bug:

In [1]: from django.utils.numberformat import format

In [2]: format(0.00000000000099, ',', 2)
Out[2]: u'9,9e'

An idea for solving this ticket without too much complications: large numbers are returned in exponent format if str(number) contains 'e' in it - this is, floats get returned in exponent format, other numbers in decimal format. Small numbers are truncated to the decimal places (rounding happens except for strings). So, you would get something like this

> format('0.666', ',', 2)
OUT: 0,66
> format(0.666, ',', 2)
OUT: 0,67
> format(decimal(1e37), ',', 2)
OUT: 10000000000000000000000000000000000000,00
> format(float(1e37), ',', 2)
OUT: 1e37
> format(float(1e-37), ',', 2)
OUT: 0,00
> format(decimal(1e-37), ',', 2)
OUT: 0,00

I think the above might be straightforward to code and would get rid of the worst inconsistencies. Comments?

If somebody wants a somewhat easy ticket to work on, I am willing to let go of my ownership of this ticket...

comment:8 Changed 23 months ago by anonymous

  • Owner changed from akaariai to anonymous

I will let go of this ticket, I currently have no time to work on this issue.

comment:9 Changed 20 months ago by varshney.ruchi@…

  • Owner changed from anonymous to rvarshney

I have created a pull request for this here. This pull request does not treat '0.666' and 0.666 differently. i.e. they are both formatted as 0.67. Please review and let me know.
Thanks!

https://github.com/django/django/pull/348

comment:10 Changed 11 months ago by lukenio

  • Owner changed from rvarshney to anonymous
  • Status changed from new to assigned

comment:11 Changed 11 months ago by lukenio

  • Version changed from 1.2 to 1.5

comment:12 Changed 11 months ago by lukenio

  • Owner changed from anonymous to lukenio

Changed 11 months ago by apollo13

comment:13 Changed 11 months ago by apollo13

Attached a mergeable version (including the tests) from the PR.

comment:14 Changed 7 months ago by timo

  • Easy pickings unset
  • Needs tests unset

Haven't reviewed the patch in detail, but the tests contain the u'' prefix on some strings which is a syntax error on Python 3.2. It also no longer applies cleanly.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as assigned
The owner will be changed from lukenio to anonymous. Next status will be 'assigned'
The ticket will be disowned. Next status will be 'new'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.