Opened 2 months ago

Last modified 2 months ago

#35852 assigned Bug

intcomma not locale aware when given a decimal as string

Reported by: Jonathan Ströbele Owned by: HEMANT MISHRA
Component: contrib.humanize Version: 5.1
Severity: Normal Keywords:
Cc: Claude Paroz Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The documentation regarding localization says that for thousand separators USE_THOUSAND_SEPARATOR = True or intcomma can be used. The documentation of intcomma says either int/float or string representation works.

This seems to be true for a en locale, but in de locale this doesn't hold up if the input is a string:

en:
9999999999.11   -> 9,999,999,999.11
"9999999999.11" -> 9,999,999,999.11

de:
9999999999.11   -> 9.999.999.999,11
"9999999999.11" -> 9,999,999,999.11

In the case of a decimal value as string in the de-locale the output is wrong, it should be 9.999.999.999,11 (dot (.) as separator).

This seems to be the case because intcomma casts the value to int if its not a float/Decimal (which raises a ValueError for "9999999999.11" and then just puts in commas as thousand separators without any awareness of the current locale.

Also the DocBlock of the intcomma functions says "Convert an integer to a string containing commas every three digits." wich contradicts the int/float (or string thereof) in the documentation.

So either the function is not following the documentation or the documentation is wrong (?).

I think this is problematic as the function is communicated as an alternative/equivalent option to USE_THOUSAND_SEPARATOR=True. When a float value is formatted with {{ value|floatformat:2|intcomma}} the floatformat will return a string and as such, intcomma is no longer locale aware (also reported in #33771). This can lead to outputs like 9,999,999,999,11 in the de locale:

9999999999.11 -> floatformat:2 -> "9999999999,11" -> intcomma -> "9,999,999,999,11"

A solution could be to integrate some sort of float casting into intcomma? Or remove the recommendation of using it in favor of just the USE_THOUSAND_SEPARATOR = True setting and clarifying the documentation of intcomma?

Change History (4)

comment:1 by HEMANT MISHRA, 2 months ago

Owner: set to HEMANT MISHRA
Status: newassigned

comment:2 by Natalia Bidart, 2 months ago

Triage Stage: UnreviewedAccepted

Hello Jonathan, thank you for taking the time to create this report. I have reproduced what you said, by setting the server LANGUAGE_CODE to de. I agree that the result 9,999,999,999.11 feels unexpected for the string "9999999999.11" when intcomma is applied.

Would you like to prepare a patch?

Last edited 2 months ago by Natalia Bidart (previous) (diff)

comment:3 by Jonathan Ströbele, 2 months ago

Hi Natalia,

to be honest I'm not quite sure what the best fix actually would be and I need a bit more guidance here. ;) As I mentioned I think there is a mis-match between the documentation and the actual code, and DocBlock. Also I'm not sure what intcomma actually was intended to mean and how it should be used:

  1. int[eger]comma as in separate a integer into groups of 3 digits separated by a comma (only valid for an en locale). This seems to be the original code and intention of the function when it was added to Django.
  2. int[ernational]comma as in provide a manual way to print a grouped number according to the current locale, if the setting USE_THOUSAND_SEPARATOR is False and so no automatic number grouping is happening.

The 2. option is what I figured from reading the documentation and also think that's the intended function from the current state of the source code. Though it has the aforementioned bug regarding the floats passed as string where it's using a fallback to the old logic that's correct in the en locale.

Since intcomma is passing down the given value to the django.utils.formats.number_format (with enabled force_grouping=True) if it's a float or Decimal already, and only uses the code that is en locale specific code in case of a string, I would suggest to use intcomma as a wrapper of number_format with force_grouping=True. Since the number_format already handles the formatting/grouping of numbers, I think it's no longer needed to reproduce this logic in intcomma?

Making this change seems to conform to nearly all unit tests in tests/humanize_tests/tests.py, except for some unicode stuff introduced in #28628 (fe76944269c13d59f8bb3bc1cfff7ab6443777e4) and a long string without numbers (16a8fe18a3b81250f4fa57e3f93f0599dc4895bc).

This seems to be the case, because number_format doesn't check if the string is containing digits, and just inserts the thousand seperator into the string. Which I think is Okay?

AssertionError: '1,234,567' != '1,234,567' : intcomma test failed, produced '1,234,567', should've produced '1,234,567'
AssertionError: 'th,e q,uic,k b,row,n f,ox ,jum,ped, ov,er ,the, la,zy ,dog' != 'the quick brown fox jumped over the lazy dog' : intcomma test failed, produced 'th,e q,uic,k b,row,n f,ox ,jum,ped, ov,er ,the, la,zy ,dog', should've produced 'the quick brown fox jumped over the lazy dog'

I have created a suggestion for a patch in a fork here: https://github.com/stroebjo/django/commit/f5302d43c45208ebb92516ab2724e86a1e60d71d

comment:4 by Natalia Bidart, 2 months ago

Cc: Claude Paroz added

Claude, would you happen to have an opinion on the above? I can't invest time into this until November as I'm taking some holidays, but I thought you might have a quicker answer.

Note: See TracTickets for help on using tickets.
Back to Top