Opened 13 months ago
Closed 6 months ago
#35852 closed Bug (fixed)
intcomma not locale aware when given a decimal as string
| Reported by: | Jonathan Ströbele | Owned by: | Tim McCurrach |
|---|---|---|---|
| Component: | contrib.humanize | Version: | 5.1 |
| Severity: | Normal | Keywords: | |
| Cc: | Claude Paroz | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
The documentation regarding localization says that for thousand separators USE_THOUSAND_SEPARATOR = True or intcomma can be used. The documentation of intcomma says either int/float or string representation works.
This seems to be true for a en locale, but in de locale this doesn't hold up if the input is a string:
en: 9999999999.11 -> 9,999,999,999.11 "9999999999.11" -> 9,999,999,999.11 de: 9999999999.11 -> 9.999.999.999,11 "9999999999.11" -> 9,999,999,999.11
In the case of a decimal value as string in the de-locale the output is wrong, it should be 9.999.999.999,11 (dot (.) as separator).
This seems to be the case because intcomma casts the value to int if its not a float/Decimal (which raises a ValueError for "9999999999.11" and then just puts in commas as thousand separators without any awareness of the current locale.
Also the DocBlock of the intcomma functions says "Convert an integer to a string containing commas every three digits." wich contradicts the int/float (or string thereof) in the documentation.
So either the function is not following the documentation or the documentation is wrong (?).
I think this is problematic as the function is communicated as an alternative/equivalent option to USE_THOUSAND_SEPARATOR=True. When a float value is formatted with {{ value|floatformat:2|intcomma}} the floatformat will return a string and as such, intcomma is no longer locale aware (also reported in #33771). This can lead to outputs like 9,999,999,999,11 in the de locale:
9999999999.11 -> floatformat:2 -> "9999999999,11" -> intcomma -> "9,999,999,999,11"
A solution could be to integrate some sort of float casting into intcomma? Or remove the recommendation of using it in favor of just the USE_THOUSAND_SEPARATOR = True setting and clarifying the documentation of intcomma?
Change History (8)
comment:1 by , 13 months ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
comment:2 by , 13 months ago
| Triage Stage: | Unreviewed → Accepted |
|---|
comment:3 by , 13 months ago
Hi Natalia,
to be honest I'm not quite sure what the best fix actually would be and I need a bit more guidance here. ;) As I mentioned I think there is a mis-match between the documentation and the actual code, and DocBlock. Also I'm not sure what intcomma actually was intended to mean and how it should be used:
int[eger]commaas in separate a integer into groups of 3 digits separated by a comma (only valid for anenlocale). This seems to be the original code and intention of the function when it was added to Django.int[ernational]commaas in provide a manual way to print a grouped number according to the current locale, if the settingUSE_THOUSAND_SEPARATORisFalseand so no automatic number grouping is happening.
The 2. option is what I figured from reading the documentation and also think that's the intended function from the current state of the source code. Though it has the aforementioned bug regarding the floats passed as string where it's using a fallback to the old logic that's correct in the en locale.
Since intcomma is passing down the given value to the django.utils.formats.number_format (with enabled force_grouping=True) if it's a float or Decimal already, and only uses the code that is en locale specific code in case of a string, I would suggest to use intcomma as a wrapper of number_format with force_grouping=True. Since the number_format already handles the formatting/grouping of numbers, I think it's no longer needed to reproduce this logic in intcomma?
Making this change seems to conform to nearly all unit tests in tests/humanize_tests/tests.py, except for some unicode stuff introduced in #28628 (fe76944269c13d59f8bb3bc1cfff7ab6443777e4) and a long string without numbers (16a8fe18a3b81250f4fa57e3f93f0599dc4895bc).
This seems to be the case, because number_format doesn't check if the string is containing digits, and just inserts the thousand seperator into the string. Which I think is Okay?
AssertionError: '1,234,567' != '1,234,567' : intcomma test failed, produced '1,234,567', should've produced '1,234,567' AssertionError: 'th,e q,uic,k b,row,n f,ox ,jum,ped, ov,er ,the, la,zy ,dog' != 'the quick brown fox jumped over the lazy dog' : intcomma test failed, produced 'th,e q,uic,k b,row,n f,ox ,jum,ped, ov,er ,the, la,zy ,dog', should've produced 'the quick brown fox jumped over the lazy dog'
I have created a suggestion for a patch in a fork here: https://github.com/stroebjo/django/commit/f5302d43c45208ebb92516ab2724e86a1e60d71d
comment:4 by , 13 months ago
| Cc: | added |
|---|
Claude, would you happen to have an opinion on the above? I can't invest time into this until November as I'm taking some holidays, but I thought you might have a quicker answer.
comment:5 by , 7 months ago
| Owner: | changed from to |
|---|
Thanks all for the conversation above, that's really helpful :)
I think the issue is that for non-float/decimals, we attempt conversion to an integer, and when that fails, we fallback to using the logic within intcomma (instead of the logic within number_format - which is locale aware). I think converting to a Decimal instead might solve this issue. I'll investigate this and make a patch if it seems to fix things.
comment:6 by , 7 months ago
| Has patch: | set |
|---|
comment:7 by , 6 months ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
Hello Jonathan, thank you for taking the time to create this report. I have reproduced what you said, by setting the server LANGUAGE_CODE to
de. I agree that the result9,999,999,999.11feels unexpected for the string"9999999999.11"whenintcommais applied.Would you like to prepare a patch?