Opened 11 years ago
Closed 10 years ago
#22171 closed Cleanup/optimization (fixed)
sanitize_separators throws away too many thousand separators
Reported by: | Klaas van Schelven | Owned by: | nobody |
---|---|---|---|
Component: | Internationalization | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Users in countries that use symbols other than the period (e.g. comma) as a decimal separator, may still occasionally use the period as a decimal separator. (Because they've come to expect that computers want input with a period). If the period coincides with the THOUSANDS_SEPARATOR, and USE_THOUSANDS_SEPARATOR is turned on, this results in the decimal separator basically being thrown out, but still with a validated result.
Example shell output (THOUSANDS_SEPARATOR == '.'; USE_THOUSANDS_SEPARATOR = True)):
>>> from django.utils.translation import activate >>> activate('nl-nl') >>> >>> class F(forms.Form): ... f = forms.DecimalField(max_digits=15, decimal_places=2) ... >>> f = F({'f': '10.10'}) >>> print f.is_valid() True >>> print f.cleaned_data['f'] 1010
Why does this happen? django/utils/formats.py has a function sanitize_separators, that basically tosses all thousands separators even if they do not actually separate thousands
My solution would be to only toss thousands separators which have at least 3 digits on their right hand side.
Admittedly this is not a full solution: the value 100.000 is still ambiguous if we assume that users may mean either "thousands separator" or "decimal separator" with the period. But at least it does not silently convert 10.00 (which doesn't mean 1000 to anyone) into 1000. This solution also happens to cover all scenarios where cents are involved, which
A better solution would not validate "10.00" at all (since it does not contain either a decimal separator nor a thousands separator separating thousands). However, the current interface of sanitize_operators does not allow for it: it attempts to sanitize towards a string w/ period, and does not allow us to complain if the input is a string w/ period, but we wouldn't want to allow that as a valid input.
The/a solution for Django 1.4 is attached. I imagine it may be a bit more hairy in the up-to-date version, as this supports full unicode for the input, and I'm not entirely sure on the interaction of regex & unicode.
I've reproduced this in 1.4, but can see in the repo that as of today the code still experiences this problem.
Attachments (2)
Change History (11)
by , 11 years ago
Attachment: | thousands.diff added |
---|
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Strange, I cannot reproduce your shell example. However, I can see that 12.210.10
is cleaned to 12.210.10
, so no cleaning happens in that case. That last issue could be fixed by:
if decimal_separator in value: - value, decimals = value.split(decimal_separator, 1) + value, decimals = value.rsplit(decimal_separator, 1) parts.append(decimals)
Can you check once more?
comment:3 by , 11 years ago
Hi Klaas,
I tried to reproduce the error and failed (Django 1.4.10 and Django 1.4.3). Although I recall having encountered such a weird situation before. Maybe my os was set to Dutch then?
Good idea to raise an error when invalid input has been entered.
comment:4 by , 11 years ago
I forgot to include localize=True in my code example. A more complete shell example (that could form the basis for a unit test) is this one:
from django.conf import settings from django.utils.translation import activate, get_language from django import forms from django.utils.formats import sanitize_separators old_THOUSAND_SEPARATOR = settings.THOUSAND_SEPARATOR old_USE_THOUSAND_SEPARATOR = settings.USE_THOUSAND_SEPARATOR old_language = get_language() settings.THOUSAND_SEPARATOR == '.' settings.USE_THOUSAND_SEPARATOR = True activate('nl-nl') class F(forms.Form): f = forms.DecimalField(max_digits=15, decimal_places=2, localize=True) f = F({'f': '10.10'}) f.is_valid() # True assert f.cleaned_data['f'] != 1010 assert sanitize_separators('10.10') != '1010' settings.THOUSAND_SEPARATOR = old_THOUSAND_SEPARATOR settings.USE_THOUSAND_SEPARATOR = old_USE_THOUSAND_SEPARATOR activate(old_language)
Both assertions fail for me on an out-of-the box installation of Django 1.6
follow-up: 7 comment:5 by , 11 years ago
Component: | Uncategorized → Internationalization |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
Version: | 1.4 → master |
Sorry, I misread part of the original summary.
So basically, with DECIMAL_SEPARATOR = ','
and THOUSAND_SEPARATOR = '.'
, you'd like that 10.10
still return 10.10
because the thousand separator is not in the right position. I see your point, but would like to limit this special casing only when THOUSAND_SEPARATOR == '.'
, because in some countries grouping is not necessarily by 3 (India, China), but for them the thousand separator is not the dot.
comment:6 by , 11 years ago
Has patch: | set |
---|
comment:7 by , 11 years ago
Replying to claudep:
you'd like that
10.10
still return10.10
because the thousand separator is not in the right position.
Strictly speaking a validation error would be even more correct, but I'd certainly consider returning 10.10
a major improvement over returning 1010
.
comment:8 by , 10 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Claude, code looks fine to me if you think your approach is best. SQLite tests pass on Python 2.7 and 3.2.
comment:9 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
... I cannot edit ...
Finishing the sentence above:
This solution also happens to cover all scenarios where cents are involved, which happens to be an important subclass of dealing with Decimals (Money!)