Opened 3 years ago

Closed 2 years ago

#22171 closed Cleanup/optimization (fixed)

sanitize_separators throws away too many thousand separators

Reported by: Klaas van Schelven Owned by: nobody
Component: Internationalization Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


Users in countries that use symbols other than the period (e.g. comma) as a decimal separator, may still occasionally use the period as a decimal separator. (Because they've come to expect that computers want input with a period). If the period coincides with the THOUSANDS_SEPARATOR, and USE_THOUSANDS_SEPARATOR is turned on, this results in the decimal separator basically being thrown out, but still with a validated result.

Example shell output (THOUSANDS_SEPARATOR == '.'; USE_THOUSANDS_SEPARATOR = True)):

>>> from django.utils.translation import activate
>>> activate('nl-nl')
>>> class F(forms.Form):
...   f = forms.DecimalField(max_digits=15, decimal_places=2)
>>> f = F({'f': '10.10'})
>>> print f.is_valid()
>>> print f.cleaned_data['f']

Why does this happen? django/utils/ has a function sanitize_separators, that basically tosses all thousands separators even if they do not actually separate thousands

My solution would be to only toss thousands separators which have at least 3 digits on their right hand side.

Admittedly this is not a full solution: the value 100.000 is still ambiguous if we assume that users may mean either "thousands separator" or "decimal separator" with the period. But at least it does not silently convert 10.00 (which doesn't mean 1000 to anyone) into 1000. This solution also happens to cover all scenarios where cents are involved, which

A better solution would not validate "10.00" at all (since it does not contain either a decimal separator nor a thousands separator separating thousands). However, the current interface of sanitize_operators does not allow for it: it attempts to sanitize towards a string w/ period, and does not allow us to complain if the input is a string w/ period, but we wouldn't want to allow that as a valid input.

The/a solution for Django 1.4 is attached. I imagine it may be a bit more hairy in the up-to-date version, as this supports full unicode for the input, and I'm not entirely sure on the interaction of regex & unicode.

I've reproduced this in 1.4, but can see in the repo that as of today the code still experiences this problem.

Attachments (2)

thousands.diff (817 bytes) - added by Klaas van Schelven 3 years ago.
22171-2.diff (1.8 KB) - added by Claude Paroz 3 years ago.
Alternate patch

Download all attachments as: .zip

Change History (11)

Changed 3 years ago by Klaas van Schelven

Attachment: thousands.diff added

comment:1 Changed 3 years ago by Klaas van Schelven

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

... I cannot edit ...

Finishing the sentence above:

This solution also happens to cover all scenarios where cents are involved, which happens to be an important subclass of dealing with Decimals (Money!)

comment:2 Changed 3 years ago by Claude Paroz

Strange, I cannot reproduce your shell example. However, I can see that 12.210.10 is cleaned to 12.210.10, so no cleaning happens in that case. That last issue could be fixed by:

         if decimal_separator in value:
-            value, decimals = value.split(decimal_separator, 1)
+            value, decimals = value.rsplit(decimal_separator, 1)

Can you check once more?

comment:3 Changed 3 years ago by wim@…

Hi Klaas,

I tried to reproduce the error and failed (Django 1.4.10 and Django 1.4.3). Although I recall having encountered such a weird situation before. Maybe my os was set to Dutch then?

Good idea to raise an error when invalid input has been entered.

comment:4 Changed 3 years ago by Klaas van Schelven

I forgot to include localize=True in my code example. A more complete shell example (that could form the basis for a unit test) is this one:

from django.conf import settings
from django.utils.translation import activate, get_language
from django import forms
from django.utils.formats import sanitize_separators

old_language = get_language()

settings.THOUSAND_SEPARATOR == '.'


class F(forms.Form):
    f = forms.DecimalField(max_digits=15, decimal_places=2, localize=True)

f = F({'f': '10.10'})
f.is_valid() # True

assert f.cleaned_data['f'] != 1010
assert sanitize_separators('10.10') != '1010'


Both assertions fail for me on an out-of-the box installation of Django 1.6

comment:5 Changed 3 years ago by Claude Paroz

Component: UncategorizedInternationalization
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization
Version: 1.4master

Sorry, I misread part of the original summary.
So basically, with DECIMAL_SEPARATOR = ',' and THOUSAND_SEPARATOR = '.', you'd like that 10.10 still return 10.10 because the thousand separator is not in the right position. I see your point, but would like to limit this special casing only when THOUSAND_SEPARATOR == '.', because in some countries grouping is not necessarily by 3 (India, China), but for them the thousand separator is not the dot.

Changed 3 years ago by Claude Paroz

Attachment: 22171-2.diff added

Alternate patch

comment:6 Changed 3 years ago by Claude Paroz

Has patch: set

comment:7 in reply to:  5 Changed 3 years ago by Klaas van Schelven

Replying to claudep:

you'd like that 10.10 still return 10.10 because the thousand separator is not in the right position.

Strictly speaking a validation error would be even more correct, but I'd certainly consider returning 10.10 a major improvement over returning 1010.

comment:8 Changed 2 years ago by Tim Graham

Triage Stage: AcceptedReady for checkin

Claude, code looks fine to me if you think your approach is best. SQLite tests pass on Python 2.7 and 3.2.

comment:9 Changed 2 years ago by Claude Paroz <claude@…>

Resolution: fixed
Status: newclosed

In 132d0e516e5881cb3f6445e9d84e01c9f2123254:

Fixed #22171 -- Improved sanitize_separators cleverness

Thanks Klaas van Schelven for the report and Tim Graham for the

Note: See TracTickets for help on using tickets.
Back to Top