Opened 16 months ago

Last modified 2 months ago

#21729 new Cleanup/optimization

DecimalField.to_python() fails on values with invalid unicode start byte

Reported by: brett_energysavvy
Owned by: nobody
Component: Forms
Version: master
Severity: Normal
Keywords:
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Consider the following example:

from django.forms import Form
from django.forms.fields import DecimalField

class MyForm(Form):
    field = DecimalField()

# On Python 2 this literal is a byte string; 0xac is not a valid UTF-8 start byte.
data = {'field': '1\xac23'}
form = MyForm(data)
# Raises DjangoUnicodeDecodeError instead of returning False
# with a validation error on 'field'.
form.is_valid()

I noticed this on Django 1.5, but it looks like it also reproduces on Django 1.6.

Upon investigation, it looks like smart_str was used in previous versions. Nowadays, smart_text can raise a DjangoUnicodeDecodeError in these cases. I believe this exception should be caught in to_python() and handled through the same code path as an exception raised in the "Decimal(value)" block, so that it becomes a validation error.
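The proposal above could look roughly like the following. This is a minimal standalone sketch in Python 3 syntax, not Django's actual implementation: the ValidationError class here is a hypothetical stand-in for django.core.exceptions.ValidationError, the to_python function is a simplified free function rather than the real DecimalField method, and the error message is illustrative. It relies on the fact that DjangoUnicodeDecodeError subclasses the built-in UnicodeDecodeError.

```python
from decimal import Decimal, DecimalException


class ValidationError(Exception):
    """Hypothetical stand-in for django.core.exceptions.ValidationError."""


def to_python(value, encoding="utf-8"):
    """Convert str or bytes input to a Decimal.

    Undecodable bytes are reported the same way as a malformed number,
    instead of letting the decode error escape to the caller.
    """
    if value is None or len(value) == 0:
        return None
    try:
        # Decoding is the step that currently escapes as a decode error.
        if isinstance(value, bytes):
            value = value.decode(encoding)
        return Decimal(value.strip())
    except (UnicodeDecodeError, DecimalException):
        # Same code path for both failure modes, as the ticket proposes.
        raise ValidationError("Enter a number.") from None
```

With this shape, to_python(b"1.5") yields Decimal("1.5"), while both b"1\xac23" and "abc" end in the same ValidationError.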

Change History (4)

comment:1 Changed 15 months ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to needsinfo
  • Status changed from new to closed

I think that with current Django code, a field should never receive such byte streams. Could you provide us with a plausible use case where such invalid data can reach form data?

comment:2 Changed 2 months ago by thnee

"I think that with current Django code, a field should never receive such byte streams."

Not quite sure what this quote is referring to, possibly how requests are processed and handled before going into a view?
The Form class makes no rules about where the data must come from.

I am hitting this issue when building an API using a Form's CharField that takes the user's data from a CSV file.
They sent me a string that ends in a partial UTF-8 character (only the first byte, and not the second), and the form raises DjangoUnicodeDecodeError on is_valid.
Pretty much exactly what the original example demonstrates.

I argue that there is a precedent for catching this exception (possibly in the Field class), since the job of a Form is to take any user input and produce a list of errors. Here, when the user sent invalid data, the Form crashed instead of producing an error.

Here is (the relevant part of) a stack trace:

  File "/home/my_user/my_project/apps/my_app/parsers.py", line 222, in parse_feed
    if not form.is_valid():
  File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 129, in is_valid
    return self.is_bound and not bool(self.errors)
  File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 121, in errors
    self.full_clean()
  File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 273, in full_clean
    self._clean_fields()
  File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 288, in _clean_fields
    value = field.clean(value)
  File "/usr/local/lib/python2.6/dist-packages/django/forms/fields.py", line 148, in clean
    value = self.to_python(value)
  File "/usr/local/lib/python2.6/dist-packages/django/forms/fields.py", line 208, in to_python
    return smart_text(value)
  File "/usr/local/lib/python2.6/dist-packages/django/utils/encoding.py", line 73, in smart_text
    return force_text(s, encoding, strings_only, errors)
  File "/usr/local/lib/python2.6/dist-packages/django/utils/encoding.py", line 119, in force_text
    raise DjangoUnicodeDecodeError(s, *e.args)
django.utils.encoding.DjangoUnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 29: unexpected end of data. You passed in 'The Chesterfield brand Stor h\xc3' (<type 'str'>)
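The underlying decode failure in the last line of the trace can be reproduced without Django. The sketch below uses Python 3 syntax (explicit bytes literals), and describe_decode_failure is a hypothetical helper introduced only for illustration. It shows the position and reason the codec reports for the truncated value above and for the value from the original report.

```python
def describe_decode_failure(raw, encoding="utf-8"):
    """Return (position, reason) if raw cannot be decoded, else None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# 0xc3 opens a two-byte UTF-8 sequence, but the second byte is missing.
truncated = describe_decode_failure(b"The Chesterfield brand Stor h\xc3")
# 0xac is a continuation byte, so it cannot start a character.
bad_start = describe_decode_failure(b"1\xac23")

print(truncated)  # (29, 'unexpected end of data')
print(bad_start)  # (1, 'invalid start byte')
```

Both cases surface as the same exception type, so a single except clause covers truncated input and invalid start bytes alike.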

And here is my current workaround:

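from django import forms
from django.utils.encoding import DjangoUnicodeDecodeError

class MyForm(forms.Form):
    def _clean_fields(self, *args, **kwargs):
        try:
            return super(MyForm, self)._clean_fields(*args, **kwargs)
        except DjangoUnicodeDecodeError:
            # Report undecodable input as a non-field error instead of crashing.
            msg = "The data you provided is not encoded properly, please ensure you have valid UTF-8."
            self._errors['__all__'] = self.error_class([msg])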

Last edited 2 months ago by thnee

comment:3 Changed 2 months ago by arthurk

  • Resolution needsinfo deleted
  • Status changed from closed to new

comment:4 Changed 2 months ago by claudep

  • Triage Stage changed from Unreviewed to Accepted
  • Type changed from Bug to Cleanup/optimization
  • Version changed from 1.5 to master

Thanks for your input. I think that your use case makes sense. Note that CharField is also subject to this issue, and other fields should be investigated.
