Opened 11 years ago

Closed 8 years ago

Last modified 8 years ago

#21729 closed Cleanup/optimization (worksforme)

DecimalField.to_python() fails on values with invalid unicode start byte

Reported by: brett_energysavvy
Owned by: nobody
Component: Forms
Version: dev
Severity: Normal
Keywords:
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Consider the following example:

from django.forms import Form
from django.forms.fields import DecimalField

class MyForm(Form):
    field = DecimalField()

data = {'field': '1\xac23'}
form = MyForm(data)
# This raises a DjangoUnicodeDecodeError instead of returning False
# with a validation error on 'field'.
form.is_valid()

I noticed this on Django 1.5, but it looks like it also reproduces on Django 1.6.

Upon investigation, it looks like smart_str was used in previous versions. Nowadays, it looks like smart_text can throw a DjangoUnicodeDecodeError in these cases. I believe this exception should be caught in the to_python code and go through the same codepath as when an exception is raised in the "Decimal(value)" code block.
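The proposal above can be sketched without Django, assuming Python 3 and only the standard library. `ValidationError` and `to_decimal` are hypothetical stand-ins (not Django's actual classes or its `DecimalField.to_python` implementation); the point is only that a decoding failure and an unparseable number end up on the same error path.

```python
from decimal import Decimal, DecimalException


class ValidationError(Exception):
    """Hypothetical stand-in for django.core.exceptions.ValidationError."""


def to_decimal(value, encoding="utf-8"):
    """Convert raw input to a Decimal, reporting all bad input the same way."""
    if isinstance(value, bytes):
        try:
            value = value.decode(encoding)
        except UnicodeDecodeError:
            # Undecodable bytes take the same path as an unparseable number.
            raise ValidationError("Enter a number.") from None
    try:
        return Decimal(str(value).strip())
    except DecimalException:
        raise ValidationError("Enter a number.") from None
```

With this shape, a caller only ever has to handle one exception type, which is what lets a form turn the failure into a field error instead of crashing.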

Change History (6)

comment:1 by Claude Paroz, 11 years ago

Resolution: needsinfo
Status: new → closed

I think that with current Django code, a field should never receive such byte streams. Could you provide us with a plausible use case where such invalid data can reach form data?

comment:2 by Mattias Lindvall, 10 years ago

> I think that with current Django code, a field should never receive such byte streams.

Not quite sure what this quote is referring to, possibly how requests are processed and handled before going into a view?
The Form class makes no rules about where the data must come from.

I am hitting this issue when building an API using a Form's CharField that takes the user's data from a CSV file.
They sent me a string that ends in a partial UTF-8 character (only the first byte, and not the second), and the form raises DjangoUnicodeDecodeError on is_valid.
Pretty much exactly what the original example demonstrates.
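For illustration, a truncated two-byte UTF-8 character can be reproduced with the standard library alone. This is a minimal sketch assuming Python 3; `utf8_error` is a hypothetical helper, and the choice of "ö" as the cut-off character is an assumption (the report only shows the lead byte `\xc3`).

```python
def utf8_error(raw):
    """Return None if raw is valid UTF-8, else a short description of the problem."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return "byte 0x%02x at position %d: %s" % (
            raw[exc.start], exc.start, exc.reason)
    return None


complete = "Stor h\u00f6".encode("utf-8")  # b'Stor h\xc3\xb6'
truncated = complete[:-1]                  # b'Stor h\xc3', second byte missing

print(utf8_error(complete))   # None
print(utf8_error(truncated))  # byte 0xc3 at position 6: unexpected end of data
```

A check like this could run on each CSV row before the data reaches the form, so that badly encoded rows are reported rather than raising.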

I argue that there is a precedent for catching this exception (possibly in the Field class), since the job of a Form is to take any user input data and produce a list of errors. When the user did send invalid data, the Form crashed instead of producing an error.

Here is (the relevant part of) a stack trace:

  File "/home/my_user/my_project/apps/my_app/parsers.py", line 222, in parse_feed
    if not form.is_valid():
  File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 129, in is_valid
    return self.is_bound and not bool(self.errors)
  File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 121, in errors
    self.full_clean()
  File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 273, in full_clean
    self._clean_fields()
  File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 288, in _clean_fields
    value = field.clean(value)
  File "/usr/local/lib/python2.6/dist-packages/django/forms/fields.py", line 148, in clean
    value = self.to_python(value)
  File "/usr/local/lib/python2.6/dist-packages/django/forms/fields.py", line 208, in to_python
    return smart_text(value)
  File "/usr/local/lib/python2.6/dist-packages/django/utils/encoding.py", line 73, in smart_text
    return force_text(s, encoding, strings_only, errors)
  File "/usr/local/lib/python2.6/dist-packages/django/utils/encoding.py", line 119, in force_text
    raise DjangoUnicodeDecodeError(s, *e.args)
django.utils.encoding.DjangoUnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 29: unexpected end of data. You passed in 'The Chesterfield brand Stor h\xc3' (<type 'str'>)

And here is my current workaround:

from django import forms
from django.utils.encoding import DjangoUnicodeDecodeError

class MyForm(forms.Form):
    def _clean_fields(self, *args, **kwargs):
        try:
            return super(MyForm, self)._clean_fields(*args, **kwargs)
        except DjangoUnicodeDecodeError:
            # Report undecodable input as a form-level error instead of crashing.
            msg = "The data you provided is not encoded properly, please ensure you have valid UTF-8."
            self._errors['__all__'] = self.error_class([msg])

Version 2, edited 10 years ago by Mattias Lindvall

comment:3 by Arthur Koziel, 10 years ago

Resolution: needsinfo
Status: closed → new

comment:4 by Claude Paroz, 10 years ago

Triage Stage: Unreviewed → Accepted
Type: Bug → Cleanup/optimization
Version: 1.5 → master

Thanks for your input. I think that your use case makes sense. Note that CharField is also subject to this issue, and other fields should be investigated.

comment:5 by Raphael Michel, 8 years ago

Resolution: worksforme
Status: new → closed

Can't reproduce, probably a Python 2 problem which is unsupported on current master, so I'm closing this as wontfix.

Added a test in https://github.com/django/django/pull/8314

comment:6 by Tim Graham, 8 years ago

I think the way to reproduce this is with an input like b'1\xac23' (a bytestring rather than a unicode string). However, I'm not sure such an input is feasible on Python 3. For example, similar tests have been removed in 75f0070a54925bc8d10b1f5a2b6a50fe3a1f7f50.
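The difference between the two inputs on Python 3 can be sketched with the standard library only. `classify` is a hypothetical helper written for this illustration and does not reflect how Django handles the values internally.

```python
from decimal import Decimal, InvalidOperation

text = "1\xac23"   # str: \xac is the code point U+00AC, nothing to decode
raw = b"1\xac23"   # bytes: 0xac is not a valid UTF-8 start byte


def classify(value):
    """Say which failure mode, if any, a value triggers."""
    if isinstance(value, bytes):
        try:
            value = value.decode("utf-8")
        except UnicodeDecodeError:
            return "decode error"
    try:
        Decimal(value)
    except InvalidOperation:
        return "not a number"
    return "ok"
```

Under these assumptions, the str input only fails number parsing, which a field already reports as a validation error, while the decode error needs a bytestring to appear at all.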
