#21729 closed Cleanup/optimization (worksforme)
DecimalField.to_python() fails on values with invalid unicode start byte
Reported by: | brett_energysavvy | Owned by: | nobody |
---|---|---|---|
Component: | Forms | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Consider the following example:
    from django.forms import Form
    from django.forms.fields import DecimalField

    class MyForm(Form):
        field = DecimalField()

    data = {'field': '1\xac23'}
    form = MyForm(data)
    form.is_valid()
    # This will raise a DjangoUnicodeDecodeError instead of returning False
    # and having a validation error on the 'field'.
I noticed this on Django 1.5, but it looks like it also reproduces on Django 1.6.
Upon investigation, it looks like smart_str was used in previous versions. Nowadays, it looks like smart_text can throw a DjangoUnicodeDecodeError in these cases. I believe this exception should be caught in the to_python code and go through the same codepath as when an exception is raised in the "Decimal(value)" code block.
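The handling proposed above can be sketched in plain Python, without importing Django. This is only an illustration of the idea, not Django's actual implementation: `ValidationError` here is a hypothetical stand-in for Django's exception of the same name, the `to_python` function and its error message are made up for the example, and UTF-8 is assumed as the input encoding.

```python
from decimal import Decimal, InvalidOperation


class ValidationError(Exception):
    """Hypothetical stand-in for django.core.exceptions.ValidationError."""


def to_python(value, encoding="utf-8"):
    """Convert raw input to a Decimal, reporting any failure as ValidationError."""
    try:
        # Decoding step: an invalid start byte such as 0xac fails here.
        if isinstance(value, bytes):
            value = value.decode(encoding)
        # Conversion step: non-numeric text fails here.
        return Decimal(str(value).strip())
    except (UnicodeDecodeError, InvalidOperation):
        # Undecodable bytes and non-numeric text share one code path.
        raise ValidationError("Enter a number.")
```

Since DjangoUnicodeDecodeError is a subclass of UnicodeDecodeError, catching the base class in the same `except` clause as the Decimal conversion error is one way to route both failures to a single validation message.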
Change History (6)
comment:1 by , 11 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:2 by , 10 years ago
"I think that with current Django code, a field should never receive such byte streams."
Not quite sure what this quote is referring to, possibly how requests are processed and handled before going into a view?
The Form class makes no rules about where the data must come from.
I am hitting this issue when building an API using a Form's CharField that takes the user's data from a CSV file.
They sent me a string that ends in a partial UTF-8 character (only the first byte, and not the second), and the form raises DjangoUnicodeDecodeError on is_valid.
Pretty much exactly what the original example demonstrates.
I argue that there is a precedent for catching this exception (possibly in the Field class), since the job of a Form is to take any user input data and produce a list of errors. When the user did send invalid data, the Form crashed instead of producing an error.
Here is (the relevant part of) a stack trace:
    File "/home/my_user/my_project/apps/my_app/parsers.py", line 222, in parse_feed
      if not form.is_valid():
    File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 129, in is_valid
      return self.is_bound and not bool(self.errors)
    File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 121, in errors
      self.full_clean()
    File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 273, in full_clean
      self._clean_fields()
    File "/usr/local/lib/python2.6/dist-packages/django/forms/forms.py", line 288, in _clean_fields
      value = field.clean(value)
    File "/usr/local/lib/python2.6/dist-packages/django/forms/fields.py", line 148, in clean
      value = self.to_python(value)
    File "/usr/local/lib/python2.6/dist-packages/django/forms/fields.py", line 208, in to_python
      return smart_text(value)
    File "/usr/local/lib/python2.6/dist-packages/django/utils/encoding.py", line 73, in smart_text
      return force_text(s, encoding, strings_only, errors)
    File "/usr/local/lib/python2.6/dist-packages/django/utils/encoding.py", line 119, in force_text
      raise DjangoUnicodeDecodeError(s, *e.args)
    django.utils.encoding.DjangoUnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 29: unexpected end of data. You passed in 'The Chesterfield brand Stor h\xc3' (<type 'str'>)
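Until the field itself handles this, one possible workaround for the CSV case is to decode the raw bytes before they reach the form and turn a decode failure into an ordinary error message. The following is a minimal sketch under that assumption; `decode_or_error` is a hypothetical helper, not part of Django, and UTF-8 is assumed as the expected encoding.

```python
def decode_or_error(raw, encoding="utf-8"):
    """Return (text, None) on success, or (None, message) if raw cannot be decoded."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first offending byte.
        message = "invalid %s at byte %d: %s" % (encoding, exc.start, exc.reason)
        return None, message


# The value from the stack trace: a string cut off after the first byte
# of a two-byte UTF-8 sequence.
text, error = decode_or_error(b"The Chesterfield brand Stor h\xc3")
```

With the truncated value from the trace, `text` is `None` and `error` points at byte 29, the same position the exception above reports.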
comment:3 by , 10 years ago
Resolution: | needsinfo → |
---|---|
Status: | closed → new |
comment:4 by , 10 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Bug → Cleanup/optimization |
Version: | 1.5 → master |
Thanks for your input. I think that your use case makes sense. Note that CharField is also subject to this issue, and other fields should be investigated.
comment:5 by , 8 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
Can't reproduce, probably a Python 2 problem which is unsupported on current master, so I'm closing this as wontfix.
Added a test in https://github.com/django/django/pull/8314
comment:6 by , 8 years ago
I think the way to reproduce this is with an input like b'1\xac23' (bytestring rather than unicode string); however, I'm not sure such an input is feasible on Python 3. For example, similar tests have been removed in 75f0070a54925bc8d10b1f5a2b6a50fe3a1f7f50.
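The difference between the two inputs on Python 3 can be illustrated without Django. In this sketch, `classify` is a hypothetical helper and UTF-8 is assumed; it only shows which standard-library exception each kind of input triggers, not how Django's fields behave internally.

```python
from decimal import Decimal, InvalidOperation


def classify(value):
    """Name the failure, if any, that a given input produces."""
    try:
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        Decimal(value)
    except UnicodeDecodeError:
        return "decode error"
    except InvalidOperation:
        return "invalid decimal"
    return "ok"


as_text = classify("1\xac23")    # str containing U+00AC: valid text, not a number
as_bytes = classify(b"1\xac23")  # bytes: 0xac is not a valid UTF-8 start byte
```

On Python 3 the text literal from the original report is a valid string, so only the decimal conversion fails; the decode failure needs an explicit bytestring.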
I think that with current Django code, a field should never receive such byte streams. Could you provide us with a plausible use case where such invalid data can reach form data?