#13139 closed (wontfix)
In forms CharField Raise a Validation Error When Encountering a UnicodeDecodeError
| Reported by: | megaman821 | Owned by: | nobody |
|---|---|---|---|
| Component: | Forms | Version: | 1.1 |
| Severity: | | Keywords: | |
| Cc: | | Triage Stage: | Unreviewed |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Handling a UnicodeDecodeError on a CharField is much harder than it has to be, because the error bubbles to the top instead of being caught and re-raised as a validation error.
Attachments (1)
Change History (4)
by , 15 years ago
Attachment: | patch.diff added |
---|
comment:1 by , 15 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
follow-up: 3 comment:2 by , 15 years ago
Consider a form for a product that contains the fields title and description. A user pastes in a description from Microsoft Word that triggers a UnicodeDecodeError. Currently this bubbles all the way to the top and you get a server 500 error. You can catch the error by putting a try block around form.is_valid or by overriding full_clean in your form and putting the try block there. Neither of these solutions sounds optimal, especially compared with the form simply raising a validation error and returning a message that tells the user where the offending character is. It's not too hard to take care of bad character data like this, but the clean_<fieldname>() method doesn't get a chance to run because the UnicodeDecodeError is triggered first. Besides pasting from Word, it's easy to get bad data from an uploaded file or from a database import. How do you fix these things?
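To make the request concrete, here is a minimal stdlib-only sketch of the behaviour being asked for. It is not Django's API and not the attached patch: `ValidationError` and `clean_text` are hypothetical stand-ins, and the sketch assumes the field is handed raw bytes that are expected to be UTF-8.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for a form validation error."""


def clean_text(value, encoding="utf-8"):
    """Return text, reporting undecodable bytes as a validation error.

    Hypothetical helper: it only illustrates turning a
    UnicodeDecodeError into a user-facing validation message.
    """
    if isinstance(value, bytes):
        try:
            return value.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first undecodable byte.
            raise ValidationError(
                "Invalid character data at byte offset %d." % exc.start
            ) from exc
    return str(value)


# Valid UTF-8 passes through; a stray Latin-1 byte is reported
# as a validation error instead of an unhandled exception.
good = clean_text(b"caf\xc3\xa9")
try:
    clean_text(b"caf\xe9")
    message = None
except ValidationError as err:
    message = str(err)
```

With this shape the caller sees a validation failure that names the position of the bad byte, rather than a server error.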
comment:3 by , 15 years ago
Replying to megaman821:
Consider a form for a product that contains the fields title and description. A user pastes in a description from Microsoft Word that triggers a UnicodeDecodeError. Currently this bubbles all the way to the top and you get a server 500 error.... How do you fix these things?
I've never seen anything like this, so I've not had to fix it. Nor have I been able to recreate it just now, specifically trying scenarios like what you describe. A couple of things about what you describe happening confuse me.
First, in the normal course of events, data submitted from the client will have been converted to unicode long before the form clean method gets to it. URL-encoded parameters are decoded to unicode when the request QueryDicts are created. There is similar code in the multipart parser for dealing with forms encoded as multipart form data. In all cases where the incoming client data is decoded, 'replace' is specified, so that if "bad" data is sent, what you get is the unicode replacement character U+FFFD in the submitted data, not a server error. So I'm curious how you are getting to the point where your form field clean method is being handed anything other than unicode to start with.
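The effect of the 'replace' error handler can be seen with plain Python, independent of Django. This is a small illustration written for current Python (where byte strings are `bytes`); the sample bytes are an assumption chosen to be invalid UTF-8.

```python
# 0x93 and 0x94 are cp1252 "smart quotes"; on their own they are
# not valid UTF-8 sequences.
raw = b"\x93quoted\x94"

# Strict decoding (the default) raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="replace", each undecodable byte becomes U+FFFD
# and no exception is raised.
replaced = raw.decode("utf-8", errors="replace")
```

Here `replaced` is the text `quoted` with one U+FFFD on each side, which is the kind of value a form field would receive after lenient decoding.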
Second, I've not been able to recreate any problem with posting non-ASCII data copy/pasted from a Word (or any other) document. I've tried a few different browsers on Windows, and all of them correctly handle converting the pasted data from the Windows default cp1252 encoding to utf-8 for submission as form data or URL query parameters.
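For illustration, the conversion described above looks roughly like this in plain Python. The sample text is an assumption; the point is only that the same characters have different byte representations in cp1252 and UTF-8, and that cp1252 bytes sent without conversion are not valid UTF-8.

```python
# Curly quotes of the kind Word inserts automatically.
pasted = "\u201cSmart quotes\u201d"

# On the clipboard side (Windows default code page): one byte per quote.
cp1252_bytes = pasted.encode("cp1252")

# What a browser submits for a UTF-8 page: three bytes per quote.
utf8_bytes = cp1252_bytes.decode("cp1252").encode("utf-8")

# If the cp1252 bytes were sent as-is, strict UTF-8 decoding would fail.
try:
    cp1252_bytes.decode("utf-8")
    raw_is_valid_utf8 = True
except UnicodeDecodeError:
    raw_is_valid_utf8 = False
```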
Whatever problem you are encountering is likely better diagnosed on django-users, and for help in diagnosing it you'll need to provide more details on the browsers in use, the nature of the "bad" data being copy/pasted into form fields, specifics of how your code manipulates the client data before the form is cleaned, and a full traceback of the unicode decode error you get.
It isn't clear to me what use case this error is trying to catch. What you're proposing is that bad unicode data given to a form should be raised as a form error, rather than as an error in the system that put the bad data there in the first place. In the absence of an example or explanation that demonstrates why this is desirable, I'm closing wontfix.