Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#13139 closed (wontfix)

In forms CharField Raise a Validation Error When Encountering a UnicodeDecodeError

Reported by: megaman821
Owned by: nobody
Component: Forms
Version: 1.1
Severity:
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

Handling a UnicodeDecodeError on a CharField is much harder than it has to be, because the error bubbles to the top instead of being caught and re-raised as a validation error.

Attachments (1)

patch.diff (670 bytes) - added by megaman821 5 years ago.


Change History (4)

Changed 5 years ago by megaman821

comment:1 Changed 5 years ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to wontfix
  • Status changed from new to closed

It isn't clear to me what use case this error is trying to catch. What you're proposing is that bad unicode data given to a form should be raised as a form error, rather than as an error in the system that put the bad data there in the first place. In the absence of an example or explanation that demonstrates why this is desirable, I'm closing wontfix.

comment:2 follow-up: Changed 5 years ago by megaman821

Consider a form for a product that contains the fields title and description. A user pastes in a description from Microsoft Word that triggers a UnicodeDecodeError. Currently this bubbles all the way to the top and you get a server 500 error. You can catch the error by putting a try block around form.is_valid or by overriding full_clean in your form and putting the try block there. Neither of these solutions seems optimal; it would be better if the form raised a validation error and returned a message telling the user where the offending character is. It's not too hard to take care of bad character data like this, but clean_<fieldname>() doesn't get a chance to run because the UnicodeDecodeError is triggered first. Besides pasting from Word, it's easy to get bad data from an uploaded file or from a database import. How do you fix these things?
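For illustration only, the first workaround mentioned above (a try block around form.is_valid) might look roughly like the sketch below. It is written for current Python rather than the Python 2 of Django 1.1, and validate_or_report and BrokenForm are hypothetical names, not part of Django or of the attached patch. The helper only assumes an object with an is_valid() method.

```python
def validate_or_report(form):
    """Run form.is_valid(), turning a UnicodeDecodeError into a message.

    Hypothetical helper: works with any object exposing is_valid().
    """
    try:
        return form.is_valid(), None
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that failed to decode.
        return False, "Undecodable byte at position %d" % exc.start


class BrokenForm:
    """Hypothetical stand-in for a form whose cleaning hits bad bytes."""

    def is_valid(self):
        b"\x93quoted\x94".decode("utf-8")  # raises UnicodeDecodeError
        return True


ok, message = validate_or_report(BrokenForm())
```

The error message is reported at the view level rather than attached to a specific field, which is the limitation the comment above objects to.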

comment:3 in reply to: ↑ 2 Changed 5 years ago by kmtracey

Replying to megaman821:

Consider a form for a product that contains the fields title and description. A user pastes in a description from Microsoft Word that triggers a UnicodeDecodeError. Currently this bubbles all the way to the top and you get a server 500 error.... How do you fix these things?

I've never seen anything like this, so I've not had to fix it. Nor have I been able to recreate it just now, specifically trying scenarios like what you describe. A couple of things about what you describe happening confuse me.

First, in the normal course of events, data submitted from the client will have been converted to unicode long before the form clean method gets to it. URL-encoded parameters are decoded to unicode when the request QueryDicts are created. There is similar code in the multipart parser for dealing with forms encoded as multipart form data. In all cases where the incoming client data is decoded, 'replace' is specified, so that if "bad" data is sent, what you get is the unicode replacement character U+FFFD in the submitted data, not a server error. So I'm curious how you are getting to the point where your form field clean method is being handed anything other than unicode to start with.
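The effect of the 'replace' error handler can be seen in plain Python, independent of Django. The snippet below is a minimal sketch in current Python syntax. The sample bytes are an assumption chosen to mimic cp1252 curly quotes, which are not valid UTF-8 on their own.

```python
# Bytes as they might arrive if cp1252 text were sent without re-encoding.
raw = b"Price: \x93cheap\x94"

# Strict decoding is what produces a UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="replace", each undecodable byte becomes U+FFFD instead.
replaced = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were actually written in recovers
# the intended curly quotes.
intended = raw.decode("cp1252")
```

Here replaced contains U+FFFD where the quotes were, so a form field receiving it sees ordinary text rather than raising an error.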

Second, I've not been able to recreate any problem with posting non-ASCII data copy/pasted from a Word (or any other) document. I've tried a few different browsers on Windows, and all of them correctly handle converting the pasted data from the Windows default cp1252 encoding to utf-8 for submission as form data or URL query parameters.

Whatever problem you are encountering is likely better diagnosed on django-users, and for help in diagnosing it you'll need to provide more details on the browsers in use, the nature of the "bad" data being copy/pasted into form fields, specifics of how your code manipulates the client data before the form is cleaned, and a full traceback of the unicode decode error you get.
