#440 closed defect (fixed)
[patch] maxlength incorrectly checked for international characters in utf-8
Reported by: | Owned by: | Adrian Holovaty | |
---|---|---|---|
Component: | contrib.admin | Version: | |
Severity: | normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In django.core.formfields TextField.isValidLength checks length very naively:
if data and self.maxlength and len(data) > self.maxlength:
Since sane applications tend to use utf-8 for web output, 'data' passed to this method is also in utf-8. This means that len(data) counts bytes rather than characters, so it overstates the length of any non-ascii input.
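The mismatch is easy to reproduce. Below is a minimal sketch written for modern Python, where encoded text is a `bytes` object (at the time of this ticket byte strings were plain `str`). The sample word is only an illustration and is not taken from the report:

```python
# Illustration only: "Привет" is six Cyrillic characters,
# each of which takes two bytes in utf-8.
text = "Привет"
data = text.encode("utf-8")  # roughly what a utf-8 form submission carries

byte_count = len(data)                  # length in bytes
char_count = len(data.decode("utf-8"))  # length in characters

print(byte_count, char_count)
```

With a maxlength of 10, the byte count would reject this input even though it is only six characters long.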
Attachments (3)
Change History (14)
comment:1 by , 19 years ago
comment:2 by , 19 years ago
Hm... any pointers on where to get the charset for output? Or is Django always in utf-8?
Also, I'm not (yet :-) ) familiar with the code so I don't know where else similar things can bite... I'd rather just unicode() any string data from web before processing but I have no idea if it would be feasible.
comment:3 by , 19 years ago
I think a len(data.decode('utf-8')) should give the correct string length of the input - django should always use utf-8 encoded output and so receive utf-8 encoded input.
if data and self.maxlength and len(data.decode('utf-8')) > self.maxlength:
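As a standalone sketch of that check, assuming modern Python with `bytes` input: the function name `exceeds_maxlength` and its `charset` parameter are hypothetical and are not part of django.core.formfields.

```python
def exceeds_maxlength(data, maxlength, charset="utf-8"):
    """Hypothetical helper, not Django's actual code.

    Returns True when the decoded text has more characters than maxlength.
    Empty data or a missing maxlength never counts as too long.
    """
    if not data or not maxlength:
        return False
    # Decode first so that len() counts characters instead of bytes.
    return len(data.decode(charset)) > maxlength


# Six characters (twelve bytes) fit into a limit of ten.
print(exceeds_maxlength("Привет".encode("utf-8"), 10))
# Eleven ascii characters do not.
print(exceeds_maxlength(b"abcdefghijk", 10))
```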
comment:4 by , 19 years ago
Yeah, I searched the code and it seems that 'utf-8' is hardcoded everywhere. Hence the patch as Hugo suggested.
comment:5 by , 19 years ago
Summary: | maxlength incorrectly checked for international characters in utf-8 → [patch] maxlength incorrectly checked for international characters in utf-8 |
---|
comment:6 by , 19 years ago
This hardcoded 'utf-8' still bugs me. I understand that it's the best practice, but the user may have legitimate reasons to send output in some other encoding. I think it would be nice to have an option somewhere in project settings for the default project encoding and use it for both output and input.
comment:7 by , 19 years ago
Here's the better one (440.2.patch):
- adds DEFAULT_CHARSET to project settings template
- for compatibility with current projects, falls back to 'utf-8' if the parameter is not set
- uses this setting in text fields decoding
- uses this setting in httpwrappers.HttpResponse
The last point means that this patch also fixes task 333.
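The fallback and the decoding step from the list above could look roughly like this. It is a sketch only: `get_default_charset` is a hypothetical helper, the settings objects are stand-ins built with `SimpleNamespace`, and 'cp1251' is just an example of a non-utf-8 project encoding; none of this is taken from the attached patch.

```python
from types import SimpleNamespace


def get_default_charset(settings):
    # Hypothetical helper: projects whose settings predate the option
    # have no DEFAULT_CHARSET attribute, so fall back to utf-8.
    return getattr(settings, "DEFAULT_CHARSET", "utf-8")


# Stand-ins for a project settings module (assumption for this sketch).
old_project = SimpleNamespace()
new_project = SimpleNamespace(DEFAULT_CHARSET="cp1251")

# Input arrives in the project's encoding and is decoded with the same one.
data = "Привет".encode("cp1251")
length = len(data.decode(get_default_charset(new_project)))

print(get_default_charset(old_project), get_default_charset(new_project), length)
```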
comment:8 by , 19 years ago
I made a mistake in 440.2.patch by always appending the charset to the mimetype. This will break if the user supplies their own mimetype. Hence the correction in 440.3.patch: append the charset only to the default mimetype.
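The corrected behaviour can be sketched as follows. The helper name `build_content_type` and the 'text/html' default are assumptions made for illustration; the actual change lives in httpwrappers.HttpResponse and may be structured differently.

```python
DEFAULT_MIMETYPE = "text/html"  # assumed default for this sketch


def build_content_type(mimetype=None, charset="utf-8"):
    """Hypothetical helper, not the patch itself.

    Appends the charset only when the caller did not supply a mimetype;
    a user-supplied mimetype is passed through untouched.
    """
    if mimetype is None:
        return "%s; charset=%s" % (DEFAULT_MIMETYPE, charset)
    return mimetype


print(build_content_type())
print(build_content_type("application/pdf"))
```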
P.S. How do you mark patches obsolete in 'Trac'? Can anyone mark 440.2.patch as such or delete it altogether?
comment:9 by , 19 years ago
BTW, component 'Admin interface' doesn't seem right. It's more like 'Validators' or even 'Core' I think...
comment:10 by , 19 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
A patch would be great!