Opened 19 years ago

Closed 18 years ago

Last modified 17 years ago

#440 closed defect (fixed)

[patch] maxlength incorrectly checked for international characters in utf-8

Reported by: Maniac@…
Owned by: Adrian Holovaty
Component: contrib.admin
Version:
Severity: normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

In django.core.formfields, TextField.isValidLength checks the length very naively:

if data and self.maxlength and len(data) > self.maxlength:

Since sane applications tend to use utf-8 for web output, the 'data' passed to this method is also in utf-8. This means that len(data) counts bytes rather than characters, so it overstates the length of any input containing non-ASCII characters.
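A minimal sketch of the mismatch, written for Python 3, where a bytes object plays the role of the raw utf-8 'data' described above. The sample word is only an illustration and does not come from the ticket.

```python
# Python 3 sketch: a bytes object stands in for the raw utf-8 request data.
text = "na\u00efve"              # 5 characters; \u00ef is "i with diaeresis"
data = text.encode("utf-8")     # the non-ASCII character takes 2 bytes in utf-8

byte_count = len(data)                    # what the naive check measures
char_count = len(data.decode("utf-8"))    # what maxlength is meant to limit

print(byte_count, char_count)   # 6 5
```

With a maxlength of 5, the naive byte-based check would reject this input even though it is exactly 5 characters long.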

Attachments (3)

440.1.patch (615 bytes) - added by Maniac@… 19 years ago.
patch for decoding data utf-8 -> unicode
440.2.patch (2.5 KB) - added by Maniac@… 19 years ago.
Patch with parametrized charset
440.3.patch (2.7 KB) - added by Maniac@… 19 years ago.
Patch with parametrized charset (corrected)


Change History (14)

comment:1 by Adrian Holovaty, 19 years ago

A patch would be great!

comment:2 by anonymous, 19 years ago

Hm... any pointer on where to get the output charset? Or is Django always in utf-8?

Also, I'm not (yet :-) ) familiar with the code, so I don't know where else similar things can bite... I'd rather just unicode() any string data from the web before processing, but I have no idea whether that would be feasible.

comment:3 by hugo <gb@…>, 19 years ago

I think len(data.decode('utf-8')) should give the correct string length of the input. Django should always produce utf-8 encoded output and so receive utf-8 encoded input.

if data and self.maxlength and len(data.decode('utf-8')) > self.maxlength:
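The suggested condition can be sketched as a standalone function. This is a Python 3 approximation, not the code from the attached patch: `is_valid_length` and its parameters are hypothetical names, and `data` is assumed to be a bytes object holding the submitted value.

```python
def is_valid_length(data, maxlength, charset="utf-8"):
    """Return True if data holds at most maxlength characters.

    Hypothetical helper: data is assumed to be raw bytes in the given charset.
    Malformed input raises UnicodeDecodeError; a real validator would need to
    decide how to report that.
    """
    if not data or not maxlength:
        # Empty input or no limit configured: nothing to check.
        return True
    return len(data.decode(charset)) <= maxlength


raw = "привет".encode("utf-8")   # 6 Cyrillic characters, 12 bytes in utf-8

fits_by_characters = is_valid_length(raw, 6)   # counts decoded characters
fits_by_bytes = len(raw) <= 6                  # the naive byte-based check
```

Here the character-based check accepts the value, while the byte-based comparison would reject it.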

by Maniac@…, 19 years ago

Attachment: 440.1.patch added

patch for decoding data utf-8 -> unicode

comment:4 by Maniac@…, 19 years ago

Yeah, I searched the code and it seems that 'utf-8' is hardcoded everywhere. Hence the patch, as Hugo suggested.

comment:5 by hugo <gb@…>, 19 years ago

Summary: maxlength incorrectly checked for international characters in utf-8 → [patch] maxlength incorrectly checked for international characters in utf-8

comment:6 by anonymous, 19 years ago

This hardcoded 'utf-8' still bugs me. I understand that it's the best practice, but the user may have legitimate reasons to produce output in some other encoding. I think it would be nice to have an option in the project settings for a default project encoding, used for both output and input.

by Maniac@…, 19 years ago

Attachment: 440.2.patch added

Patch with parametrized charset

comment:7 by anonymous, 19 years ago

Here's a better one (440.2.patch):

  • adds DEFAULT_CHARSET to project settings template
  • for compatibility with current projects, falls back to 'utf-8' if the setting is not present
  • uses this setting when decoding text fields
  • uses this setting in httpwrappers.HttpResponse

The last point means that this patch also fixes ticket #333.
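The fallback behaviour listed above could look roughly like the following Python 3 sketch. It is not taken from the patch: `get_charset` is a hypothetical helper, the `SimpleNamespace` objects merely stand in for a project's settings module, and 'koi8-r' is an arbitrary example of a non-utf-8 encoding.

```python
from types import SimpleNamespace

# Stand-ins for project settings (assumption: real settings live in a module).
old_settings = SimpleNamespace()                          # predates the setting
new_settings = SimpleNamespace(DEFAULT_CHARSET="koi8-r")  # opts into another charset


def get_charset(settings):
    """Return the configured charset, falling back to 'utf-8' if unset."""
    return getattr(settings, "DEFAULT_CHARSET", "utf-8")


# The same charset is then used to decode incoming text before measuring it.
raw = "привет".encode("koi8-r")                      # 6 characters, 6 bytes in koi8-r
length = len(raw.decode(get_charset(new_settings)))
```

Projects that never define the setting keep the old utf-8 behaviour, which is the compatibility point made in the list.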

by Maniac@…, 19 years ago

Attachment: 440.3.patch added

Patch with parametrized charset (corrected)

comment:8 by Maniac@…, 19 years ago

I made a mistake in 440.2.patch by always appending the charset to the mimetype. This breaks if the user supplies their own mimetype. Hence the correction in 440.3.patch: append the charset only to the default mimetype.

P.S. How do you mark patches as obsolete in Trac? Can anyone mark 440.2.patch as such, or delete it altogether?
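A rough sketch of the rule described for 440.3.patch, assuming a default mimetype of 'text/html'. The function name `build_content_type` and the two constants are hypothetical and are not claimed to match the patch or httpwrappers.HttpResponse.

```python
DEFAULT_MIME_TYPE = "text/html"   # assumed default, for illustration only
DEFAULT_CHARSET = "utf-8"


def build_content_type(mimetype=None):
    """Append the charset only when the caller relies on the default mimetype."""
    if mimetype is None:
        return "%s; charset=%s" % (DEFAULT_MIME_TYPE, DEFAULT_CHARSET)
    # A caller-supplied mimetype is passed through untouched, so a value that
    # already carries its own charset (or a binary type) is not corrupted.
    return mimetype


default_header = build_content_type()
custom_header = build_content_type("application/pdf")
```

Always appending the charset, as in the earlier patch, would have turned a caller-supplied value such as 'text/plain; charset=koi8-r' into a header with two charset parameters.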

comment:9 by Maniac@…, 19 years ago

BTW, the component 'Admin interface' doesn't seem right. It's more like 'Validators' or even 'Core', I think...

comment:10 by Adrian Holovaty, 18 years ago

Resolution: fixed
Status: new → closed

(In [786]) Fixed #333 and #440 -- Split DEFAULT_MIME_TYPE setting into DEFAULT_CONTENT_TYPE and DEFAULT_CHARSET. Thanks, Maniac.

comment:11 by (none), 17 years ago

milestone: Version 1.0

Milestone Version 1.0 deleted
