Opened 10 years ago

Closed 10 years ago

Last modified 8 years ago

#440 closed defect (fixed)

[patch] maxlength incorrectly checked for international characters in utf-8

Reported by: Maniac@…
Owned by: adrian
Component: contrib.admin
Version:
Severity: normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

In django.core.formfields, TextField.isValidLength checks the length very naively:

if data and self.maxlength and len(data) > self.maxlength:

Since sane applications tend to use UTF-8 for web output, the 'data' passed to this method is also UTF-8 encoded. This means that len(data) counts bytes rather than characters, so for any non-ASCII input it returns a larger number than the actual character count.
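To illustrate the byte/character mismatch, here is a small sketch in modern Python 3. It assumes that Python 3 `bytes` stand in for the Python 2 byte strings the original code received; the sample value and maxlength are made up for illustration.

```python
# Python 3 illustration: `data` stands in for the raw UTF-8 bytes the
# form field receives; `text` is the decoded character string.
text = "Привет"               # 6 Cyrillic characters
data = text.encode("utf-8")   # each Cyrillic letter encodes to 2 bytes

char_count = len(text)        # 6
byte_count = len(data)        # 12

# With maxlength = 10, the naive byte-based check rejects a 6-character value.
maxlength = 10
naive_rejects = bool(data) and byte_count > maxlength
print(char_count, byte_count, naive_rejects)  # prints: 6 12 True
```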

Attachments (3)

440.1.patch (615 bytes) - added by Maniac@… 10 years ago.
patch for decoding data utf-8 -> unicode
440.2.patch (2.5 KB) - added by Maniac@… 10 years ago.
Patch with parametrized charset
440.3.patch (2.7 KB) - added by Maniac@… 10 years ago.
Patch with parametrized charset (corrected)


Change History (14)

comment:1 Changed 10 years ago by adrian

A patch would be great!

comment:2 Changed 10 years ago by anonymous

Hm... any pointers on where to get the charset for output? Or is Django always in UTF-8?

Also, I'm not (yet :-) ) familiar with the code, so I don't know where else similar things could bite... I'd rather just unicode() any string data from the web before processing it, but I have no idea whether that would be feasible.
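The "decode everything at the input boundary" idea can be sketched as follows. `decode_form_data` is a hypothetical helper, not part of Django, and the raw request data is assumed to be a dict of byte strings. With the default strict error handling, malformed input raises `UnicodeDecodeError` instead of slipping through.

```python
def decode_form_data(raw, charset="utf-8"):
    """Hypothetical helper: convert raw byte-string form data to text once,
    so that every later check (length, validation) works on characters."""
    return {key.decode(charset): value.decode(charset)
            for key, value in raw.items()}


raw_post = {b"title": "Grüße".encode("utf-8")}  # 5 characters, 7 bytes
post = decode_form_data(raw_post)
print(post["title"], len(post["title"]))  # prints: Grüße 5
```

The design trade-off is that every piece of code downstream then sees only text, so checks like maxlength no longer need to remember to decode.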

comment:3 Changed 10 years ago by hugo <gb@…>

I think len(data.decode('utf-8')) should give the correct string length of the input: Django should always use UTF-8 encoded output and so receive UTF-8 encoded input.

if data and self.maxlength and len(data.decode('utf-8')) > self.maxlength:
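A self-contained sketch of the check with the decode applied. `TextField` and `ValidationError` here are minimal hypothetical stand-ins, not Django's actual classes, and the error message is illustrative.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for the validator error type."""


class TextField:
    """Minimal sketch of the length check; not Django's real class."""

    def __init__(self, maxlength=None):
        self.maxlength = maxlength

    def isValidLength(self, data):
        # data arrives as UTF-8 bytes; decoding first makes len()
        # count characters instead of bytes.
        if data and self.maxlength and len(data.decode("utf-8")) > self.maxlength:
            raise ValidationError(
                "Ensure the value has at most %d characters." % self.maxlength)


field = TextField(maxlength=10)
field.isValidLength("Привет".encode("utf-8"))  # 6 characters (12 bytes): accepted
```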

Changed 10 years ago by Maniac@…

patch for decoding data utf-8 -> unicode

comment:4 Changed 10 years ago by Maniac@…

Yeah, I searched the code and it seems that 'utf-8' is hardcoded everywhere. Hence the patch Hugo suggested.

comment:5 Changed 10 years ago by hugo <gb@…>

  • Summary changed from maxlength incorrectly checked for international characters in utf-8 to [patch] maxlength incorrectly checked for international characters in utf-8

comment:6 Changed 10 years ago by anonymous

This hardcoded 'utf-8' still bugs me. I understand that UTF-8 is the best practice, but a user may have legitimate reasons to produce output in some other encoding. It would be nice to have a default project encoding option in the project settings and to use it for both output and input.

Changed 10 years ago by Maniac@…

Patch with parametrized charset

comment:7 Changed 10 years ago by anonymous

Here's a better version (440.2.patch), which:

  • adds DEFAULT_CHARSET to the project settings template
  • for compatibility with existing projects, falls back to 'utf-8' if the setting is not defined
  • uses this setting when decoding text field data
  • uses this setting in httpwrappers.HttpResponse

The last point means that this patch also fixes ticket #333.
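The settings fallback and the charset-aware decoding from the list above might look roughly like this. The `settings` objects are hypothetical stand-ins built with `types.SimpleNamespace`, and `text_length` is an illustrative helper rather than code from the patch.

```python
from types import SimpleNamespace

# Hypothetical settings module from a project that predates the new option.
settings = SimpleNamespace(DEBUG=False)

# Fall back to 'utf-8' so existing projects keep working unchanged.
DEFAULT_CHARSET = getattr(settings, "DEFAULT_CHARSET", "utf-8")


def text_length(data, charset=DEFAULT_CHARSET):
    """Count characters in raw form data using the configured charset."""
    return len(data.decode(charset))


# A project that opts into another encoding just defines the setting.
latin1_settings = SimpleNamespace(DEFAULT_CHARSET="iso-8859-1")
latin1_charset = getattr(latin1_settings, "DEFAULT_CHARSET", "utf-8")
```

With either charset, "Grüße" counts as 5 characters, even though it is 7 bytes in UTF-8 and 5 bytes in ISO-8859-1.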

Changed 10 years ago by Maniac@…

Patch with parametrized charset (corrected)

comment:8 Changed 10 years ago by Maniac@…

I made a mistake in 440.2.patch: it always appends the charset to the mimetype, which breaks if the user supplies their own mimetype. Hence the correction in 440.3.patch: append the charset only to the default mimetype.
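The corrected rule can be sketched as a small function. `content_type_header` is hypothetical, and the setting values are illustrative assumptions: the actual value of DEFAULT_MIME_TYPE at the time is not shown in this ticket.

```python
DEFAULT_MIME_TYPE = "text/html"   # illustrative values, not from the patch
DEFAULT_CHARSET = "utf-8"


def content_type_header(mimetype=None):
    """Build a Content-Type value for a response (sketch)."""
    if mimetype is None:
        # Only the default mimetype gets the charset appended.
        return "%s; charset=%s" % (DEFAULT_MIME_TYPE, DEFAULT_CHARSET)
    # A user-supplied mimetype is used verbatim: appending would turn
    # "text/plain; charset=iso-8859-1" into a header with two charsets,
    # or tack a meaningless charset onto binary types like "image/png".
    return mimetype
```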

P.S. How do you mark patches as obsolete in Trac? Can anyone mark 440.2.patch as such, or delete it altogether?

comment:9 Changed 10 years ago by Maniac@…

BTW, the component 'Admin interface' doesn't seem right. I think it's more like 'Validators' or even 'Core'...

comment:10 Changed 10 years ago by adrian

  • Resolution set to fixed
  • Status changed from new to closed

(In [786]) Fixed #333 and #440 -- Split DEFAULT_MIME_TYPE setting into DEFAULT_CONTENT_TYPE and DEFAULT_CHARSET. Thanks, Maniac.

comment:11 Changed 8 years ago by anonymous

  • milestone Version 1.0 deleted

