Context Navigation

← Previous Ticket
Next Ticket →

#5076 closed Cleanup/optimization (fixed)

Request object should try to determine encoding

Reported by:	dbr <daniel@…>	Owned by:	Claude Paroz <claude@…>
Component:	HTTP handling	Version:	dev
Severity:	Normal	Keywords:	unicode conversion charset POST request
Cc:		Triage Stage:	Ready for checkin
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

(As discussed here on django-developers.)

When recieving POST-data it is not certain that the charset is declared
in the Content-Type header. This leads to a situation where the POST data, when it is converted to
unicode, will have all non-ascii characters translated to \ufffd (the
codepoint for an unknown character). Probably desireable if the original data
was, for instance, latin-1 encoded.

Attachments (3)

django_request_charset.diff (2.6 KB ) - added by dbr <daniel@…> 18 years ago.: A first try at a fix
django_request_charset.2.diff (2.6 KB ) - added by dbr <daniel@…> 18 years ago.
5076-3.diff (2.9 KB ) - added by Claude Paroz 12 years ago.: Get request encoding from content-type header

Download all attachments as: .zip

Change History (19)

by dbr <daniel@…>, 18 years ago

Attachment:	django_request_charset.diff added

A first try at a fix

comment:1 by dbr <daniel@…>, 18 years ago

This was a great deal more challenging than I thought it would be, mostly due to the fact that I don't have much knowledge of django's internals. Please feel free to poke fun at my code (or at least suggest a better solution).

A few things to note for this patch:

Since I don't use mod_python, I have not made any changes to the mod_python request object since I would not be able to test it
I'd rather see the detect_charset() code executed by init() in the base HttpRequest-class, but since you override it i figured you do that for a reason.
If the encoding is determined in the QuerySet class (the try-except-clause in the patch), the request object will never now. Not sure this matters though.

.. there is probably more issues with my approach, so let's continue the discussion..

comment:2 by Simon G. <dev@…>, 18 years ago

Patch needs improvement:	set
Triage Stage:	Unreviewed → Accepted

by dbr <daniel@…>, 18 years ago

Attachment:	django_request_charset.2.diff added

comment:3 by dbr <daniel@…>, 18 years ago

Just added one line to the previous diff.

I was thinking maybe to skip the detect_charset() code entirely and just rely on setting the encoding when the QuerySet is created, not sure what makes most sense here..

comment:4 by Piotr Lewandowski <django@…>, 17 years ago

Component:	Uncategorized → HTTP handling

comment:5 by ahlongxp, 16 years ago

milestone:	→ post-1.0

force_unicode(key, encoding, errors='replace')

is depressing.
It'd better if we can choose whether to throw exceptions

comment:6 by (none), 16 years ago

milestone:	post-1.0

Milestone post-1.0 deleted

comment:7 by ccahoon, 16 years ago

Owner:	changed from nobody to ccahoon

comment:8 by anonymous, 16 years ago

Owner:	ccahoon removed

comment:9 by Gabriel Hurley, 14 years ago

Severity:	→ Normal
Type:	→ Cleanup/optimization

comment:10 by Aymeric Augustin, 13 years ago

UI/UX:	unset

Change UI/UX from NULL to False.

comment:11 by Aymeric Augustin, 13 years ago

Easy pickings:	unset

Change Easy pickings from NULL to False.

comment:12 by Aymeric Augustin, 12 years ago

The patch for #19101 includes the fix for this ticket. However, I'd prefer to fix these issues separately, so I'm going to comment here.

I'd like to see a more robust media type parsing. The grammar is simple enough to implement correctly. For example:

content_type = self.META.get('CONTENT_TYPE', '')
content_type, _, parameters = content_type.partition(';')
content_params = {}
for parameter in parameters.split(';'):
    k, _, v = parameter.strip().partition('=')
    content_params[k] = v

It could make sense to move this to a utility function or at least as a method of the HttpRequest object, as in the patches above.

Then we just do:

if 'charset' in content_params:
    ...

by Claude Paroz, 12 years ago

Attachment:	5076-3.diff added

Get request encoding from content-type header

comment:13 by Claude Paroz, 12 years ago

Patch needs improvement:	unset

Patch separated from the one in #19101 and adapted following comment:12.

comment:14 by Aymeric Augustin, 12 years ago

Triage Stage:	Accepted → Ready for checkin

The patch looks good to me.

Could you just mention the source for the algorithm in the docstring?

def _parse_content_type(self, ctype):
    """Media Types parsing according to RFC 2616, section 3.7.

    Returns the data type and parameters. For example:
    Input: "text/plain; charset=iso-8859-1"
    Output: ('text/plain', {'charset': 'iso-8859-1'})
    """

comment:15 by Claude Paroz <claude@…>, 12 years ago

Owner:	set to Claude Paroz <claude@…>
Resolution:	→ fixed
Status:	new → closed

In 6de6988f9990b4b53f5a20bfc3811b3cc49291c5:

Fixed #5076 -- Properly decode POSTs with non-utf-8 payload encoding

Thanks daniel at blogg.se for the report and Aymeric Augustin for
his assistance on the patch.

comment:16 by Claude Paroz <claude@…>, 12 years ago

In 3f3076edbf8d6fb204984a1a7fddbde408d5b104:

[1.5.x] Fixed #5076 -- Properly decode POSTs with non-utf-8 payload encoding

Thanks daniel at blogg.se for the report and Aymeric Augustin for
his assistance on the patch.

Backport of 6de6988f9 from master.

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#5076 closed Cleanup/optimization (fixed)

Request object should try to determine encoding

Description

Attachments (3)

Change History (19)

by dbr <daniel@…>, 18 years ago

comment:1 by dbr <daniel@…>, 18 years ago

comment:2 by Simon G. <dev@…>, 18 years ago

by dbr <daniel@…>, 18 years ago

comment:3 by dbr <daniel@…>, 18 years ago

comment:4 by Piotr Lewandowski <django@…>, 17 years ago

comment:5 by ahlongxp, 16 years ago

comment:6 by (none), 16 years ago

comment:7 by ccahoon, 16 years ago

comment:8 by anonymous, 16 years ago

comment:9 by Gabriel Hurley, 14 years ago

comment:10 by Aymeric Augustin, 13 years ago

comment:11 by Aymeric Augustin, 13 years ago

comment:12 by Aymeric Augustin, 12 years ago

by Claude Paroz, 12 years ago

comment:13 by Claude Paroz, 12 years ago

comment:14 by Aymeric Augustin, 12 years ago

comment:15 by Claude Paroz <claude@…>, 12 years ago

comment:16 by Claude Paroz <claude@…>, 12 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us