Opened 10 years ago

Closed 10 years ago

#22721 closed Bug (wontfix)

Fallback encoding support on request.GET required for MSIE

Reported by: hoha@…
Owned by: nobody
Component: HTTP handling
Version: 1.7-beta-2
Severity: Normal
Keywords:
Cc: linovia, kevin@…
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Originally described at https://github.com/tomchristie/django-rest-framework/pull/1590 before I realised this was a Django issue.

While testing a new system in pre-production, we found a bug that only shows up when MSIE is used. Basically, the way MSIE handles query strings is interesting, to say the least: the system locale can affect how the query string is sent, the previous website's encoding can affect it, and heck, even where you launch the browser session from can affect it.

In our testing, MSIE did not urlquote the query string but instead sent it in what looked like raw latin1. The problem is that if you access this query string through request.GET in a Django view, you get an encoding error if you happen to be using Python 3.x. From my findings, this was introduced in Django 1.6; it worked correctly under Django 1.5 (at least according to the PR I did for Django Rest Framework showcasing this bug).

It seems to me that Django should have some fallback encoding support for this, leaving aside the fact that MSIE really should be urlquoting the query string.

Could it be that Django 1.6 somehow introduced a regression over Django 1.5?

Attachments (1)

fallback_encoding_msie.diff (2.2 KB) - added by Henrik Ossipoff Hansen 10 years ago.
Patch for test case show-casing the MSIE behaviour under certain conditions


Change History (9)

comment:1 by Tim Graham, 10 years ago

Any chance you can write a test case for our test suite and bisect when the behavior changed? That would be really helpful.

comment:2 by linovia, 10 years ago

Cc: linovia added

by Henrik Ossipoff Hansen, 10 years ago

Attachment: fallback_encoding_msie.diff added

Patch for test case show-casing the MSIE behaviour under certain conditions

comment:3 by Henrik Ossipoff Hansen, 10 years ago

After some work I managed to write a test case showcasing the behaviour MSIE produces under certain conditions when using Django 1.6+ and Python 3.

I've bisected the actual commit where the change occurred: https://github.com/django/django/commit/7fcd6aa6695b39370154d6993cdbb3ba4363de91

comment:4 by Aymeric Augustin, 10 years ago

If I understand your report correctly, MSIE can send non-ASCII query strings in a variety of encodings, depending on several factors.

Currently Django assumes UTF-8 for any non-ASCII data found in the query string. settings.DEFAULT_CHARSET could be an improvement.

But that doesn't address your problem. How do you propose to determine the appropriate encoding?

Last edited 10 years ago by Aymeric Augustin
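A minimal sketch, using only the standard library, of why a single configurable charset does not settle the question. The function `decode_query_string` and its `charset` parameter are hypothetical; `charset` stands in for a value such as settings.DEFAULT_CHARSET, and this is a simplified model rather than Django's actual code.

```python
from urllib.parse import parse_qsl


def decode_query_string(environ_query_string: str, charset: str = "utf-8"):
    """Hypothetical helper: decode a WSGI QUERY_STRING with one fixed charset.

    `charset` stands in for something like settings.DEFAULT_CHARSET; this is
    an illustrative assumption, not Django's real code path.
    """
    # Recover the bytes the browser sent (the WSGI server decoded them as latin-1).
    raw = environ_query_string.encode("iso-8859-1")
    # Decode any raw non-ASCII bytes, then percent-decode with the same charset.
    return parse_qsl(raw.decode(charset), keep_blank_values=True, encoding=charset)


# Percent-encoded UTF-8 from a compliant browser:
compliant = "q=%C3%A6%C3%B8%C3%A5"
# Raw cp1252/latin-1 bytes for "æøå", as a WSGI server would expose them:
msie_raw = "q=æøå"

print(decode_query_string(compliant, "utf-8"))   # [('q', 'æøå')]
print(decode_query_string(msie_raw, "cp1252"))   # [('q', 'æøå')]
print(decode_query_string(compliant, "cp1252"))  # [('q', 'Ã¦Ã¸Ã¥')]
```

Whichever single charset is configured, one of the two kinds of request comes out wrong. That is the open question about how to determine the appropriate encoding.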

comment:5 by Kevin Brown, 10 years ago

Cc: kevin@… added

comment:6 by Henrik Ossipoff Hansen, 10 years ago

Right, my previous report might actually have been a bit wrong: I don't think it's a case of MSIE sending the wrong encoding (in my case at least). I'm not very well-versed in encodings, but here is what I can see happening, and what my attached test demonstrates:

  • MSIE will in some cases not properly urlencode a query string. For example, if I have a URL of /?q=æøå, the raw "æøå" will actually be passed on in the request (in what I think is, at least in my case, a unicode string).
  • If this happens, Django throws an error (as per my attached tests). This started with the mentioned commit during the Django 1.6 betas, and only when using Python 3. Django will try to encode that unicode string as iso-8859-1, then decode it as UTF-8. This works if the browser sent a properly urlencoded query string, but not if it sent the raw unicode string.
Last edited 10 years ago by Henrik Ossipoff Hansen
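The failure described in the bullets above can be reproduced outside Django with a simplified model of the decoding path. `simulate_query_decode` is a hypothetical name, and the three steps only mimic what the ticket describes (latin-1 decode by the WSGI server, then iso-8859-1 encode followed by UTF-8 decode); it is not Django's actual implementation.

```python
from urllib.parse import parse_qsl


def simulate_query_decode(wire_bytes: bytes) -> list:
    """Hypothetical, simplified model of the decoding path described above."""
    # Step 1: a Python 3 WSGI server exposes QUERY_STRING as a str obtained
    # by decoding the raw bytes as latin-1 (ISO-8859-1).
    environ_query_string = wire_bytes.decode("iso-8859-1")
    # Step 2: re-encode as ISO-8859-1 to get the original bytes back,
    # then decode them as UTF-8 (raises UnicodeDecodeError on invalid UTF-8).
    query_string = environ_query_string.encode("iso-8859-1").decode("utf-8")
    # Step 3: split into key/value pairs and percent-decode (UTF-8 by default).
    return parse_qsl(query_string, keep_blank_values=True)


# A compliant browser percent-encodes the UTF-8 bytes of "æøå".
print(simulate_query_decode(b"q=%C3%A6%C3%B8%C3%A5"))  # [('q', 'æøå')]

# In the case described, MSIE sends the raw latin-1 bytes instead.
try:
    simulate_query_decode("q=æøå".encode("latin-1"))
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```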

comment:7 by Aymeric Augustin, 10 years ago

Indeed, the browser should urlencode the data so that the query string only contains ASCII data. Then any ASCII-compatible encoding, like utf-8, can be used to encode or decode it. Here MSIE doesn't urlencode. So we need a way to determine which encoding it used in order to decode the data properly.
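To illustrate the point about urlencoding, here is a small standard-library sketch. `urllib.parse.quote` percent-encodes the UTF-8 bytes of a string by default, and the result is pure ASCII, so encoding it with any ASCII-compatible charset yields identical bytes.

```python
from urllib.parse import quote, unquote

value = "æøå"
escaped = quote(value)  # percent-encodes the UTF-8 bytes by default
print(escaped)  # %C3%A6%C3%B8%C3%A5

# The escaped form contains only ASCII characters...
assert escaped.isascii()
# ...so every ASCII-compatible encoding produces the same bytes on the wire.
assert escaped.encode("utf-8") == escaped.encode("latin-1") == escaped.encode("ascii")
# Decoding the escapes (as UTF-8, the default) restores the original text.
assert unquote(escaped) == value
```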

There's no such thing as a "raw Unicode string". Unicode is an abstract representation. You can't send Unicode over the network. The browser sends a bunch of bytes on the wire, encoded in a given encoding.

Then the WSGI server arbitrarily decodes these bytes with the latin-1 encoding (everyone knows that this part of WSGI on Python 3 is ridiculous). Django attempts to be less stupid by re-encoding with latin-1, recovering the original bytes sent by the browser, and decoding them with an appropriate charset, currently hard-coded to utf-8.

Ignore the latin-1 (= ISO-8859-1) entirely, it's required to work around WSGI and I swear it's correct. What we need is a way to determine the appropriate charset.
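Why the latin-1 step is harmless can be shown with a short sketch. latin-1 maps each of the 256 byte values to the code point with the same number, so decoding and re-encoding with it is lossless for any input (PEP 3333 specifies this "bytes as latin-1 str" convention for WSGI native strings). `recover_wire_bytes` is a hypothetical helper name; the example also shows that the recovered bytes still need a charset chosen from outside.

```python
def recover_wire_bytes(environ_value: str) -> bytes:
    """Hypothetical helper: undo the latin-1 decoding done by a Python 3 WSGI server."""
    # latin-1 maps bytes 0-255 one-to-one onto code points U+0000-U+00FF,
    # so encoding with latin-1 returns exactly the bytes the browser sent.
    return environ_value.encode("latin-1")


# Simulate the server side: every possible byte value, decoded as latin-1.
wire = bytes(range(256))
environ_value = wire.decode("latin-1")
assert recover_wire_bytes(environ_value) == wire  # lossless round trip

# The recovered bytes still have to be decoded with *some* charset.
raw = "æøå".encode("cp1252")  # b'\xe6\xf8\xe5'
print(raw.decode("cp1252"))  # æøå
print(raw.decode("utf-8", errors="replace") == "æøå")  # False
```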

comment:8 by Aymeric Augustin, 10 years ago

Resolution: wontfix
Status: new → closed

I'm going to close this ticket because we don't know what we could do to work around this bug.

Please reopen if you can suggest a better algorithm for selecting an appropriate charset.
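For reference, the "fallback encoding" the ticket title asks for could look like the following heuristic: try strict UTF-8 first, and only if that fails decode with a single-byte fallback charset. This is a hypothetical sketch, not something Django adopted. The function name and the `fallback="cp1252"` default are assumptions (cp1252 is the Western European Windows code page), and nothing in the request confirms that the fallback is the charset MSIE actually used.

```python
from urllib.parse import parse_qsl


def decode_query_with_fallback(environ_query_string: str, fallback: str = "cp1252"):
    """Hypothetical heuristic: try UTF-8 first, fall back to a single-byte charset.

    `fallback` is an assumption; the request itself does not say which
    charset the browser used.
    """
    raw = environ_query_string.encode("iso-8859-1")  # undo the WSGI latin-1 step
    try:
        text, charset = raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: guess the fallback charset (undefined bytes become U+FFFD).
        text, charset = raw.decode(fallback, errors="replace"), fallback
    return parse_qsl(text, keep_blank_values=True, encoding=charset)


print(decode_query_with_fallback("q=%C3%A6%C3%B8%C3%A5"))  # [('q', 'æøå')]
print(decode_query_with_fallback("q=æøå"))  # [('q', 'æøå')]
```

The weakness is that legacy-encoded bytes which happen to form valid UTF-8 (for example, cp1252 "Ã©" is the byte pair C3 A9, which is UTF-8 "é") are silently read as UTF-8. A heuristic like this narrows the problem without determining the charset reliably.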
