Django

Ticket #5738 (closed: fixed)

Opened 9 months ago

Last modified 9 months ago

django fails on defective unicode strings appearing in the url

Reported by: Soeren Sonnenburg <bugreports@nn7.de>
Assigned to: nobody
Milestone:
Component: HTTP handling
Version: SVN
Keywords:
Cc:
Triage Stage: Accepted
Has patch: 1
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 1

Description

The problem happens with any Django site (the version does not matter).

Here is the best backtrace one can get :-)

http://www.djangoproject.com/~%A9

Attachments

unicode_url_bug.patch (1.1 kB) - added by Armin Ronacher on 10/12/07 16:57:58.
fix

Change History

10/11/07 10:41:08 changed by Armin Ronacher

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

That's quite an annoying bug, especially because it happens outside the debugging system, so the mod_python / flup traceback could expose internal information. The fix would be to use an 'ignore' or 'replace' error-handling fallback in the Unicode conversion.
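A minimal sketch of the fallback Armin describes, written for modern Python 3 rather than the Python 2 code Django used in 2007, so it is an illustration and not the committed patch. `decode_path` is a hypothetical helper: it percent-decodes the path to bytes and hands the chosen error handler to `bytes.decode()`.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path, errors="strict"):
    """Percent-decode a URL path and decode the resulting bytes as UTF-8.

    ``errors`` is passed straight to ``bytes.decode``: "strict" raises
    UnicodeDecodeError on malformed input, "replace" substitutes U+FFFD,
    and "ignore" silently drops the offending bytes.
    """
    return unquote_to_bytes(raw_path).decode("utf-8", errors)


path = "/~%A9"  # the URL from the report; a lone 0xA9 byte is not valid UTF-8

try:
    decode_path(path)  # strict decoding raises, as in the report
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

print("replace:", ascii(decode_path(path, "replace")))  # '/~\ufffd'
print("ignore:", decode_path(path, "ignore"))            # /~
```

'replace' keeps a visible marker where the bad byte was, while 'ignore' makes the bad byte vanish entirely; the difference matters for the follow-up comment below.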

10/12/07 16:57:58 changed by Armin Ronacher

  • attachment unicode_url_bug.patch added.

fix

10/12/07 16:58:57 changed by anonymous

  • has_patch set to 1.

10/12/07 19:45:42 changed by ubernostrum

  • component changed from Core framework to HTTP handling.
  • stage changed from Unreviewed to Accepted.

10/12/07 21:57:57 changed by adrian

  • status changed from new to closed.
  • resolution set to fixed.

(In [6475]) Fixed #5738 -- Fixed bug with defective Unicode strings in a URL

10/15/07 02:29:33 changed by Soeren Sonnenburg <bugreports@nn7.de>

  • status changed from closed to reopened.
  • needs_better_patch set to 1.
  • resolution deleted.

I am not sure whether that was the correct fix, because now things like

http://www.djangoproject.com/d%aao%aaw%aan%aal%aao%aaa%aad%aa/

work too...
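Assuming the [6475] fix used the 'ignore' handler (the ticket does not say which fallback was committed), this sketch shows why that URL reaches /download/: each %aa decodes to the byte 0xAA, a UTF-8 continuation byte that is invalid without a lead byte, so 'ignore' drops every one of them. Under 'replace' the path would instead contain U+FFFD characters and would no longer spell "download".

```python
from urllib.parse import unquote_to_bytes

raw_path = "/d%aao%aaw%aan%aal%aao%aaa%aad%aa/"
path_bytes = unquote_to_bytes(raw_path)  # each %aa becomes the byte 0xAA

# 0xAA is a continuation byte; with no lead byte before it, it is invalid UTF-8.
ignored = path_bytes.decode("utf-8", "ignore")    # invalid bytes are dropped
replaced = path_bytes.decode("utf-8", "replace")  # invalid bytes become U+FFFD

print(ignored)                   # /download/
print(replaced.count("\ufffd"))  # 8
```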

10/15/07 02:35:26 changed by mtredinnick

  • status changed from reopened to closed.
  • resolution set to fixed.

It isn't actually a bug that that example works. It's harmless. We can live with this fix.

10/15/07 02:39:19 changed by mtredinnick

As an addendum to the previous comment (I hit "post" too fast), the alternative is to automatically return an HTTP 400 status code in this case. But I think what we're doing is a reasonable approach to the problem.

10/15/07 03:13:39 changed by Soeren Sonnenburg <bugreports@nn7.de>

  • status changed from closed to reopened.
  • resolution deleted.

That is exactly what I would have expected: a 404 page...

10/15/07 10:54:35 changed by ubernostrum

You absolutely should not get a 404 from that. If you think that the request is bad, the correct status is "HTTP 400 Bad Request".
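One way to produce the 400 response described here, sketched as a generic PEP 3333 WSGI middleware rather than anything in Django itself; `reject_non_utf8_paths` and `hello_app` are hypothetical names. Under PEP 3333 the server has already percent-decoded `PATH_INFO` and stores its bytes as a latin-1 `str`, so encoding it back to latin-1 recovers the raw bytes for a strict UTF-8 check.

```python
def reject_non_utf8_paths(app):
    """Wrap a WSGI app so that non-UTF-8 request paths get 400 Bad Request."""

    def middleware(environ, start_response):
        # PEP 3333: PATH_INFO holds the raw path bytes decoded as ISO-8859-1.
        raw_path = environ.get("PATH_INFO", "").encode("latin-1")
        try:
            raw_path.decode("utf-8")  # strict: raises on malformed UTF-8
        except UnicodeDecodeError:
            start_response("400 Bad Request",
                           [("Content-Type", "text/plain; charset=utf-8")])
            return [b"Bad Request: the URL path is not valid UTF-8.\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Trivial downstream app used only for the demonstration."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"ok\n"]


if __name__ == "__main__":
    app = reject_non_utf8_paths(hello_app)
    # "/~\xa9" is what the server hands over for a request to /~%A9.
    for path in ("/download/", "/~\xa9"):
        app({"PATH_INFO": path}, lambda status, headers: print(repr(path), status))
```

Rejecting the request before it reaches URL resolution keeps the malformed path from ever being matched, which is why the status is 400 rather than a 404 from the resolver.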

10/20/07 02:40:57 changed by mtredinnick

Thinking about this a lot more, I'm not totally happy with the fix in [6475], but it's a bit of a line-ball (a close call). The problem is that although UTF-8 is strongly recommended as the encoding for non-ASCII data, it wasn't actually codified in any spec until quite recently (RFC 2396 leaves things wide open, for example). Only in RFC 3986 (and its companion RFC 3987, which defines the IRI-to-URI mapping) were things made clear.

In the interim, systems were deployed that spit out non-UTF-8 encoded URIs.
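To make the encoding question concrete: the same path percent-encodes differently depending on the charset, which is why the `%A9` in the original report is malformed under a UTF-8 reading. This sketch uses Python 3's `urllib.parse.quote`, whose `encoding` argument defaults to UTF-8; the Latin-1 variant stands in for a hypothetical legacy client.

```python
from urllib.parse import quote, unquote_to_bytes

path = "/~\u00a9"  # "/~" followed by the copyright sign

utf8_uri = quote(path)                        # UTF-8, as RFC 3986 recommends
latin1_uri = quote(path, encoding="latin-1")  # what a legacy client might send

print(utf8_uri)    # /~%C2%A9
print(latin1_uri)  # /~%A9

# Reading the legacy form as UTF-8 fails: 0xA9 cannot start a UTF-8 sequence.
try:
    unquote_to_bytes(latin1_uri).decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```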

So I'm going to commit a change that returns a 400 response for malformed (non-UTF-8) input, but that also makes it easier to override the request class. If somebody is dealing with a legacy system, they can subclass WSGIRequest or ModPythonRequest to decode the URI however they need to.
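A framework-neutral sketch of the design described above: strict UTF-8 by default, a 400 for malformed input, and a request class whose decoding step a subclass can override. `Request`, `LegacyRequest`, `decode_path` and `handle` are hypothetical names for illustration; they are not Django's `WSGIRequest` or `ModPythonRequest` API.

```python
class Request:
    """Hypothetical minimal request object (not Django's real class)."""

    def __init__(self, raw_path: bytes):
        self.raw_path = raw_path
        self.path = self.decode_path(raw_path)

    def decode_path(self, raw_path: bytes) -> str:
        # Default policy: insist on UTF-8. Malformed input raises
        # UnicodeDecodeError, which handle() turns into a 400.
        return raw_path.decode("utf-8")


class LegacyRequest(Request):
    """Accept UTF-8, but fall back to ISO-8859-1 for a legacy client."""

    def decode_path(self, raw_path: bytes) -> str:
        try:
            return raw_path.decode("utf-8")
        except UnicodeDecodeError:
            # Latin-1 maps every byte to a character, so this never fails.
            return raw_path.decode("latin-1")


def handle(raw_path: bytes, request_class=Request):
    """Build a request, answering 400 if its path cannot be decoded."""
    try:
        request = request_class(raw_path)
    except UnicodeDecodeError:
        return 400, "Bad Request"
    return 200, request.path


print(handle(b"/~\xa9"))                 # (400, 'Bad Request')
print(handle(b"/~\xa9", LegacyRequest))  # (200, '/~©')
```

Keeping the strictness in a single overridable method means the default stays safe, while a deployment with legacy clients can opt into a fallback without patching the handler.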

10/20/07 02:48:19 changed by mtredinnick

  • status changed from reopened to closed.
  • resolution set to fixed.

For some reason, the auto-closer didn't trigger. [6550] has the latest commit for this ticket.
