Ticket #11522 (closed: fixed)

Opened 1 year ago

Last modified 6 months ago

UnicodeEncodeError on redirect to non-ASCII Location

Reported by: semenov
Assigned to: nobody
Milestone: 1.2
Component: HTTP handling
Version: SVN
Keywords:
Cc: yoan@dosimple.ch, oldium.pro@seznam.cz
Triage Stage: Accepted
Has patch: 1
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 1

Description

If a view for a URL that contains Unicode characters returns an HttpResponseRedirect to a non-absolute URL, Django crashes.

Example,

def myview(request):
    print request.path # '/info/Интеграция_CMS/'
    return HttpResponseRedirect('?edit=1')

gives the following error message:

URI:            '/info/\xd0\x98\xd0\xbd\xd1\x82\xd0\xb5\xd0\xb3\xd1\x80\xd0\xb0\xd1\x86\xd0\xb8\xd1\x8f_CMS/'

...

Traceback (most recent call last):

  ...

  File "/usr/lib/python2.5/site-packages/django/http/utils.py", line 20, in fix_location_header
    response['Location'] = request.build_absolute_uri(response['Location'])

  File "/usr/lib/python2.5/site-packages/django/http/__init__.py", line 314, in __setitem__
    header, value = self._convert_to_ascii(header, value)

  File "/usr/lib/python2.5/site-packages/django/http/__init__.py", line 306, in _convert_to_ascii
    yield value.encode('us-ascii')

UnicodeEncodeError: 'ascii' codec can't encode characters in position 28-37: ordinal not in range(128), HTTP response headers must be in US-ASCII format

The cause of the problem is that #10267 was incorrectly fixed by [10539], leaving many other cases open. The call to iri_to_uri should be moved from HttpResponseRedirect.__init__ (and other places where it had been copy-pasted) deeper, into django.http.utils.fix_location_header.

Apart from my problem, it was reported in #10267 that django.views.generic.simple.redirect_to and django.contrib.redirects also suffer from a similar encoding issue, which would also be resolved once fix_location_header is fixed appropriately.
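The failure described above can be reproduced outside Django. The following is a minimal Python 3 sketch, not Django's code: it assumes the relative target is resolved against the request URL roughly the way `urljoin` does, and it uses a hypothetical host (`example.com`) and an assumed set of characters left unescaped (`SAFE`).

```python
from urllib.parse import quote, urljoin

# Hypothetical request URL, mirroring the path from the report.
request_url = "http://example.com/info/Интеграция_CMS/"

# Resolving the relative target keeps the non-ASCII path of the request.
location = urljoin(request_url, "?edit=1")


def can_encode_ascii(value):
    """Return True if value survives the us-ascii encoding step."""
    try:
        value.encode("us-ascii")
    except UnicodeEncodeError:
        return False
    return True


# Assumed set of characters to leave untouched; everything else is
# percent-encoded from its UTF-8 bytes.
SAFE = "/#%[]=:;$&()+,!?*@'~"
escaped = quote(location, safe=SAFE)

print(can_encode_ascii(location))  # False: the path is not ASCII
print(can_encode_ascii(escaped))   # True: only %XX escapes remain
print(escaped)
```

The point of the sketch is that the redirect target `'?edit=1'` is pure ASCII; the non-ASCII characters arrive only when it is made absolute, which is why escaping has to happen after that step.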

For those Google strangers who are interested in a workaround, you can do as follows:

def myview(request):
    print request.path # '/info/Интеграция_CMS/'
    return HttpResponseRedirect('?edit=1') # crashes
    return HttpResponseRedirect(request.path + '?edit=1') # doesn't crash

Attachments

simple_fix.diff (480 bytes) - added by semenov on 08/06/09 07:38:32.
added a special rule in HttpResponse.__setitem__
ticket11522_puny_code.patch (2.5 kB) - added by yoan@dosimple.ch on 08/12/09 14:44:09.
Patch to support IRI in redirections.

Change History

08/05/09 20:40:57 changed by kmtracey

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

#11638 closed as a dup.

08/05/09 23:56:03 changed by BlindHunter

Ok, #11638 was closed. It's not necessarily a relative path. Imagine what happens if you redirect to a file with a Russian name in UTF-8:

# -*- coding: utf-8 -*-
path = '/var/www/filez/site_media/upload/alecs/отпуск-2009-ОРС.pdf'.decode('utf8')
return HttpResponseRedirect(path)

UnicodeEncodeError at /file/365d13241be1e1a3ac00523a5deddec4f577e01f813ded539bcbac32

('ascii', u'/var/www/filez/site_media/upload/alecs/\u043e\u0442\u043f\u0443\u0441\u043a-2009-\u041e\u0420\u0421.pdf', 39, 45, 'ordinal not in range(128)')

08/06/09 06:33:33 changed by semenov

BlindHunter, would you mind using the wiki formatting and, what is more important, the "Preview" button? Your text is hardly readable and if I were a Django developer, I would just disregard it.

08/06/09 07:32:49 changed by BlindHunter

It seems to me that your well-formed message has been disregarded for more than two weeks ;) All the peculiarities and details are important for a real developer. I'll try to use wiki formatting in my future posts.

08/06/09 07:37:38 changed by semenov

Update on this ticket:

1) The proposed workaround does not work anymore in Django 1.1. Use the following:

def myview(request):
    print request.path # '/info/Интеграция/'
    return HttpResponseRedirect('?edit=1') # crashes
    return HttpResponseRedirect(request.path + '?edit=1') # still crashes
    return HttpResponseRedirect(request.build_absolute_uri('?edit=1')) # doesn't crash, but violates DRY principle

2) Please disregard my words about "moving the call to iri_to_uri blah-blah-blah..."; they don't make sense, as that's what [10539] was actually about.

3) Still, I consider [10539] to be the cause of the problem. I can see two ways to fix the issue:

a) a special case in HttpResponse.__setitem__ to handle the 'Location' header properly (attached)
b) practically revert [10539], but instead of calling iri_to_uri in each and every case (as it was before [10539]), call fix_location_header. That is worse, as fix_location_header would then be called twice.
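To make option (a) concrete, here is a rough Python 3 sketch of a header-setting method with a special case for `Location`. It is not the attached patch and not Django's `HttpResponse`: the class name `SketchResponse`, the `SAFE` character set, and the use of `ValueError` for bad values are all assumptions made for illustration.

```python
from urllib.parse import quote

# Assumed set of characters to leave unescaped in a URL.
SAFE = "/#%[]=:;$&()+,!?*@'~"


class SketchResponse:
    """Hypothetical stand-in for a response object; not Django's class."""

    def __init__(self):
        self._headers = {}

    def __setitem__(self, header, value):
        # Reject header injection before any escaping hides the newline.
        if "\n" in value or "\r" in value:
            raise ValueError("Header values can't contain newlines (got %r)" % value)
        if header.lower() == "location":
            # Special case: percent-encode non-ASCII characters as UTF-8.
            value = quote(value, safe=SAFE)
        # Any other non-ASCII header value still raises UnicodeEncodeError.
        self._headers[header.lower()] = (header, value.encode("us-ascii"))

    def __getitem__(self, header):
        return self._headers[header.lower()][1]


response = SketchResponse()
response["Location"] = "/áíé/"
print(response["Location"])
```

With this shape, every code path that sets `Location` gets the same treatment without each caller having to remember to escape the value, which is the property the two options above are trying to achieve.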

08/06/09 07:38:32 changed by semenov

  • attachment simple_fix.diff added.

added a special rule in HttpResponse.__setitem__

08/06/09 07:40:05 changed by semenov

Perhaps (a) can be improved to apply the fix_location_header logic, and then we won't need the fix_location_header middleware at all.

08/06/09 07:40:15 changed by semenov

  • needs_better_patch set to 1.

08/06/09 07:40:24 changed by semenov

  • has_patch set to 1.

08/06/09 08:53:45 changed by BlindHunter

The problem is in value.encode('us-ascii'): this codec can't encode my URL because it contains a file named in Russian. A serious bug :( To skip the encoding check, you can override the _convert_to_ascii method in HttpResponseRedirect:

def _convert_to_ascii(self, *values):
    """Converts all values to ascii strings."""
    for value in values:
        if isinstance(value, unicode):
            try:
                value = value.encode('us-ascii')
            except UnicodeEncodeError:
                # Non-ASCII value: leave it unencoded instead of raising.
                pass
        else:
            value = str(value)
        if '\n' in value or '\r' in value:
            raise BadHeaderError("Header values can't contain newlines (got %r)" % (value))
        yield value

08/06/09 21:27:56 changed by Alex

  • stage changed from Unreviewed to Accepted.

08/12/09 14:44:09 changed by yoan@dosimple.ch

  • attachment ticket11522_puny_code.patch added.

Patch to support IRI in redirections.

08/12/09 14:47:14 changed by yoan@dosimple.ch

A small patch of mine that enables doing redirection using unicode IRI like Mr. Snowman

>>> HttpResponseRedirect(u"http://müller.de/")
Location: http://xn--mller-kva.de/
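The conversion shown above can be approximated with the Python 3 standard library. This is only a sketch of the idea, not the contents of ticket11522_puny_code.patch: the helper name `iri_to_ascii_url` is hypothetical, it assumes an absolute IRI with a host, and it ignores any user:password part of the authority.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_ascii_url(iri):
    """Hypothetical helper: make an absolute IRI safe for a Location header."""
    parts = urlsplit(iri)
    # Host labels go through Python's built-in IDNA codec (Punycode).
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    # Path, query and fragment are percent-encoded from UTF-8 bytes;
    # "%" stays safe so existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


print(iri_to_ascii_url("http://müller.de/"))  # http://xn--mller-kva.de/
```

The host and the rest of the URL need different treatment: domain names use Punycode, while path and query use %XX escapes, so a single escaping call over the whole string would not produce a resolvable host name.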

I also think that URLField should accept such IRI, maybe we can create a new one and call it IRIField.

08/12/09 14:49:43 changed by anonymous

  • cc set to yoan@dosimple.ch.

08/13/09 06:06:53 changed by semenov

"The problem is in value.encode('us-ascii')"

The problem is NOT in value.encode. It is absolutely correct to encode the response headers in ASCII as required by RFC1915. The problem is that the URL in the "Location" field is not automatically urlencoded, as anyone would expect.

09/21/09 22:09:40 changed by IanLewis

Unicode in the bug title should read non-ascii. Remember, it's possible to encode urls in other encodings besides Unicode.

09/22/09 06:57:08 changed by semenov

IanLewis, you are mistaken. Unicode is a character set, not an encoding. It doesn't make sense to encode URLs in ... Unicode. URLs can be encoded in UTF-8, CP1251 or any other encoding, all of which are mappings from a character set (Unicode) to particular byte strings. (Getting into details, URLs are actually encoded twice: first from Unicode to byte strings, then from byte strings to lower-ASCII strings using the %XX notation.)
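The two encoding steps mentioned above can be shown directly in Python 3. This is an illustrative sketch only; the sample word is taken from the path in the ticket description, and the variable names are made up for the example.

```python
from urllib.parse import quote, unquote_to_bytes

text = "Интеграция"  # characters from the Unicode character set

# Step 1: characters -> bytes, which depends on the chosen encoding.
utf8_bytes = text.encode("utf-8")      # two bytes per Cyrillic letter
cp1251_bytes = text.encode("cp1251")   # one byte per Cyrillic letter

# Step 2: bytes -> ASCII text using the %XX notation.
utf8_escaped = quote(utf8_bytes)
cp1251_escaped = quote(cp1251_bytes)

print(utf8_escaped)
print(cp1251_escaped)

# Reversing both steps only works with the same byte encoding.
print(unquote_to_bytes(cp1251_escaped).decode("cp1251") == text)  # True
```

The same ten characters yield two different escaped strings, so a receiver has to know (or guess) which byte encoding was used in step 1 in order to recover the original text.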

This ticket title mentions Unicode URLs and I consider that to be perfectly fine.

09/22/09 07:40:53 changed by kmtracey

  • summary changed from Crash on redirect to a relative URL if request.path is unicode to UnicodeEncodeError on redirect to non-ASCII Location.

09/22/09 07:47:39 changed by kmtracey

Closing #11921 as a dupe. Removed 'relative' from the summary since that was overly specific. Also, put non-ASCII in the summary since the problem is not use of Unicode but rather the presence of non-ASCII characters. A Unicode string with all ASCII chars works fine.

11/08/09 01:28:04 changed by oldium

My question here is whether the translation should be completely transparent, i.e.

return HttpResponseRedirect(u'/áíé/');
return HttpResponseRedirect(u'http://áíé/');

should just work (but I've read on Python pages that there are some problems with automatic conversion), or is it the responsibility of the developer, so that you have

return HttpResponseRedirect(iri_to_uri(u'/áíé/'));

Anyway, the Django methods should just work, so the following should work I think (at least the first case):

return HttpResponseRedirect(request.get_full_path());
return HttpResponseRedirect(request.path);

11/08/09 01:28:42 changed by oldium

  • cc changed from yoan@dosimple.ch to yoan@dosimple.ch, oldium.pro@seznam.cz.

11/08/09 23:26:39 changed by Natim

  • component changed from Uncategorized to HTTP handling.
  • milestone set to 1.2.

The ticket11522_puny_code.patch fixed the problem for me. Is it possible to apply this to the django trunk ?

Thank you.

12/13/09 13:02:21 changed by 235

The simple patch worked in my case. Looking forward to this being fixed.

12/14/09 03:58:26 changed by etzel

Got this problem too, any fix would be welcome. I'm going with simple_fix.diff for now.

01/28/10 03:48:44 changed by jeremb

Same problem here, it would be welcome to merge the patch with trunk.

02/05/10 04:13:53 changed by anonymous

Have this problem too, pls fix it

03/02/10 13:41:41 changed by kmtracey

  • status changed from new to closed.
  • resolution set to fixed.

(In [12660]) [1.1.X] Fixed #11522: Restored ability of http redirect responses to correctly handle redirect locations with non-ASCII chars.

r12659 from trunk.

