Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#31344 closed Bug (needsinfo)

Django raises UnicodeEncodeError when there is a cookie with a non-latin character.

Reported by: Ozgur Akcali
Owned by: nobody
Component: HTTP handling
Version: 2.2
Severity: Normal
Keywords: cookie
Cc: Florian Apolloner
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I know non-Latin characters are discouraged in cookies, but when such a cookie is sent with a request, Django raises a UnicodeEncodeError. It is raised in the get_bytes_from_wsgi function in wsgi.py, on the following line:

return value.encode('iso-8859-1')

Not sure how this should be handled. 'ignore' could be supplied as the second parameter to the encode method, but that would silently change the value of the cookie, and I'm not sure that would be the desired behavior.
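The trade-off described above can be reproduced in plain Python without Django. This is only a sketch: the string `"name=ğ"` is a made-up cookie value chosen because U+011F has no ISO-8859-1 representation, and the snippet does not call any Django code.

```python
# Hypothetical cookie string; "ğ" (U+011F) cannot be encoded as ISO-8859-1.
value = "name=ğ"

# Strict encoding (the default) raises UnicodeEncodeError.
try:
    value.encode("iso-8859-1")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# With 'ignore', the unencodable character is dropped without any warning.
ignored = value.encode("iso-8859-1", "ignore")

print(type(strict_error).__name__)
print(ignored)
```

With 'ignore', the cookie value loses the character entirely, which is the silent change the description is worried about.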

Change History (2)

comment:1 by Mariusz Felisiak, 4 years ago

Cc: Florian Apolloner added
Resolution: needsinfo
Status: new → closed
Summary: Django raises UnicodeEncodeError when there is a cookie with a non-latin character → Django raises UnicodeEncodeError when there is a cookie with a non-latin character.

Non-ASCII values in the WSGI environ are arbitrarily decoded with ISO-8859-1, which is why Django uses this encoding (see also PEP 333). You shouldn't get a value in any other encoding. Please feel free to reopen this ticket if you can provide a sample project that reproduces your issue.
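A small sketch of why this round trip is expected to be lossless, assuming the server decodes header bytes with ISO-8859-1 as described. The variable names and the sample cookie are illustrative, and no WSGI server is involved here.

```python
# Bytes a client might put on the wire (here, UTF-8 for a non-Latin character).
raw = "name=ğ".encode("utf-8")

# What a conforming server would place in the environ: a str decoded as ISO-8859-1.
environ_value = raw.decode("iso-8859-1")

# Encoding back with ISO-8859-1 recovers the original bytes.
recovered = environ_value.encode("iso-8859-1")

# ISO-8859-1 maps every byte 0-255 to a code point, so any byte string round-trips.
all_bytes = bytes(range(256))
round_tripped = all_bytes.decode("iso-8859-1").encode("iso-8859-1")

print(raw, recovered)
```

Because every possible byte decodes to a code point in the range U+0000 to U+00FF, a string produced this way can always be encoded back. An environ value that fails to encode therefore did not come from an ISO-8859-1 decode.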

comment:2 by Florian Apolloner, 4 years ago

I am with Mariusz on this one. If your environment contains data that is not encodable to iso-8859-1 then you have an app-server that doesn't implement WSGI correctly.

> Not sure how this should be handled. 'ignore' could be supplied as the second parameter to the encode method, but that would silently change the value of the cookie, and I'm not sure that would be the desired behavior.

The value of the cookie is already changed silently. If you were to send "name=öäü".encode('iso-8859-1') as the literal cookie value over the wire and execute the following view:

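    from django.core.handlers.wsgi import get_bytes_from_wsgi
    from django.http import HttpResponse

    def show_cookie(request):  # view name is illustrative
        print(get_bytes_from_wsgi(request.environ, "HTTP_COOKIE", ""))  # raw header bytes
        print(request.COOKIES["name"])  # value after Django's decoding
        return HttpResponse()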

you'd get:

b'name=\xf6\xe4\xfc'
���

As you can see, the bytes in the raw environment are reproduced correctly, but Django later converts them to a string using UTF-8 with the replace error handler. Even if you had any other non-Latin character, the actual byte sequence would be correct. We'd need a full traceback and a reproducer, as Mariusz said.
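The replacement characters in the output above can be approximated in plain Python. This is a sketch of the decoding step only, assuming a UTF-8 decode with the replace error handler as described. It does not go through Django's request or cookie parsing.

```python
# The literal cookie bytes from the example above.
raw_cookie = "name=öäü".encode("iso-8859-1")

# These bytes are not valid UTF-8, so each one becomes U+FFFD under 'replace'.
decoded = raw_cookie.decode("utf-8", errors="replace")

# Split off the value the way a cookie parser roughly would (simplified).
name, _, cookie_value = decoded.partition("=")

print(raw_cookie)
print(cookie_value)
```

The raw bytes stay intact, while the decoded value consists of three U+FFFD replacement characters, so the original characters are already lost at this stage without any exception being raised.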
