#31344 closed Bug (needsinfo)
Django raises UnicodeEncodeError when there is a cookie with a non-latin character.
Reported by: | Ozgur Akcali | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | 2.2 |
Severity: | Normal | Keywords: | cookie |
Cc: | Florian Apolloner | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I know non-latin characters are not recommended in cookies, but when a request arrives with such a cookie, Django raises a UnicodeEncodeError. It is raised in the get_bytes_from_wsgi function of wsgi.py, on the following line:
return value.encode('iso-8859-1')
Not sure how this should be handled. 'ignore' could be supplied as the second parameter to the encode method, but that would change the value of the cookie silently, and I'm not sure that would be desired behavior.
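The trade-off described above can be sketched in plain Python, without Django. This is only an illustration: the cookie string and its `ğ` character (U+011F, which has no ISO-8859-1 representation) are assumed sample data, not the reporter's actual cookie.

```python
# Hypothetical cookie header value containing a character outside ISO-8859-1.
value = "name=ğ"

# Strict encoding (the default) raises, as in the report.
error = None
try:
    value.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    error = exc

# With errors="ignore" the unencodable character is dropped without warning.
ignored = value.encode("iso-8859-1", "ignore")

print(type(error).__name__)
print(ignored)
```

With "ignore" the cookie would arrive with its value truncated, which is the silent change the description is worried about.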
Change History (2)
comment:1 by , 5 years ago
Cc: | added |
---|---|
Resolution: | → needsinfo |
Status: | new → closed |
Summary: | Django raises UnicodeEncodeError when there is a cookie with a non-latin character → Django raises UnicodeEncodeError when there is a cookie with a non-latin character. |
comment:2 by , 5 years ago
I am with Mariusz on this one. If your environment contains data that is not encodable to iso-8859-1 then you have an app-server that doesn't implement WSGI correctly.
Not sure how this should be handled. 'ignore' could be supplied as the second parameter to the encode method, but that would change the value of the cookie silently, and I'm not sure that would be desired behavior.
The value of the cookie is already changed silently. If you were to send "name=öäü".encode('iso-8859-1') as the literal cookie value over the wire and execute the following view:
from django.core.handlers.wsgi import get_bytes_from_wsgi
print(get_bytes_from_wsgi(request.environ, "HTTP_COOKIE", ""))
print(request.COOKIES["name"])
you'd get:
b'name=\xf6\xe4\xfc'
���
As you can see, the bytes in the raw environment are reproduced correctly, but Django later converts them to a string using utf-8 with replace. Even if you had any other non-latin character, the actual byte sequence would be correct. We'd need a full traceback and reproducer, like Mariusz said.
Non-ASCII values in the WSGI environ are arbitrarily decoded with ISO-8859-1, which is why Django uses this encoding (see also PEP 333). You shouldn't get a value in other encodings. Please feel free to reopen this ticket if you can provide a sample project to reproduce your issue.
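The round trip discussed in this thread can be approximated without Django. In the rough sketch below, the helper names (wsgi_str, bytes_from_wsgi, text_from_wsgi) are hypothetical stand-ins, not Django's or any server's actual functions. It assumes a compliant server that decodes raw header bytes with ISO-8859-1, and an application that re-encodes with ISO-8859-1 and then decodes as utf-8 with replace, as described above.

```python
def wsgi_str(raw: bytes) -> str:
    # What a compliant WSGI server is assumed to do with raw header bytes.
    return raw.decode("iso-8859-1")


def bytes_from_wsgi(value: str) -> bytes:
    # Recover the original bytes; cannot fail for strings built by wsgi_str.
    return value.encode("iso-8859-1")


def text_from_wsgi(value: str) -> str:
    # Interpret the recovered bytes as UTF-8, replacing invalid sequences.
    return bytes_from_wsgi(value).decode("utf-8", errors="replace")


# Cookie sent as ISO-8859-1 bytes: bytes survive, text is replaced.
raw_latin1 = "name=öäü".encode("iso-8859-1")
latin1_text = text_from_wsgi(wsgi_str(raw_latin1))

# Cookie sent as UTF-8 bytes with a non-latin character: both survive.
raw_utf8 = "name=ğ".encode("utf-8")
utf8_text = text_from_wsgi(wsgi_str(raw_utf8))

print(bytes_from_wsgi(wsgi_str(raw_latin1)))
print(latin1_text)
print(utf8_text)
```

Every byte value maps to a code point below U+0100 under ISO-8859-1, so a string produced this way always re-encodes cleanly. An encode failure therefore suggests the environ value was built some other way, which is consistent with the request for a reproducer.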