Opened 7 years ago
Last modified 7 years ago
#28796 closed Cleanup/optimization
reverse() coerces bytes in args/kwargs to str which adds "b" prefixes to the ouput — at Version 7
Reported by: | Nick | Owned by: | nobody |
---|---|---|---|
Component: | Documentation | Version: | 2.0 |
Severity: | Normal | Keywords: | |
Cc: | nathompson7@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Suppose we have:
import hmac from hashlib import sha384 from django.utils.http import urlsafe_base64_encode hashed_msg = hmac.new(b"secret", msg=b"qwerty", digestmod=sha384) url = urlsafe_base64_encode(hashed_msg.digest())
Now url is a bytes object, say b'asdfwerqwefa'. In django 1.11, bytes-like objects could be used in reverse()
, creating (say) localhost:8000/asdfwerqwefa
However, in Django 2.0, the url becomes (say)
localhost:8000/b'asdfwerqwefa'
which seems less 'urlsafe', as the quotes and the 'b' are added into the url. Shouldn't urlsafe_base64_encode now return a string, if bytes in urls now retain the quotes?
I have created a full project which reproduces the bug here:
https://github.com/NAThompson/urlsafe_bug
To reproduce the bug, type:
$ ./manage.py shell >>> import reproduce
This bug is present in commit d90936f41a2d7a3361e51fc49be033ba0f05f458, but much before that the 'django.urls.path' doesn't exist, so bisection doesn't work cleanly.
Change History (7)
comment:1 by , 7 years ago
Component: | Uncategorized → Utilities |
---|---|
Summary: | urlsafe_base64_encode broken on Django 2.0 → django.utils.http.urlsafe_base64_encode() broken in Django 2.0 |
Type: | Uncategorized → Bug |
comment:2 by , 7 years ago
I would be all for making urlsafe_base64_encode
return a string, but it would be plainly backwards incompatible, as much existing code is already doing a decode()
on the result of this function. See usage in django/contrib/auth/forms.py
for example.
comment:3 by , 7 years ago
Description: | modified (diff) |
---|
comment:4 by , 7 years ago
Description: | modified (diff) |
---|
comment:5 by , 7 years ago
Component: | Utilities → Core (URLs) |
---|---|
Summary: | django.utils.http.urlsafe_base64_encode() broken in Django 2.0 → reverse() coerces bytes in args/kwargs to str which adds "b" prefixes to the ouput |
I completed the bisect by replacing path(...)
with url('(?P<token>.*)', ...
in the sample project. The commit where the behavior changed is 301de774c21d055e9e5a7073e5bffdb52bc71079.
The change in behavior is in reverse()
. Args and kwargs have str()
applied rather than force_text()
which adds the b prefix.
https://github.com/django/django/commit/301de774c21d055e9e5a7073e5bffdb52bc71079#diff-b2f90a810a553411112f412de2a26617R424
Do you consider that a bug, Claude?
comment:6 by , 7 years ago
I would say that reverse
is not supposed to handle arbitrary bytestrings, so generally speaking, it's developer's role to decode any bytestring to a string (e.g. force_text(b'caf\xe9')
will still produce a DjangoUnicodeDecodeError
).
Now I could understand that we privilege backwards compatibility and still accept ASCII bytesstrings by reintroducing force_text
(at the expense of a slight overhead). It's some sort of design choice.
comment:7 by , 7 years ago
Description: | modified (diff) |
---|
I think we could document the backwards incompatibility and point out the solution to add .decode()
, which should also work without older versions of Django. Does it seem okay, Nick?
Could you please edit the description to include a complete code snippet that reproduces the problem?
Ideally, you could also bisect to find the commit where the behavior changed.