Opened 8 years ago
Last modified 8 years ago
#28796 closed Cleanup/optimization
reverse() coerces bytes in args/kwargs to str which adds "b" prefixes to the ouput — at Version 7
| Reported by: | Nick | Owned by: | nobody |
|---|---|---|---|
| Component: | Documentation | Version: | 2.0 |
| Severity: | Normal | Keywords: | |
| Cc: | nathompson7@… | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Suppose we have:
import hmac from hashlib import sha384 from django.utils.http import urlsafe_base64_encode hashed_msg = hmac.new(b"secret", msg=b"qwerty", digestmod=sha384) url = urlsafe_base64_encode(hashed_msg.digest())
Now url is a bytes object, say b'asdfwerqwefa'. In django 1.11, bytes-like objects could be used in reverse(), creating (say) localhost:8000/asdfwerqwefa
However, in Django 2.0, the url becomes (say)
localhost:8000/b'asdfwerqwefa'
which seems less 'urlsafe', as the quotes and the 'b' are added into the url. Shouldn't urlsafe_base64_encode now return a string, if bytes in urls now retain the quotes?
I have created a full project which reproduces the bug here:
https://github.com/NAThompson/urlsafe_bug
To reproduce the bug, type:
$ ./manage.py shell >>> import reproduce
This bug is present in commit d90936f41a2d7a3361e51fc49be033ba0f05f458, but much before that the 'django.urls.path' doesn't exist, so bisection doesn't work cleanly.
Change History (7)
comment:1 by , 8 years ago
| Component: | Uncategorized → Utilities |
|---|---|
| Summary: | urlsafe_base64_encode broken on Django 2.0 → django.utils.http.urlsafe_base64_encode() broken in Django 2.0 |
| Type: | Uncategorized → Bug |
comment:2 by , 8 years ago
I would be all for making urlsafe_base64_encode return a string, but it would be plainly backwards incompatible, as much existing code is already doing a decode() on the result of this function. See usage in django/contrib/auth/forms.py for example.
comment:3 by , 8 years ago
| Description: | modified (diff) |
|---|
comment:4 by , 8 years ago
| Description: | modified (diff) |
|---|
comment:5 by , 8 years ago
| Component: | Utilities → Core (URLs) |
|---|---|
| Summary: | django.utils.http.urlsafe_base64_encode() broken in Django 2.0 → reverse() coerces bytes in args/kwargs to str which adds "b" prefixes to the ouput |
I completed the bisect by replacing path(...) with url('(?P<token>.*)', ... in the sample project. The commit where the behavior changed is 301de774c21d055e9e5a7073e5bffdb52bc71079.
The change in behavior is in reverse(). Args and kwargs have str() applied rather than force_text() which adds the b prefix.
https://github.com/django/django/commit/301de774c21d055e9e5a7073e5bffdb52bc71079#diff-b2f90a810a553411112f412de2a26617R424
Do you consider that a bug, Claude?
comment:6 by , 8 years ago
I would say that reverse is not supposed to handle arbitrary bytestrings, so generally speaking, it's developer's role to decode any bytestring to a string (e.g. force_text(b'caf\xe9') will still produce a DjangoUnicodeDecodeError).
Now I could understand that we privilege backwards compatibility and still accept ASCII bytesstrings by reintroducing force_text (at the expense of a slight overhead). It's some sort of design choice.
comment:7 by , 8 years ago
| Description: | modified (diff) |
|---|
I think we could document the backwards incompatibility and point out the solution to add .decode(), which should also work without older versions of Django. Does it seem okay, Nick?
Could you please edit the description to include a complete code snippet that reproduces the problem?
Ideally, you could also bisect to find the commit where the behavior changed.