#28679 closed Bug (fixed)
urlencode does not decode bytes before passing them to Python's urlencode
Reported by: | François Freitag | Owned by: | François Freitag |
---|---|---|---|
Component: | Utilities | Version: | 2.0 |
Severity: | Release blocker | Keywords: | urlencode bytes |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
urlencode
should decode bytes objects before to pass them to python's urlencode. Instead, it's currently calling str on the bytes, passing the bytes.__str__
representation of the bytes (b'...') to urlencode.
# A dictionary with bytes value result = http.urlencode({'a': b'abc'}, doseq=True) self.assertEqual(result, 'a=abc') # Test result on 2.0a1 AssertionError: 'a=b%27abc%27' != 'a=abc'
Change History (11)
comment:1 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 7 years ago
comment:3 by , 7 years ago
Severity: | Normal → Release blocker |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
Sure something needs to be fixed (sorry, I introduced the regression in [fee42fd99ee470528858c2ccb3621135c30ec262]).
I just wonder if we should use force_text
or rather let bytes be bytes, as the Python urlencode
seems to accept both strings or bytes.
follow-up: 6 comment:4 by , 7 years ago
What is the use case for passing bytes to urlencode()
?
As URLs are strings, I'd expect their query parameters to also be strings (or coercable to a string). At first glance, this feels like the bytes data is being passed through the system as if it were a string instead of being deocded/encoded at its endpoints.
comment:5 by , 7 years ago
Python is bytes differently than other types, perhaps Django should maintain that behaviour?
IMHO, decoding user-encoded bytes (assuming utf-8) and re-encoding is not useful and may cause harm to users who decided to encode their data using another encoding. I tend to agree with Claude on special-casing bytes.
comment:6 by , 7 years ago
Replying to Jon Dufresne:
What is the use case for passing bytes to
urlencode()
?
I had the same question initially. But then I saw that Python 3 urlencode
was also mentioning bytes
as possible input, so I guess there are valid use cases for that.
comment:7 by , 7 years ago
Python is bytes differently than other types, perhaps Django should maintain that behaviour?
I see. Is this the only motivation or is there a use case? While I might disagree with Python's choice here, I understand the argument of following upstream and the wider community.
IMHO, decoding user-encoded bytes (assuming utf-8) and re-encoding is not useful and may cause harm to users who decided to encode their data using another encoding. I tend to agree with Claude on special-casing bytes.
This is precisely what force_text() is doing. If decoding as utf-8 is causing harm, it hasn't been avoided.
By using force_text()
there is an implicit assumption that the bytes data is utf-8 encoded. While that may be true for a large majority of the cases, there is no reason to believe this is true for all cases. Having the bytes data properly decoded as it is received would avoid this assumption.
comment:8 by , 7 years ago
My use case is a library passing zipped xml, base64 encoded over GET parameters. The output of base64.b64encode
is bytes, and I think Django should not change that (even though it does not matter in my use case).
I updated the PR to simply pass bytes "as is" to Python's urlencode
. I agree that calling force_text
on the value is not the best option.
Is this the only motivation or is there a use case?
I do not have a use case for Django not decoding bytes, but I find it safer than decoding a content that does not need it. I prefer Django being consistent with Python. The added value of Django's urlencode
is the understanding of QueryDict, I do not expect it to re-encode the bytes I've given.
comment:9 by , 7 years ago
I updated the PR to simply pass bytes "as is" to Python's
urlencode
.
This approach makes sense to me. Thanks for explaining.
PR