#28121 closed Bug (wontfix)
force_text incorrectly handles SafeBytes under PY3
Reported by: | Thomas Achtemichuk | Owned by: | nobody |
---|---|---|---|
Component: | Utilities | Version: | 1.11 |
Severity: | Normal | Keywords: | |
Cc: | tom@… | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Under python 3 & Django 1.8.18, 1.9.13, 1.10.7, 1.11 and master, calling force_text
on an instance of SafeBytes
causes a str
to be returned rather than an instance of SafeText
.
>>> from django.utils.safestring import SafeBytes, SafeText >>> from django.utils.encoding import force_text >>> type(force_text(SafeText(''))) django.utils.safestring.SafeText >>> type(force_text(SafeBytes(b''))) str
This causes byte strings run through mark_safe
and rendered in a template to be incorrectly escaped.
>>> from django.template import Template, Context >>> from django.utils.safestring import mark_safe >>> Template('{{ x }}').render(Context({'x': mark_safe(b'&')})) '&' >>> Template('{{ x }}').render(Context({'x': mark_safe('&')})) '&'
Edit: This behavior differs from the same code run under PY2:
>>> type(force_text(SafeBytes(b'&'))) django.utils.safestring.SafeText
And disagrees with the comment in force_text:
# Note: We use .decode() here, instead of six.text_type(s, encoding, # errors), so that if s is a SafeBytes, it ends up being a # SafeText at the end.
Attachments (4)
Change History (22)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
Added some patches against various stable branches and master. Not sure of the process for submitting PRs - is one per branch OK?
Also see that SafeBytes has been deprecated for internal use in 2.0, so perhaps best just to ignore the patch against master.
comment:3 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Based on the supported versions policy, the patch doesn't seem to qualify for a backport to the stable branches, so closing as wontfix since the issue isn't really applicable on master which supports Python 3 only.
comment:4 by , 8 years ago
Tim,
This came up when bootstrapping a SPA's template with the output of DRF's JSONRenderer
which produces utf-8 encoded json. Something like the following:
def app_home(request): return render( request, 'app_base.html', {'init_data': mark_safe(JSONRenderer().render(SomeSerializer.data))} )
We're preparing to switch over to python3, and this bug has lead to a fairly extensive audit of everywhere we use mark_safe
and pass values into templates.
Is it certain that the that text version of an arbitrary bytestring is also safe
If it isn't, then the way that force_text
has behaved under PY2 for the last 5+ years should be examined:
>>> type(force_text(SafeBytes(b'&'))) django.utils.safestring.SafeText
comment:5 by , 8 years ago
Resolution: | wontfix |
---|---|
Status: | closed → new |
Tim,
Reopening as I didn't make clear in my initial report that the behavior differs between PY3:
>>> type(force_text(SafeBytes(b'&'))) str
and PY2:
>>> type(force_text(SafeBytes(b'&'))) django.utils.safestring.SafeText
If this behavior is incorrect under PY2, let me know and I'll open another ticket to address it. But it definitely seems one of the above is incorrect.
comment:6 by , 8 years ago
Also, there is this, fairly explicit comment in force_text
(added 10 years ago) that makes me believe that the behavior under PY3 is wrong:
# Note: We use .decode() here, instead of six.text_type(s, encoding, # errors), so that if s is a SafeBytes, it ends up being a # SafeText at the end.
comment:7 by , 8 years ago
Description: | modified (diff) |
---|---|
Version: | master → 1.11 |
comment:8 by , 8 years ago
Even so, I don't think the patch would qualify for a backport based on the supported versions policy as the behavior has existed as long as Django has supported Python 3, correct?
comment:9 by , 8 years ago
It could be argued that this satisfies both: "Functionality bug in newly-introduced features" (the feature being PY3 support), and "Regressions from older versions of Django." Since the 1.0 release a decade ago, when variable auto-escaping was added, force_text
and force_unicode
before it have always passed "safe" bytestrings through as "safe" unicode strings. I guess the question is, according to the "rule of thumb" in the supported versions policy:
Had this been discovered in the lead-up to the 1.6 release (PY3 support), would the different behavior between PY2 and PY3 been a release blocker?
I'd assume that the goal of all that hard work was to have Django function the identically under PY2 and PY3, and any difference in behavior would have been a blocker. As someone who is doing the a bunch of that same hard work right now in my own codebase, that change in behavior causing a unit test (and entire app) to fail is definitely a blocker for me.
The other consideration would be: "Would changing this behavior under PY3 now break anything in existing codebases?"
To which my answer would be: If one has code that relies on auto-escaping a bytestring explicitly passed through mark_safe
, and only under PY3... That's not the type of code worth supporting instead of fixing inconsistent behavior between python versions.
comment:10 by , 8 years ago
About the initial use case: considering a HTML template should be basically text, not bytes, what about decoding your UTF-8 encoded stream before passing it to mark_safe
?
comment:12 by , 8 years ago
Cc: | added |
---|
comment:13 by , 8 years ago
Hrm. I realize I have no idea what SafeBytes
are.
If you don't know the charset of the document in which you're going to interpolate these bytes, you have no idea what unicode codepoints they'll map to and you cannot make any guarantees about their safety in a HTML context.
It would be tempting to say "they're in DEFAULT_CHARSET", but that's too fragile for a security-critical feature. They could still be interpolated into something in another charset.
IMO the only way to fix this is to remove SafeBytes
. I can't see a way to define it in a way that makes sense from a security perspective, short of annotating them with a charset, but then we've reinvented text strings.
comment:14 by , 8 years ago
In any case, the Python 3 behavior seems correct to me, the Python 2 behavior seems dubious from a security perspective.
comment:15 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
I completely agree with Aymeric, there is no such thing as SafeBytes
. It has already almost disappeared on master anyway.
Tom, what you call a workaround is probably the right thing to do.
comment:16 by , 8 years ago
I agree that using SafeBytes
is incorrect use here.
As the type is no longer used internally and only kept for reusable apps supporting Python2, should the class be formally deprecated with warnings and docs? If so, I don't mind doing the necessary changes.
comment:17 by , 8 years ago
Yes, I think we should deprecate SafeBytes and related bits of code, if any.
comment:18 by , 8 years ago
Removing SafeBytes
is included in #27753, "Cleanups when no supported version of Django supports Python 2 anymore".
Could you give a use case where the current behavior causes a problem? Is it certain that the that text version of an arbitrary bytestring is also safe?