Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#28121 closed Bug (wontfix)

force_text incorrectly handles SafeBytes under PY3

Reported by: Thomas Achtemichuk Owned by: nobody
Component: Utilities Version: 1.11
Severity: Normal Keywords:
Cc: tom@… Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Thomas Achtemichuk)

Under Python 3 with Django 1.8.18, 1.9.13, 1.10.7, 1.11, and master, calling force_text on an instance of SafeBytes returns a str rather than an instance of SafeText.

>>> from django.utils.safestring import SafeBytes, SafeText
>>> from django.utils.encoding import force_text
>>> type(force_text(SafeText('')))
django.utils.safestring.SafeText
>>> type(force_text(SafeBytes(b'')))
str

This causes byte strings run through mark_safe and rendered in a template to be incorrectly escaped.

>>> from django.template import Template, Context
>>> from django.utils.safestring import mark_safe
>>> Template('{{ x }}').render(Context({'x': mark_safe(b'&')}))
'&amp;'
>>> Template('{{ x }}').render(Context({'x': mark_safe('&')}))
'&'
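
To illustrate why losing the safe marker matters at render time, here is a minimal sketch that does not import Django. SafeTextSketch and conditional_escape_sketch are hypothetical stand-ins, not Django's real classes or functions, and the sketch assumes the escaping decision depends only on whether the value still carries the marker type.

```python
import html


class SafeTextSketch(str):
    """Hypothetical stand-in for SafeText: a str that carries a 'safe' marker."""


def conditional_escape_sketch(value):
    """Escape HTML special characters unless the value is marked safe."""
    if isinstance(value, SafeTextSketch):
        return value
    return html.escape(value)


# The marker survived, so the ampersand is passed through untouched.
marker_kept = conditional_escape_sketch(SafeTextSketch('&'))

# The marker was lost (plain str), so the ampersand is escaped.
marker_lost = conditional_escape_sketch('&')
```

Under these assumptions, any conversion step that turns a marked value into a plain str changes what ends up in the rendered output.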

Edit: This behavior differs from the same code run under PY2:

>>> type(force_text(SafeBytes(b'&')))
django.utils.safestring.SafeText

And disagrees with the comment in force_text:

            # Note: We use .decode() here, instead of six.text_type(s, encoding,
            # errors), so that if s is a SafeBytes, it ends up being a
            # SafeText at the end.
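
The distinction the comment draws can be sketched without Django. The classes below are hypothetical stand-ins, and the sketch assumes that the Python 3 code path builds the text with str(s, encoding, errors) instead of calling s.decode(), which is what the observed result suggests.

```python
class SafeTextSketch(str):
    """Hypothetical stand-in for SafeText."""


class SafeBytesSketch(bytes):
    """Hypothetical stand-in for SafeBytes whose decode() keeps the marker."""

    def decode(self, *args, **kwargs):
        return SafeTextSketch(super().decode(*args, **kwargs))


raw = SafeBytesSketch(b'&')

# Calling the method honors the subclass override.
via_method = raw.decode('utf-8', 'strict')

# The str constructor reads the underlying bytes directly and never
# calls the overridden decode(), so the result is a plain str.
via_constructor = str(raw, 'utf-8', 'strict')
```

In this sketch only the .decode() route preserves the marker, which matches the intent stated in the comment.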

Attachments (4)

28121_1_8.patch (1.7 KB) - added by Thomas Achtemichuk 3 years ago.
Test and patch for 1.8
28121_1_10.patch (1.6 KB) - added by Thomas Achtemichuk 3 years ago.
Test and patch for 1.10
28121_1_11.patch (1.6 KB) - added by Thomas Achtemichuk 3 years ago.
Test and patch for 1.11
28121_master.patch (1.7 KB) - added by Thomas Achtemichuk 3 years ago.
Test and patch for master


Change History (22)

comment:1 Changed 3 years ago by Tim Graham

Could you give a use case where the current behavior causes a problem? Is it certain that the text version of an arbitrary bytestring is also safe?

Changed 3 years ago by Thomas Achtemichuk

Attachment: 28121_1_8.patch added

Test and patch for 1.8

Changed 3 years ago by Thomas Achtemichuk

Attachment: 28121_1_10.patch added

Test and patch for 1.10

Changed 3 years ago by Thomas Achtemichuk

Attachment: 28121_1_11.patch added

Test and patch for 1.11

Changed 3 years ago by Thomas Achtemichuk

Attachment: 28121_master.patch added

Test and patch for master

comment:2 Changed 3 years ago by Thomas Achtemichuk

Added some patches against various stable branches and master. Not sure of the process for submitting PRs - is one per branch OK?

I also see that SafeBytes has been deprecated for internal use in 2.0, so it is perhaps best to ignore the patch against master.

comment:3 Changed 3 years ago by Tim Graham

Resolution: wontfix
Status: new → closed

Based on the supported versions policy, the patch doesn't seem to qualify for a backport to the stable branches, so closing as wontfix since the issue isn't really applicable on master which supports Python 3 only.

comment:4 Changed 3 years ago by Thomas Achtemichuk

Tim,

This came up when bootstrapping a SPA's template with the output of DRF's JSONRenderer which produces utf-8 encoded json. Something like the following:

def app_home(request):
    return render(
        request,
        'app_base.html',
        {'init_data': mark_safe(JSONRenderer().render(SomeSerializer.data))}
    )

We're preparing to switch over to Python 3, and this bug has led to a fairly extensive audit of everywhere we use mark_safe and pass values into templates.

"Is it certain that the text version of an arbitrary bytestring is also safe?"

If it isn't, then the way that force_text has behaved under PY2 for the last 5+ years should be examined:

>>> type(force_text(SafeBytes(b'&')))
django.utils.safestring.SafeText

comment:5 Changed 3 years ago by Thomas Achtemichuk

Resolution: wontfix
Status: closed → new

Tim,

Reopening as I didn't make clear in my initial report that the behavior differs between PY3:

>>> type(force_text(SafeBytes(b'&')))
str

and PY2:

>>> type(force_text(SafeBytes(b'&')))
django.utils.safestring.SafeText

If this behavior is incorrect under PY2, let me know and I'll open another ticket to address it. But it definitely seems one of the above is incorrect.

comment:6 Changed 3 years ago by Thomas Achtemichuk

Also, there is this fairly explicit comment in force_text (added 10 years ago) that makes me believe the behavior under PY3 is wrong:

            # Note: We use .decode() here, instead of six.text_type(s, encoding,
            # errors), so that if s is a SafeBytes, it ends up being a
            # SafeText at the end.

comment:7 Changed 3 years ago by Thomas Achtemichuk

Description: modified (diff)
Version: master → 1.11

comment:8 Changed 3 years ago by Tim Graham

Even so, I don't think the patch would qualify for a backport based on the supported versions policy as the behavior has existed as long as Django has supported Python 3, correct?

comment:9 Changed 3 years ago by Thomas Achtemichuk

It could be argued that this satisfies both: "Functionality bug in newly-introduced features" (the feature being PY3 support), and "Regressions from older versions of Django." Since the 1.0 release a decade ago, when variable auto-escaping was added, force_text and force_unicode before it have always passed "safe" bytestrings through as "safe" unicode strings. I guess the question is, according to the "rule of thumb" in the supported versions policy:

Had this been discovered in the lead-up to the 1.6 release (PY3 support), would the different behavior between PY2 and PY3 have been a release blocker?

I'd assume that the goal of all that hard work was to have Django function identically under PY2 and PY3, and that any difference in behavior would have been a blocker. As someone who is doing a bunch of that same hard work right now in my own codebase, a change in behavior that causes a unit test (and an entire app) to fail is definitely a blocker for me.

The other consideration would be: "Would changing this behavior under PY3 now break anything in existing codebases?"

To which my answer would be: If one has code that relies on auto-escaping a bytestring explicitly passed through mark_safe, and only under PY3... That's not the type of code worth supporting instead of fixing inconsistent behavior between python versions.

comment:10 Changed 3 years ago by Claude Paroz

About the initial use case: considering that an HTML template should basically be text, not bytes, what about decoding your UTF-8 encoded stream before passing it to mark_safe?

comment:11 Changed 3 years ago by Thomas Achtemichuk

Claude, yes, that's what I've done to work around this.
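
A rough sketch of that workaround, written without Django or DRF so it runs on its own: json.dumps(...).encode('utf-8') stands in for a renderer that returns UTF-8 encoded JSON, and SafeTextSketch and mark_safe_sketch are hypothetical stand-ins for SafeText and mark_safe.

```python
import json


class SafeTextSketch(str):
    """Hypothetical stand-in for SafeText."""


def mark_safe_sketch(value):
    """Hypothetical stand-in for mark_safe that only accepts text."""
    if not isinstance(value, str):
        raise TypeError('decode bytes before marking them safe')
    return SafeTextSketch(value)


# Stand-in for the renderer output: UTF-8 encoded JSON bytes.
rendered = json.dumps({'user': 'tom & co'}).encode('utf-8')

# Decode explicitly with the known charset, then mark the text as safe.
init_data = mark_safe_sketch(rendered.decode('utf-8'))
```

The point of the design is that the caller, who knows the charset, does the decoding, so the value reaching the template is already text.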

comment:12 Changed 3 years ago by Thomas Achtemichuk

Cc: tom@… added

comment:13 Changed 3 years ago by Aymeric Augustin

Hrm. I realize I have no idea what SafeBytes are.

If you don't know the charset of the document in which you're going to interpolate these bytes, you have no idea what unicode codepoints they'll map to and you cannot make any guarantees about their safety in an HTML context.

It would be tempting to say "they're in DEFAULT_CHARSET", but that's too fragile for a security-critical feature. They could still be interpolated into something in another charset.

IMO the only way to fix this is to remove SafeBytes. I can't see a way to define it in a way that makes sense from a security perspective, short of annotating them with a charset, but then we've reinvented text strings.
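
A small illustration of the charset point, using only Python's built-in codecs. The byte string is a made-up example; the same bytes are harmless text under one charset and markup under another.

```python
payload = b'+ADw-script+AD4-'

# Interpreted as UTF-8 (or ASCII), the bytes are ordinary punctuation and letters.
as_utf8 = payload.decode('utf-8')

# Interpreted as UTF-7, the same bytes decode to an opening script tag.
as_utf7 = payload.decode('utf-7')
```

Here as_utf8 contains no HTML special characters, while as_utf7 is the string '<script>', so a safety judgment made on the bytes alone does not carry over to every decoding.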

comment:14 Changed 3 years ago by Aymeric Augustin

In any case, the Python 3 behavior seems correct to me, the Python 2 behavior seems dubious from a security perspective.

comment:15 Changed 3 years ago by Claude Paroz

Resolution: wontfix
Status: new → closed

I completely agree with Aymeric, there is no such thing as SafeBytes. It has already almost disappeared on master anyway.
Tom, what you call a workaround is probably the right thing to do.

comment:16 Changed 3 years ago by Jon Dufresne

I agree that using SafeBytes is incorrect here.

As the type is no longer used internally and is only kept for reusable apps supporting Python 2, should the class be formally deprecated with warnings and docs? If so, I don't mind making the necessary changes.

comment:17 Changed 3 years ago by Aymeric Augustin

Yes, I think we should deprecate SafeBytes and related bits of code, if any.

comment:18 Changed 3 years ago by Tim Graham

Removing SafeBytes is included in #27753, "Cleanups when no supported version of Django supports Python 2 anymore".
