Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#22130 closed Cleanup/optimization (fixed)

Accelerated deprecation fix_ampersands and clean_html

Reported by: Sasha Romijn Owned by: Aymeric Augustin
Component: Utilities Version: master
Severity: Normal Keywords: nlsprint14
Cc: eromijn@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


I would like to propose the accelerated deprecation of the fix_ampersands built-in template filter, because in all use cases, it either simply does not work, or stimulates the user to create security vulnerabilities.

Fix_ampersands aims to fix unescaped ampersands, by replacing them with &. It attempts smart detection of HTML entities, so in ©, no change is made. The detection of this is, understandably, limited, for example R&D; HRM; ... would cause it not to 'fix' the ampersand, even though it is not an entity.

The template tag is marked with is_safe=True, which means that the default escaping will occur of the input string was unsafe, and will not occur if the string was safe. However, this means that with an unsafe string, the fixing is ineffective, because the ampersand in the just generated entity is again escaped by the standard HTML escaping:

>>> print Template("{{ text|fix_ampersands }}").render(Context({'text': 'foo & bar'}))
foo & bar

If the string is marked as safe, this issue does not occur:

>>> print Template("{{ text|fix_ampersands|safe }}").render(Context({'text': 'foo & bar'}))
foo & bar

To the unsuspecting user, this may seem like the solution. However, this means only the ampersand is escaped, introducing an XSS vulnerability:

>>> print Template("{{ text|fix_ampersands|safe }}").render(Context({'text': 'foo & bar <script>alert("xss")</script>'}))
foo &amp; bar <script>alert("xss")</script>

Therefore, the fix_ampersands tag is only useful when an otherwise safe string is encountered, but which has unescaped ampersands, and does not have any unescaped ampersands that could be misread as an HTML entity. However, safe strings should never have unescaped ampersands. Therefore, the only use cases I see for this tag are broken or insecure.

Fix_ampersands is only used in django.utils.clean_html. This function is not documented, and appears to have been added with the initial publication of django. The use of this function is unclear, because it does numerous manipulations on HTML tags, but does not actually make it safe to include. Also, the things it cleans look like they are arbitrary selected, like 'Trim stupid HTML such as <br clear="all">'.

Searching on GitHub, I have not actually managed to locate any usage of either clean_html or fix_ampersands. I did not do a very thorough search, but it seems it is quite rarely used.

Considering that clean_html's behaviour is rather arbitrary and incomplete, and any use of fix_ampersands is broken or insecure, I suggest to start an accelerated deprecation cycle as of 1.7. I suggest this to be accelerated because the current tag stimulates users to create potentially serious vulnerabilities in their projects. If accepted, I will be happy to write the patch.

Change History (13)

comment:1 Changed 5 years ago by Sasha Romijn

Cc: eromijn@… added

comment:2 Changed 5 years ago by Sasha Romijn

Django-developers thread started on!topic/django-developers/UehmGnEjiSE

Also, it should be django.utils.html.clean_html, not django.utils.clean_html.

comment:3 Changed 5 years ago by Baptiste Mispelon

Component: Template systemUtilities
Triage Stage: UnreviewedAccepted
Type: BugCleanup/optimization
Version: 1.6master

I'm +1 on the idea.

fix_ampersands and clean_html have been in Django since the beginning and I'm guessing that they were made for a very specific use-case at World Online but I don't believe they have their place inside a generic web framework.

As an argument, here's clean_html's docstring, explaining what it does exactly:

Clean the given HTML.  Specifically, do the following:
    * Convert <b> and <i> to <strong> and <em>.
    * Encode all ampersands correctly.
    * Remove all "target" attributes from <a> tags.
    * Remove extraneous HTML, such as presentational tags that open and
      immediately close and <br clear="all">.
    * Convert hard-coded bullets into HTML unordered lists.
    * Remove stuff like "<p>&nbsp;&nbsp;</p>", but only if it's at the
      bottom of the text.

comment:4 Changed 5 years ago by Sasha Romijn

Owner: changed from nobody to Sasha Romijn
Status: newassigned

As there were no objections on the mailing list, I think we can move forward on this.

As Alex clarified on the list, there does not appear to be any code on GitHub that uses clean_html, and given the functionality, I don't see any use for it at all. There was also no documentation for it. Therefore, I will simply remove clean_html. fix_ampersands will be set up for accelerated deprecation: DeprecationWarning in 1.7, removal in 1.8.

comment:5 Changed 5 years ago by Sasha Romijn

Has patch: set

comment:6 Changed 5 years ago by Erik Romijn <eromijn@…>

Resolution: fixed
Status: assignedclosed

In 775975f15d7d461c154e558cba5fb0592539126f:

Fixed #22130 -- Deprecated fix_ampersands, removed utils.clean_html()

comment:7 Changed 5 years ago by Aymeric Augustin <aymeric.augustin@…>

In 3273bd7b254680a5b241e2fdbc3196956b2b44e8:

Merge pull request #2382 from erikr/deprecate-fix-ampersands

Fixed #22130 -- Deprecated fix_ampersands and clean_html.

comment:8 Changed 5 years ago by Tim Graham

Has patch: unset
Resolution: fixed
Status: closednew

The test of the new warning fails when run after template_tests: ./ template_tests defaultfilters

See Jenkins for details.

comment:9 Changed 5 years ago by Sasha Romijn

Has patch: set

Sorry about that - this is the same issue I asked about on the mailing list: it turns out the warnings filters across multiple tests interact. template_tests sets an ignore for these DeprecationWarnings, which means defaultfilters does not see them anymore, even though it sets filters back to default. Even an explicit call to warnings.resetwarnings() did not change this. I fixed this for utils_tests by removing that specific check, hadn't realised it was an issue here too as the order for my locally run tests was different.

As Aymeric told me it was not really necessary to test for the warning actually being issued, the quick fix is:

comment:10 Changed 5 years ago by Aymeric Augustin

Owner: changed from Sasha Romijn to Aymeric Augustin
Status: newassigned

I'll try to figure out if I can stop template_tests from leaking warning state first.

comment:11 Changed 5 years ago by Aymeric Augustin

I spend some time working on this and I cannot figure out what's going on. It doesn't appear to be related to warning filters. If you insert the following code in test_fix_ampersands:

            print("\n\nWARNING STATE\n")
            for f in warnings.filters:
                if f[1] and f[1].pattern: print('    ' + f[1].pattern)
                if f[3] and f[3].pattern: print('    ' + f[3].pattern)

you can see that the warning filters are exactly identical with or without running the template_tests first, even though captured warnings are different.

While attempting to nail down the issue, I hit #22184, and that's more work than I'm willing to do just to test a deprecation warning that very obviously works. I don't want to waste more time on this, so I'll just merge Erik's pull request.

comment:12 Changed 5 years ago by Erik Romijn <eromijn@…>

Resolution: fixed
Status: assignedclosed

In 6268792e19ab37446e438f4777c0bb8e786c5c3f:

Fixed #22130 -- fixed template_tests/defaultfilters order dependent test failure

comment:13 Changed 5 years ago by Aymeric Augustin <aymeric.augustin@…>

In a08f906556c14357852bbc9c4c7d10c6558dafcf:

Merge pull request #2383 from erikr/fix-tests-fix-ampersands-deprecation

Fixed #22130 -- fixed template_tests/defaultfilters order dependent test failure

Note: See TracTickets for help on using tickets.
Back to Top