#22130 closed Cleanup/optimization (fixed)
Accelerated deprecation fix_ampersands and clean_html
Reported by: | Sasha Romijn | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Utilities | Version: | dev |
Severity: | Normal | Keywords: | nlsprint14 |
Cc: | eromijn@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I would like to propose the accelerated deprecation of the fix_ampersands
built-in template filter, because in all use cases, it either simply does not work, or stimulates the user to create security vulnerabilities.
fix_ampersands
Fix_ampersands aims to fix unescaped ampersands, by replacing them with &
. It attempts smart detection of HTML entities, so in ©
, no change is made. The detection of this is, understandably, limited, for example R&D; HRM; ...
would cause it not to 'fix' the ampersand, even though it is not an entity.
The template tag is marked with is_safe=True
, which means that the default escaping will occur of the input string was unsafe, and will not occur if the string was safe. However, this means that with an unsafe string, the fixing is ineffective, because the ampersand in the just generated entity is again escaped by the standard HTML escaping:
>>> print Template("{{ text|fix_ampersands }}").render(Context({'text': 'foo & bar'})) foo & bar
If the string is marked as safe, this issue does not occur:
>>> print Template("{{ text|fix_ampersands|safe }}").render(Context({'text': 'foo & bar'})) foo & bar
To the unsuspecting user, this may seem like the solution. However, this means only the ampersand is escaped, introducing an XSS vulnerability:
>>> print Template("{{ text|fix_ampersands|safe }}").render(Context({'text': 'foo & bar <script>alert("xss")</script>'})) foo & bar <script>alert("xss")</script>
Therefore, the fix_ampersands tag is only useful when an otherwise safe string is encountered, but which has unescaped ampersands, and does not have any unescaped ampersands that could be misread as an HTML entity. However, safe strings should never have unescaped ampersands. Therefore, the only use cases I see for this tag are broken or insecure.
clean_html
Fix_ampersands is only used in django.utils.clean_html
. This function is not documented, and appears to have been added with the initial publication of django. The use of this function is unclear, because it does numerous manipulations on HTML tags, but does not actually make it safe to include. Also, the things it cleans look like they are arbitrary selected, like 'Trim stupid HTML such as <br clear="all">'.
Proposal
Searching on GitHub, I have not actually managed to locate any usage of either clean_html
or fix_ampersands
. I did not do a very thorough search, but it seems it is quite rarely used.
Considering that clean_html
's behaviour is rather arbitrary and incomplete, and any use of fix_ampersands
is broken or insecure, I suggest to start an accelerated deprecation cycle as of 1.7. I suggest this to be accelerated because the current tag stimulates users to create potentially serious vulnerabilities in their projects. If accepted, I will be happy to write the patch.
Change History (13)
comment:1 by , 11 years ago
Cc: | added |
---|
comment:2 by , 11 years ago
comment:3 by , 11 years ago
Component: | Template system → Utilities |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Bug → Cleanup/optimization |
Version: | 1.6 → master |
I'm +1 on the idea.
fix_ampersands
and clean_html
have been in Django since the beginning and I'm guessing that they were made for a very specific use-case at World Online but I don't believe they have their place inside a generic web framework.
As an argument, here's clean_html
's docstring, explaining what it does exactly:
Clean the given HTML. Specifically, do the following: * Convert <b> and <i> to <strong> and <em>. * Encode all ampersands correctly. * Remove all "target" attributes from <a> tags. * Remove extraneous HTML, such as presentational tags that open and immediately close and <br clear="all">. * Convert hard-coded bullets into HTML unordered lists. * Remove stuff like "<p> </p>", but only if it's at the bottom of the text.
comment:4 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
As there were no objections on the mailing list, I think we can move forward on this.
As Alex clarified on the list, there does not appear to be any code on GitHub that uses clean_html, and given the functionality, I don't see any use for it at all. There was also no documentation for it. Therefore, I will simply remove clean_html. fix_ampersands
will be set up for accelerated deprecation: DeprecationWarning in 1.7, removal in 1.8.
comment:6 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:8 by , 11 years ago
Has patch: | unset |
---|---|
Resolution: | fixed |
Status: | closed → new |
The test of the new warning fails when run after template_tests
: ./runtests.py template_tests defaultfilters
See Jenkins for details.
comment:9 by , 11 years ago
Has patch: | set |
---|
Sorry about that - this is the same issue I asked about on the mailing list: it turns out the warnings filters across multiple tests interact. template_tests
sets an ignore for these DeprecationWarnings, which means defaultfilters
does not see them anymore, even though it sets filters back to default
. Even an explicit call to warnings.resetwarnings()
did not change this. I fixed this for utils_tests
by removing that specific check, hadn't realised it was an issue here too as the order for my locally run tests was different.
As Aymeric told me it was not really necessary to test for the warning actually being issued, the quick fix is: https://github.com/django/django/pull/2383
comment:10 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I'll try to figure out if I can stop template_tests from leaking warning state first.
comment:11 by , 11 years ago
I spend some time working on this and I cannot figure out what's going on. It doesn't appear to be related to warning filters. If you insert the following code in test_fix_ampersands
:
print("\n\nWARNING STATE\n") for f in warnings.filters: print(f) if f[1] and f[1].pattern: print(' ' + f[1].pattern) if f[3] and f[3].pattern: print(' ' + f[3].pattern)
you can see that the warning filters are exactly identical with or without running the template_tests
first, even though captured warnings are different.
While attempting to nail down the issue, I hit #22184, and that's more work than I'm willing to do just to test a deprecation warning that very obviously works. I don't want to waste more time on this, so I'll just merge Erik's pull request.
comment:12 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Django-developers thread started on https://groups.google.com/forum/#!topic/django-developers/UehmGnEjiSE
Also, it should be
django.utils.html.clean_html
, notdjango.utils.clean_html
.