#36499 closed Cleanup/optimization (fixed)
strip_tags() and test_parsing_errors() fails with patched Python versions due to HTMLParser EOF behavior change
Reported by: | MeggyCal | Owned by: | Natalia Bidart |
---|---|---|---|
Component: | Utilities | Version: | 5.2 |
Severity: | Normal | Keywords: | |
Cc: | Clifford Gama | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi, I am a packager at (open)SUSE. My colleague patched our Python interpreters with the respective fixes for https://github.com/python/cpython/issues/135462, and test_strip_tags started failing with them (see below). As per https://github.com/python/cpython/pull/135464#discussion_r2145171001, they introduced a change in behaviour with the fix and documented it. My understanding is that tags are now left alone if they are invalid.
There is no new CPython release yet, so nothing is set in stone, and I understand you might have difficulties reproducing and addressing this issue ahead of time, but I just wanted to let you know.
Failure:
[ 661s] ====================================================================== [ 661s] FAIL: test_strip_tags (utils_tests.test_html.TestUtilsHtml.test_strip_tags) [<object object at 0xed890348>] (valueoutput='><!&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& [CUT MANY &] &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&D') [ 661s] ---------------------------------------------------------------------- [ 661s] Traceback (most recent call last): [ 661s] File "/usr/lib/python3.13/unittest/case.py", line 58, in testPartExecutor [ 661s] yield [ 661s] File "/usr/lib/python3.13/unittest/case.py", line 556, in subTest [ 661s] yield [ 661s] File "/home/abuild/rpmbuild/BUILD/python-Django-5.2.2-build/django-5.2.2/tests/utils_tests/test_html.py", line 156, in test_strip_tags [ 661s] self.check_output(strip_tags, value, output) [ 661s] ^^^^^^^ [ 661s] File "/home/abuild/rpmbuild/BUILD/python-Django-5.2.2-build/django-5.2.2/tests/utils_tests/test_html.py", line 34, in check_output [ 661s] self.assertEqual(function(value), output) [ 661s] ^^^^^^^^^^^^^^^ [ 661s] File "/usr/lib/python3.13/unittest/case.py", line 907, in assertEqual [ 661s] assertion_func(first, second, msg=msg) [ 661s] ^^^^^^^^^^^^^^^ [ 661s] File "/usr/lib/python3.13/unittest/case.py", line 1273, in assertMultiLineEqual [ 661s] self.fail(self._formatMessage(msg, standardMsg)) [ 661s] ^^^^^^^^^^^ [ 661s] File "/usr/lib/python3.13/unittest/case.py", line 732, in fail [ 661s] raise self.failureException(msg) [ 661s] ^^^^^^^^^^^^^^^ [ 661s] AssertionError: '>' != '><!&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&[15958 chars]&&&D' [ 661s] Diff is 16012 characters long. 
Set self.maxDiff to None to see it. [ 661s] [ 661s] ====================================================================== [ 661s] FAIL: test_strip_tags (utils_tests.test_html.TestUtilsHtml.test_strip_tags) [<object object at 0xed890348>] (value='><a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<aa', 
output='><a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<aa') [ 661s] ---------------------------------------------------------------------- [ 661s] Traceback (most recent call last): [ 661s] File "/usr/lib/python3.13/unittest/case.py", line 58, in testPartExecutor [ 661s] yield [ 661s] File "/usr/lib/python3.13/unittest/case.py", line 556, in subTest [ 661s] yield [ 661s] File "/home/abuild/rpmbuild/BUILD/python-Django-5.2.2-build/django-5.2.2/tests/utils_tests/test_html.py", line 156, in test_strip_tags [ 661s] self.check_output(strip_tags, value, output) [ 661s] ^^^^^^^ [ 661s] File "/home/abuild/rpmbuild/BUILD/python-Django-5.2.2-build/django-5.2.2/tests/utils_tests/test_html.py", line 34, in check_output [ 661s] self.assertEqual(function(value), output) [ 661s] ^^^^^^^^^^^^^^^ [ 661s] File "/usr/lib/python3.13/unittest/case.py", line 907, in assertEqual [ 661s] assertion_func(first, second, msg=msg) [ 661s] ^^^^^^^^^^^^^^^ [ 661s] File "/usr/lib/python3.13/unittest/case.py", line 1273, in assertMultiLineEqual [ 661s] 
self.fail(self._formatMessage(msg, standardMsg))
[ 661s] ^^^^^^^^^^^
[ 661s] File "/usr/lib/python3.13/unittest/case.py", line 732, in fail
[ 661s] raise self.failureException(msg)
[ 661s] ^^^^^^^^^^^^^^^
[ 661s] AssertionError: '>' != '><a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<[956 chars]a<aa'
[ 661s] Diff is 1010 characters long. Set self.maxDiff to None to see it.
[ 661s]
[ 661s] ----------------------------------------------------------------------
[ 661s] Ran 17447 tests in 178.560s
Change History (16)
comment:2 by , 2 months ago
Cc: | added |
---|---|
Component: | Uncategorized → Utilities |
Summary: | CPython might have introduced a change of behaviour in their fix for https://github.com/python/cpython/issues/135462 → strip_tags() fails with patched Python versions due to HTMLParser EOF behavior change |
Triage Stage: | Unreviewed → Accepted |
Thanks for the report! I managed to reproduce this against the CPython main branch (e18829a8). Since the commit (gh-135462) was backported to Python versions currently supported by Django, I think we can accept this on the basis that Django needs to make a decision.
The issue is that an unterminated tag is now being discarded. In the case of the failing tests these are "<a<a...", "<&&&...&D", and the first "<sc" in "<sc<!-- -->ript>test<<!-- -->/script>".
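To see the difference locally, a minimal sketch along these lines may help. It is not Django's strip_tags() implementation; TextCollector and collect_text are hypothetical names used only for illustration. It feeds a string ending in an unterminated tag to the standard library's HTMLParser and joins whatever text the parser reports. What comes back for the second input depends on whether the interpreter carries the gh-135462 fix, so no expected output is shown for it.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records only the text the parser reports."""

    def __init__(self):
        # convert_charrefs=False keeps entity references as separate events.
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_entityref(self, name):
        self.chunks.append(f"&{name};")

    def handle_charref(self, name):
        self.chunks.append(f"&#{name};")


def collect_text(value):
    parser = TextCollector()
    parser.feed(value)
    # close() forces the parser to decide what to do with leftover input,
    # which is where the EOF behavior change becomes visible.
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    # Well-formed markup: tags are dropped, text is kept.
    print(repr(collect_text("<b>hello</b> world")))
    # Unterminated tags at EOF: the result depends on the Python build.
    print(repr(collect_text(">" + "<a" * 3 + "a")))
```

Running the script on an unpatched and on a patched interpreter and comparing the second printed line shows whether the trailing unterminated tag is kept as text or dropped.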
I see two ways we may handle this:
- Adjust strip_tags() to preserve pre-3.13 behavior, ensuring consistency, or
- Update tests, and possibly note the behavioral shift in docs, although the latter may not be necessary as the changed behaviour was not documented.
(FWIW, the associated issue that introduced the commit in Python was marked as a security issue.)
comment:3 by , 2 months ago
Owner: | set to |
---|---|
Severity: | Normal → Release blocker |
Status: | new → assigned |
We are also seeing the failures in our scheduled tests CI but only when using Python 3.14 (example). I have also reproduced locally with Python 3.14 beta 4.
The changes in Python were driven by a security report started by the Django Security Team, following up on some private reports we got. I think we need to update the tests and stick as closely as possible to Python's HTMLParser behavior. Also, we need to backport this to the supported stable branches, so I'll mark it as a release blocker.
comment:4 by , 2 months ago
Has patch: | set |
---|---|
Needs documentation: | set |
comment:5 by , 2 months ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | set |
Severity: | Release blocker → Normal |
Type: | Bug → Cleanup/optimization |
I've discussed this issue with Sarah and she made the valid point that since this affects tests only, it shouldn't require release notes nor the "Release Blocker" status. Updating!
Setting as "patch needs improvement" to block the PR until the Python versions are released.
comment:6 by , 2 months ago
Summary: | strip_tags() fails with patched Python versions due to HTMLParser EOF behavior change → strip_tags() and test_parsing_errors() fails with patched Python versions due to HTMLParser EOF behavior change |
---|
follow-up: 8 comment:7 by , 6 weeks ago
This is now released in CPython 3.13.6, and it has been backported as far back as 3.9 (not released upstream yet, but at least some distributions have already backported it).
comment:8 by , 6 weeks ago
Replying to Michał Górny:
This is now released in CPython 3.13.6, and it has been backported as far back as 3.9 (not released upstream yet, but at least some distributions have already backported it).
Thank you Michał! We are tracking Python releases and as soon as every version is released upstream (3.13.6, 3.12.12, 3.11.14, 3.10.19 and 3.9.24), we'll update our CI workers and land my PR.
comment:9 by , 5 weeks ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Code has been adjusted to work with versions of Python with and without the fix. I'll set a reminder to clean the code up once all the Pythons are released and available in our CI/CD.
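For readers who need a test to pass on interpreters both with and without the fix, one possible approach is to probe the parser at runtime and pick the expected value accordingly. The sketch below is an assumption for illustration only and is not necessarily what the PR does; _Probe, parser_discards_unterminated_tags, and expected_for are hypothetical names.

```python
from html.parser import HTMLParser


class _Probe(HTMLParser):
    """Hypothetical probe: collects text reported by the parser."""

    def __init__(self):
        super().__init__()
        self.data = []

    def handle_data(self, data):
        self.data.append(data)


def parser_discards_unterminated_tags():
    # Feed a lone unterminated start tag and see whether any of it
    # comes back as text once the parser is closed.
    probe = _Probe()
    probe.feed("<a")
    probe.close()
    return "".join(probe.data) == ""


def expected_for(old_output, new_output):
    # Choose the expectation that matches the running interpreter.
    if parser_discards_unterminated_tags():
        return new_output
    return old_output
```

A test could then compare its result against expected_for(old, new) instead of a single hard-coded string. Feature detection like this avoids maintaining a table of patched version numbers, at the cost of tying the test to one specific probe input.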
Sorry, as I look at the test data alone, something ate almost all the >s, which doesn't look intentional. I have to check the patches... Edit: at a glance, our patches do not differ from the upstream ones.