Opened 2 months ago

Closed 5 weeks ago

Last modified 5 weeks ago

#36499 closed Cleanup/optimization (fixed)

strip_tags() and test_parsing_errors() fail with patched Python versions due to HTMLParser EOF behavior change

Reported by: MeggyCal Owned by: Natalia Bidart
Component: Utilities Version: 5.2
Severity: Normal Keywords:
Cc: Clifford Gama Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi, I am a packager at (open)SUSE. My colleague patched our Python interpreters with their respective fixes for https://github.com/python/cpython/issues/135462 and test_strip_tags started failing with them (see below). As per https://github.com/python/cpython/pull/135464#discussion_r2145171001, the fix introduced a change in behaviour, which was documented. My understanding is that tags are now left alone if they are invalid.

There is no new CPython release yet, so nothing is set in stone, and I understand you might have difficulties reproducing and addressing this issue ahead of a release, but I just wanted to let you know.

Failure:

[  661s] ======================================================================
[  661s] FAIL: test_strip_tags (utils_tests.test_html.TestUtilsHtml.test_strip_tags) [<object object at 0xed890348>] (value
[CUT MANY &]

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&D', output='><!&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
[CUT MANY &]
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&D')
[  661s] ----------------------------------------------------------------------
[  661s] Traceback (most recent call last):
[  661s]   File "/usr/lib/python3.13/unittest/case.py", line 58, in testPartExecutor
[  661s]     yield
[  661s]   File "/usr/lib/python3.13/unittest/case.py", line 556, in subTest
[  661s]     yield
[  661s]   File "/home/abuild/rpmbuild/BUILD/python-Django-5.2.2-build/django-5.2.2/tests/utils_tests/test_html.py", line 156, in test_strip_tags
[  661s]     self.check_output(strip_tags, value, output)
[  661s]     ^^^^^^^
[  661s]   File "/home/abuild/rpmbuild/BUILD/python-Django-5.2.2-build/django-5.2.2/tests/utils_tests/test_html.py", line 34, in check_output
[  661s]     self.assertEqual(function(value), output)
[  661s]     ^^^^^^^^^^^^^^^
[  661s]   File "/usr/lib/python3.13/unittest/case.py", line 907, in assertEqual
[  661s]     assertion_func(first, second, msg=msg)
[  661s]     ^^^^^^^^^^^^^^^
[  661s]   File "/usr/lib/python3.13/unittest/case.py", line 1273, in assertMultiLineEqual
[  661s]     self.fail(self._formatMessage(msg, standardMsg))
[  661s]     ^^^^^^^^^^^
[  661s]   File "/usr/lib/python3.13/unittest/case.py", line 732, in fail
[  661s]     raise self.failureException(msg)
[  661s]     ^^^^^^^^^^^^^^^
[  661s] AssertionError: '>' != '><!&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&[15958 chars]&&&D'
[  661s] Diff is 16012 characters long. Set self.maxDiff to None to see it.
[  661s] 
[  661s] ======================================================================
[  661s] FAIL: test_strip_tags (utils_tests.test_html.TestUtilsHtml.test_strip_tags) [<object object at 0xed890348>] (value='><a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<aa', 
output='><a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<aa')
[  661s] ----------------------------------------------------------------------
[  661s] Traceback (most recent call last):
[  661s]   File "/usr/lib/python3.13/unittest/case.py", line 58, in testPartExecutor
[  661s]     yield
[  661s]   File "/usr/lib/python3.13/unittest/case.py", line 556, in subTest
[  661s]     yield
[  661s]   File "/home/abuild/rpmbuild/BUILD/python-Django-5.2.2-build/django-5.2.2/tests/utils_tests/test_html.py", line 156, in test_strip_tags
[  661s]     self.check_output(strip_tags, value, output)
[  661s]     ^^^^^^^
[  661s]   File "/home/abuild/rpmbuild/BUILD/python-Django-5.2.2-build/django-5.2.2/tests/utils_tests/test_html.py", line 34, in check_output
[  661s]     self.assertEqual(function(value), output)
[  661s]     ^^^^^^^^^^^^^^^
[  661s]   File "/usr/lib/python3.13/unittest/case.py", line 907, in assertEqual
[  661s]     assertion_func(first, second, msg=msg)
[  661s]     ^^^^^^^^^^^^^^^
[  661s]   File "/usr/lib/python3.13/unittest/case.py", line 1273, in assertMultiLineEqual
[  661s]     self.fail(self._formatMessage(msg, standardMsg))
[  661s]     ^^^^^^^^^^^
[  661s]   File "/usr/lib/python3.13/unittest/case.py", line 732, in fail
[  661s]     raise self.failureException(msg)
[  661s]     ^^^^^^^^^^^^^^^
[  661s] AssertionError: '>' != '><a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<a<[956 chars]a<aa'
[  661s] Diff is 1010 characters long. Set self.maxDiff to None to see it.
[  661s] 
[  661s] ----------------------------------------------------------------------
[  661s] Ran 17447 tests in 178.560s

Change History (16)

comment:1 by MeggyCal, 2 months ago

Sorry, as I look at the test data alone, something ate almost all the >s, which doesn't look intentional. I have to check the patches... Edit: at a glance our patches do not differ from the upstream ones.


comment:2 by Clifford Gama, 2 months ago

Cc: Clifford Gama added
Component: Uncategorized → Utilities
Summary: CPython might have introduced a change of behaviour in their fix for https://github.com/python/cpython/issues/135462 → strip_tags() fails with patched Python versions due to HTMLParser EOF behavior change
Triage Stage: Unreviewed → Accepted

Thanks for the report! I managed to reproduce this against the CPython main branch at e18829a8. Since the commit (gh-135462) was backported to Python versions currently supported by Django, I think we can accept this on the basis that Django needs to make a decision. I think the issue is that an unterminated tag is now being discarded. In the failing tests these are "<a<a..." and "<&&&...&D", and the first "<sc" in "<sc<!-- -->ript>test<<!-- -->/script>".

I see two ways we may handle this:

  1. Adjust strip_tags() to preserve pre-3.13 behavior, ensuring consistency, or
  2. Update tests, and possibly note the behavioral shift in docs, although the latter may not be necessary as the changed behaviour was not documented.

(FWIW, the associated issue that introduced the commit in Python was marked as a security issue.)
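
To make the unterminated-tag behaviour described in this comment easier to check locally, here is a minimal sketch that uses only the standard library's HTMLParser. The names TextCollector and collect_text are hypothetical helpers for illustration; this is not Django's strip_tags() implementation, and the output for the second input is deliberately not stated because it depends on whether the interpreter carries the fix.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps only what HTMLParser reports as text data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def collect_text(value):
    parser = TextCollector()
    parser.feed(value)
    # close() forces end-of-input handling, which is where a trailing
    # unterminated tag is either reported as text or dropped.
    parser.close()
    return "".join(parser.chunks)


# Well-formed markup: the tags are consumed and only the text remains.
print(collect_text("<b>hello</b> world"))

# Input ending in an unterminated tag, shortened from the failing test data.
# The result differs between patched and unpatched interpreters.
print(repr(collect_text("><a<a<aa")))
```

Running this under both an unpatched and a patched interpreter should show whether the trailing "<a<a<aa" survives as text or is discarded.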


comment:3 by Natalia Bidart, 2 months ago

Owner: set to Natalia Bidart
Severity: Normal → Release blocker
Status: new → assigned

We are also seeing the failures in our scheduled tests CI but only when using Python 3.14 (example). I have also reproduced locally with Python 3.14 beta 4.

The changes in Python were driven by a security report started by the Django Security Team, following up on some private reports we got. I think we need to update the tests and stick as closely as possible to Python's HTMLParser behavior. Also, we need to backport this to the supported stable branches, so I'll mark it as a release blocker.

comment:4 by Natalia Bidart, 2 months ago

Has patch: set
Needs documentation: set

comment:5 by Natalia Bidart, 2 months ago

Needs documentation: unset
Patch needs improvement: set
Severity: Release blocker → Normal
Type: Bug → Cleanup/optimization

I've discussed this issue with Sarah and she made the valid point that since this affects tests only, it shouldn't require release notes nor the "Release Blocker" status. Updating!

Setting as "patch needs improvement" to block the PR until the Python versions are released.

comment:6 by Sarah Boyce, 2 months ago

Summary: strip_tags() fails with patched Python versions due to HTMLParser EOF behavior change → strip_tags() and test_parsing_errors() fails with patched Python versions due to HTMLParser EOF behavior change

comment:7 by Michał Górny, 6 weeks ago

This is now released in CPython 3.13.6, and it has been backported as far back as 3.9 (not released upstream yet, but at least some distributions have already backported it).

comment:8 by Natalia Bidart, 6 weeks ago (in reply to: 7)

Replying to Michał Górny:

This is now released in CPython 3.13.6, and it has been backported as far back as 3.9 (not released upstream yet, but at least some distributions have already backported it).

Thank you Michał! We are tracking Python releases and as soon as every version is released upstream (3.13.6, 3.12.12, 3.11.14, 3.10.19 and 3.9.24), we'll update our CI workers and land my PR.

comment:9 by Natalia Bidart, 5 weeks ago

Patch needs improvement: unset
Triage Stage: Accepted → Ready for checkin

The code has been adjusted to work with versions of Python with and without the fix. I'll set a reminder to clean the code up once all the Pythons are released and available in our CI/CD.
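
As an illustration of how a test can accept both behaviours, one possible approach is to probe the running interpreter and pick the expected value accordingly. This is only a sketch under stated assumptions: the helpers parse_to_text, keeps_unterminated_tag_text and expected_output are hypothetical, and the actual patch may well use a different mechanism (for example explicit Python version checks).

```python
from html.parser import HTMLParser


class _Probe(HTMLParser):
    """Hypothetical helper that records the text data HTMLParser emits."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_to_text(value):
    parser = _Probe()
    parser.feed(value)
    parser.close()  # trigger end-of-input handling
    return "".join(parser.chunks)


def keeps_unterminated_tag_text():
    """Return True if this interpreter reports a trailing '<a' as text."""
    return parse_to_text("<a") != ""


def expected_output(without_fix, with_fix):
    """Pick the expected test value that matches the running interpreter."""
    return without_fix if keeps_unterminated_tag_text() else with_fix


# Assumed pair of expectations for one of the failing inputs: the input
# unchanged when the tail is kept as text, or only ">" when it is dropped.
print(expected_output("><a<a<aa", ">"))
```

Probing behaviour rather than comparing version numbers also covers distribution builds that carry the backported fix under an older version string, which is the situation described in the original report.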

comment:10 by nessita <124304+nessita@…>, 5 weeks ago

Resolution: fixed
Status: assigned → closed

In 29806275:

Fixed #36499 -- Adjusted utils_tests.test_html.TestUtilsHtml.test_strip_tags following Python's HTMLParser new behavior.

Python fixed a quadratic complexity processing for HTMLParser in:
https://github.com/python/cpython/commit/6eb6c5db.

comment:11 by nessita <124304+nessita@…>, 5 weeks ago

In 74fafe2:

[5.2.x] Fixed test_utils.tests.HTMLEqualTests.test_parsing_errors following Python's HTMLParser fixed parsing.

Further details about Python changes can be found in:
https://github.com/python/cpython/commit/0243f97cbadec8d985e63b1daec5d1cbc850cae3.

Refs #36499. Thank you Clifford Gama for the thorough review!

Backport of e4515dad7a6d953c0bd2414127ba36e1446ff41a from main.

comment:12 by nessita <124304+nessita@…>, 5 weeks ago

In 9a720d5c:

[5.2.x] Fixed #36499 -- Adjusted utils_tests.test_html.TestUtilsHtml.test_strip_tags following Python's HTMLParser new behavior.

Python fixed a quadratic complexity processing for HTMLParser in:
https://github.com/python/cpython/commit/6eb6c5db.

Backport of 2980627502c84a9fd09272e1349dc574a2ff1fb1 from main.

comment:13 by nessita <124304+nessita@…>, 5 weeks ago

In 19e7b95:

[5.1.x] Fixed test_utils.tests.HTMLEqualTests.test_parsing_errors following Python's HTMLParser fixed parsing.

Further details about Python changes can be found in:
https://github.com/python/cpython/commit/0243f97cbadec8d985e63b1daec5d1cbc850cae3.

Refs #36499. Thank you Clifford Gama for the thorough review!

Backport of e4515dad7a6d953c0bd2414127ba36e1446ff41a from main.

comment:14 by nessita <124304+nessita@…>, 5 weeks ago

In 0980178:

[5.1.x] Fixed #36499 -- Adjusted utils_tests.test_html.TestUtilsHtml.test_strip_tags following Python's HTMLParser new behavior.

Python fixed a quadratic complexity processing for HTMLParser in:
https://github.com/python/cpython/commit/6eb6c5db.

Backport of 2980627502c84a9fd09272e1349dc574a2ff1fb1 from main.

comment:15 by nessita <124304+nessita@…>, 5 weeks ago

In 2a79837:

[4.2.x] Fixed test_utils.tests.HTMLEqualTests.test_parsing_errors following Python's HTMLParser fixed parsing.

Further details about Python changes can be found in:
https://github.com/python/cpython/commit/0243f97cbadec8d985e63b1daec5d1cbc850cae3.

Refs #36499. Thank you Clifford Gama for the thorough review!

Backport of e4515dad7a6d953c0bd2414127ba36e1446ff41a from main.

comment:16 by nessita <124304+nessita@…>, 5 weeks ago

In c3f98718:

[4.2.x] Fixed #36499 -- Adjusted utils_tests.test_html.TestUtilsHtml.test_strip_tags following Python's HTMLParser new behavior.

Python fixed a quadratic complexity processing for HTMLParser in:
https://github.com/python/cpython/commit/6eb6c5db.

Backport of 2980627502c84a9fd09272e1349dc574a2ff1fb1 from main.
