#18239 closed Bug (fixed)
Only use custom subclass of HTMLParser for Python versions with buggy stdlib HTMLParser
Reported by: | Carl Meyer | Owned by: | nobody |
---|---|---|---|
Component: | Core (Other) | Version: | 1.3 |
Severity: | Release blocker | Keywords: | |
Cc: | Raphaël Hertzog | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Django currently has its own subclass of HTMLParser (in django.utils.html_parser.HTMLParser). It exists in order to patch a bug in the standard library's HTMLParser in Python 2.5 and older versions of 2.6 and 2.7. The bug has been fixed in Python 2.6.8 and 2.7.3, and will be fixed in the upcoming 3.3 as well. There are also other fixes in 3.3's HTMLParser which conflict with the patched version in Django, since Django's version relies on numerous undocumented internals.
For better forward-compatibility, we should only use our patched subclass for versions of Python known to contain the bug, and otherwise simply use the standard library's HTMLParser directly.
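A minimal sketch of what such a version gate could look like, based only on the version numbers given in this description. It is not the attached patch or the committed fix. The names needs_patched_parser and PatchedHTMLParser are hypothetical, the import uses the Python 3 module path (on Python 2 the module is named HTMLParser), and treating 3.x releases before 3.3 as buggy is an assumption drawn from the "will be fixed in the upcoming 3.3" wording.

```python
import sys
from html.parser import HTMLParser as StdlibHTMLParser  # Python 3 module path


def needs_patched_parser(version_info=sys.version_info):
    """Return True for Python versions this ticket describes as buggy."""
    version = tuple(version_info[:3])
    if version < (2, 6, 8):
        # Python 2.5 and the 2.6 releases before the fix in 2.6.8.
        return True
    if (2, 7) <= version < (2, 7, 3):
        # The 2.7 releases before the fix in 2.7.3.
        return True
    if (3, 0) <= version < (3, 3):
        # Assumption: every 3.x release before 3.3 still has the bug.
        return True
    return False


class PatchedHTMLParser(StdlibHTMLParser):
    """Hypothetical stand-in for Django's patched subclass (body omitted)."""


# Fall back to the unmodified stdlib class wherever the bug is fixed.
HTMLParser = PatchedHTMLParser if needs_patched_parser() else StdlibHTMLParser
```

Keeping the check in one function means the version boundaries can be corrected in a single place if they turn out to differ from what is assumed here.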
When we make this change, we can also roll back r17456, as that was simply papering over a breakage due to the modified HTMLParser in 2.6.8 and 2.7.3; that will no longer be a problem if we don't try to use our subclass with those (and newer) Pythons.
Attachments (1)
Change History (10)
comment:1 by , 13 years ago
comment:2 by , 12 years ago
Cc: | added |
---|
For me, the test suite of Django 1.4.1 fails with many invalid HTML parse errors when I run it on Debian Sid with Python 2.7.3. Is this the same issue as this bug?
Example of error:
======================================================================
ERROR: test_count (regressiontests.test_utils.tests.HTMLEqualTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/«PKGBUILDDIR»/tests/regressiontests/test_utils/tests.py", line 396, in test_count
    dom2 = parse_html('<p class="bar">foo</p>')
  File "/«PKGBUILDDIR»/django/test/html.py", line 213, in parse_html
    parser.feed(html)
  File "/usr/lib/python2.7/HTMLParser.py", line 114, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/HTMLParser.py", line 160, in goahead
    k = self.parse_endtag(i)
  File "/«PKGBUILDDIR»/django/utils/html_parser.py", line 96, in parse_endtag
    self.handle_endtag(tag.lower())
  File "/«PKGBUILDDIR»/django/test/html.py", line 191, in handle_endtag
    tag, self.format_position()))
  File "/«PKGBUILDDIR»/django/test/html.py", line 153, in error
    raise HTMLParseError(msg, self.getpos())
HTMLParseError: Unexpected end tag `p` (Line 1, Column 18), at line 1, column 19
by , 12 years ago
Attachment: | 01_use_stdlib_htmlparser_when_possible.diff added |
---|
Patch for Django 1.4.1
comment:3 by , 12 years ago
Has patch: | set |
---|---|
Severity: | Normal → Release blocker |
Here's a patch that seems to solve the issue for me by doing what the bug description suggests, i.e. using Django's own HTMLParser only with Python versions that have the problem. It should be straightforward to adapt it for the development version.
I took the liberty of increasing the severity, as Django is effectively broken for me on Debian Sid right now.
comment:5 by , 12 years ago
I would appreciate an ack/review from a core developer before I upload this patch to Debian... but it would be even better if I could just cherry-pick the definitive fix from trunk.
comment:6 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
(Thanks to Vinay Sajip for discovering and raising this issue.)