Opened 4 years ago

Closed 4 years ago

Last modified 5 months ago

#18239 closed Bug (fixed)

Only use custom subclass of HTMLParser for Python versions with buggy stdlib HTMLParser

Reported by: carljm Owned by: nobody
Component: Core (Other) Version: 1.3
Severity: Release blocker Keywords:
Cc: rhertzog Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


Django currently has its own subclass of HTMLParser (in django.utils.html_parser.HTMLParser). It exists to patch a bug in the standard library's HTMLParser in Python 2.5 and in older releases of 2.6 and 2.7. The bug has been fixed in Python 2.6.8 and 2.7.3, and will be fixed in the upcoming 3.3 as well. 3.3's HTMLParser also contains other fixes that conflict with Django's patched version, since Django's subclass relies on numerous undocumented internals.

For better forward-compatibility, we should only use our patched subclass for versions of Python known to contain the bug, and otherwise simply use the standard library's HTMLParser directly.
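A minimal sketch of the version check proposed above, written for Python 3 so that it runs on a current interpreter. It is not the committed patch: `needs_patched_parser` and `PatchedHTMLParser` are hypothetical names, and the version boundaries are taken only from the releases named in this ticket (2.6.8, 2.7.3, 3.2.3 and 3.3). The actual fix may draw the lines differently.

```python
import sys
from html.parser import HTMLParser as StdlibHTMLParser


def needs_patched_parser(version=None):
    """Return True if `version` falls in a range this ticket reports as buggy.

    Assumed ranges: anything before 2.6.8, 2.7.0 through 2.7.2,
    and 3.0 through 3.2.2.
    """
    v = tuple(sys.version_info[:3]) if version is None else tuple(version)
    if v < (2, 6, 8):
        return True
    if (2, 7) <= v < (2, 7, 3):
        return True
    if (3, 0) <= v < (3, 2, 3):
        return True
    return False


class PatchedHTMLParser(StdlibHTMLParser):
    """Hypothetical stand-in for Django's patched subclass (body omitted)."""


# Use the workaround only on affected interpreters; otherwise fall back
# to the unmodified standard library class.
HTMLParser = PatchedHTMLParser if needs_patched_parser() else StdlibHTMLParser
```

Keeping the check in one function that takes an explicit version tuple makes the boundaries easy to test without running several interpreters.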

When we make this change, we can also roll back r17456, as that was simply papering over a breakage due to the modified HTMLParser in 2.6.8 and 2.7.3 - that will no longer be a problem if we don't try to use our subclass with those (and newer) Pythons.

Attachments (1)

01_use_stdlib_htmlparser_when_possible.diff (8.9 KB) - added by rhertzog 4 years ago.
Patch for Django 1.4.1

Change History (10)

comment:1 Changed 4 years ago by carljm

(Thanks to Vinay Sajip for discovering and raising this issue.)

comment:2 Changed 4 years ago by rhertzog

  • Cc rhertzog added

For me, the test suite of Django 1.4.1 fails with many HTML parse errors when I run it on Debian Sid with Python 2.7.3. Is this the same issue?

Example of error:

ERROR: test_count (regressiontests.test_utils.tests.HTMLEqualTests)
Traceback (most recent call last):
  File "/«PKGBUILDDIR»/tests/regressiontests/test_utils/", line 396, in test_count
    dom2 = parse_html('<p class="bar">foo</p>')
  File "/«PKGBUILDDIR»/django/test/", line 213, in parse_html
  File "/usr/lib/python2.7/", line 114, in feed
  File "/usr/lib/python2.7/", line 160, in goahead
    k = self.parse_endtag(i)
  File "/«PKGBUILDDIR»/django/utils/", line 96, in parse_endtag
  File "/«PKGBUILDDIR»/django/test/", line 191, in handle_endtag
    tag, self.format_position()))
  File "/«PKGBUILDDIR»/django/test/", line 153, in error
    raise HTMLParseError(msg, self.getpos())
HTMLParseError: Unexpected end tag `p` (Line 1, Column 18), at line 1, column 19

Changed 4 years ago by rhertzog

Patch for Django 1.4.1

comment:3 Changed 4 years ago by rhertzog

  • Has patch set
  • Severity changed from Normal to Release blocker

Here's a patch that seems to solve the issue for me by doing what the bug description suggests, i.e. using Django's own HTMLParser only with Python versions that have the problem. It should be straightforward to adapt it for the development version.

I took the liberty of increasing the severity, as Django is effectively broken for me on Debian Sid right now.

comment:4 Changed 4 years ago by claudep

Python 3.2.3 has the fix also.

comment:5 Changed 4 years ago by rhertzog

I would appreciate an ack/review from a core developer before I upload this patch to Debian... but it would be even better if I could just cherry-pick the definitive fix from trunk.

comment:6 Changed 4 years ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from new to closed

In [5c79dd586534bc88ce7dc81c2d781c772d28b121]:

Fixed #18239 -- Subclassed HTMLParser only for selected Python versions

Only Python versions affected by
should patch HTMLParser.
Thanks Raphaël Hertzog for the initial patch (for 1.4).

comment:7 Changed 4 years ago by Claude Paroz <claude@…>

In [57d9ccc4aaef0420f6ba60a26e6af4e83b803ae9]:

[1.4.x] Fixed #18239 -- Subclassed HTMLParser only for selected Python versions

Only Python versions affected by
should patch HTMLParser.

comment:8 Changed 4 years ago by claudep

Applied to all Python 2.6 in [fcec904e4f3582a45d4d8e309e71e9f0c4d79a0c]

comment:9 Changed 5 months ago by Tim Graham <timograham@…>

In 2c125bde:

Refs #18239 -- Removed an obsolete workaround for bugs in HTMLParser.
