Opened 4 years ago

Closed 4 years ago

#17730 closed Bug (fixed)

Change django.utils.htmlparser to django.utils.html_parser to avoid possible issues with case-insensitivity and the HTMLParser stdlib module

Reported by: un33k Owned by: nobody
Component: Testing framework Version: 1.4-beta-1
Severity: Release blocker Keywords: AttributeError
Cc: gldnspud Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Python Version: 2.6.5
OS: Ubuntu
Django==1.4b1
South==0.7.3
Unidecode==0.04.9
Unipath==0.2.1
Werkzeug==0.8.3
boto==2.1.1
django-debug-toolbar==0.9.4
django-extensions==0.7.1

===========
in utils/HTMLParser.py:
import HTMLParser as _HTMLParser (import succeeds)
As you see from below, _HTMLParser has an attribute called _HTMLParser.

dir(_HTMLParser)

['_HTMLParser', 'builtins', 'doc', 'file', 'name', 'package', 're']

However on line 11 of utils/HTMLParser.py, it tries to reference HTMParser without the underscore.
_HTMLParser.HTMLParser.init(self)

When I changed line 11 from the above to the following, it worked.
_HTMLParser._HTMLParser.init(self)

Image attached.

Attachments (3)

HTMLParser.png (87.9 KB) - added by un33k 4 years ago.
Debugger dump
ticket17730.diff (8.4 KB) - added by gldnspud 4 years ago.
17730-alternative.patch (832 bytes) - added by aaugustin 4 years ago.

Download all attachments as: .zip

Change History (17)

Changed 4 years ago by un33k

Debugger dump

comment:1 Changed 4 years ago by carljm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to worksforme
  • Status changed from new to closed

There is no such file django/utils/HTMLParser.py in the Django source tree or in the 1.4b1 source distribution, only django/utils/htmlparser.py. In your attached image, it appears that somehow you have both files. It seems that your django/utils/HTMLParser.py is shadowing the standard library HTMLParser module, and this is causing the import in htmlparser.py to import your stray HTMLParser.py rather than the real HTMLParser standard library module.

I have no idea how you came to have a django/utils/HTMLParser.py file, but it should be removed, and then I'd expect things to work normally.

comment:2 Changed 4 years ago by un33k

I have verified and there is no HTMLParser.py in django/utils.
Also verified that django-debug-toolbar was causing it. When I removed it, everything works fine.
Not sure why the stack trace is a bit misleading.
Will investigate further.
Thx

comment:3 follow-up: Changed 4 years ago by gldnspud

  • Resolution worksforme deleted
  • Status changed from closed to reopened

Consider renaming django.utils.htmlparser to a different name, e.g. django.utils.html_parser.

This is very likely due to some platforms using case-insensitive filesystems. It is not limited to the use of django-debug-toolbar.

OSX is one example. When I try import django.utils.htmlparser within OSX, the version of Python appears to know about the distinction between the two, and handles imports properly. This is despite the fact that you can go into the .../django/utils directory and run cat HtMlPaRsEr.Py and it will happily show you the same output as cat htmlparser.py.

The way I ran into the error this ticket refers to is by using Parallels Desktop's shared folders feature to access my OSX filesystem from within an Ubuntu 10.04.3 virtual machine. The way Python is compiled in Linux does NOT know about the case insensitivity, yet the shared folders implementation exposes this -- you can go into the .../django/utils dir within the mounted filesystem in Linux, and type cat HtMlPaRsEr.Py and lo, there is the contents of the file.

All of this is unfortunate, but these are in fact realistic conditions for working on Django projects where you want your Django project to run inside a VM, but desire the use of an OSX-based editor and other tools.

comment:4 Changed 4 years ago by gldnspud

Indeed, I was able to confirm just now that doing the following allowed me to run tests and see them pass:

  1. Rename django/utils/htmlparser.py to django/utils/html_parser.py
  2. Line 7 of django/test/html.py, change from from django.utils.htmlparser import HTMLParser to from django.utils.html_parser import HTMLParser

Note: I was able to find no other references to htmlparser in the Django 1.4b1 code base (other than in EGG-INFO/SOURCES.txt which I assume to be auto-generated).

Changed 4 years ago by gldnspud

comment:6 Changed 4 years ago by gldnspud

  • Cc gldnspud added
  • Has patch set

comment:7 in reply to: ↑ 3 Changed 4 years ago by ramiro

Replying to gldnspud:

using Parallels Desktop's shared folders feature to access my OSX filesystem from within an Ubuntu 10.04.3 virtual machine. The way Python is compiled in Linux does NOT know about the case insensitivity, yet the shared folders implementation exposes this [...] All of this is unfortunate, but these are in fact realistic conditions for working on Django projects where you want your Django project to run inside a VM, but desire the use of an OSX-based editor and other tools.

I must say I disagree with the realistic conditions part. That's a very uncommon development setup and calling for the kind of trouble you are finding.

comment:8 Changed 4 years ago by aaugustin

Renaming a module is backwards incompatible, we can't do this lightly.

comment:9 Changed 4 years ago by carljm

  • Summary changed from AttributeError: 'module' object has no attribute 'HTMLParser' to Change django.utils.htmlparser to django.utils.html_parser to avoid possible issues with case-insensitivity and the HTMLParser stdlib module

I would be interested in the results of further investigation by un3kk. If it really is possible for django-debug-toolbar to trigger this problem, that would be a serious issue.

On the whole, I am tempted to make the change suggested by gldnspud. I agree that the "case-insensitive filesystem accessed by Python from case-sensitive OS" case is not exactly common, but it would be an easy issue to avoid entirely with a relatively simple change. Relying on case-sensitivity for correct operation seems like something it doesn't hurt to avoid. I'm not too concerned about the backwards incompatibility, given that django.utils.htmlparser was added to trunk less than a month ago, has never been in a production release, and is undocumented and used only by a test helper.

Tentatively accepting, but open to arguments that it would be a bad idea to make this change.

comment:10 Changed 4 years ago by carljm

  • Triage Stage changed from Unreviewed to Accepted

comment:11 Changed 4 years ago by aaugustin

I wasn't aware that this module is a recent addition. I withdraw my previous comment.

comment:12 Changed 4 years ago by SmileyChris

  • Severity changed from Normal to Release blocker

comment:13 Changed 4 years ago by aaugustin

The public API of the HTMLParser module is the HTMLParser class and the HTMLParseError exception.

If we want django.utils.htmlparser to mimic this interface, we could expose HTMLParseError there, and import it from that module.

I'm attaching a patch for this technique.

Changed 4 years ago by aaugustin

comment:14 Changed 4 years ago by carljm

  • Resolution set to fixed
  • Status changed from reopened to closed

In [17607]:

Fixed #17730 - Renamed django.utils.htmlparser to django.utils.html_parser to avoid shadowing stdlib HTMLParser in rare case-sensitivity situations. Thanks un33k for the report and gldnspud for the patch.

Note: See TracTickets for help on using tickets.
Back to Top