Context Navigation

← Previous Ticket
Next Ticket →

#7267 closed Bug (fixed)

clean_html() bug

Reported by:	Nikolay <djangoproject@…>	Owned by:	nobody
Component:	Core (Other)	Version:	dev
Severity:	Normal	Keywords:	html
Cc:	aymeric.augustin@…, Preston Timmons	Triage Stage:	Ready for checkin
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

The following is reproducible scenario for the issue:

[0:28 /home/nik]$ python
Python 2.5.2 (r252:60911, Apr 17 2008, 13:15:05) 
[GCC 4.2.3 (Debian 4.2.3-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import django
>>> django.utils.html.clean_html("<p>* * *</p><p>.</p>")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/niksite/lib/site-python/django/utils/functional.py", line 239, in wrapper
    return func(*args, **kwargs)
  File "/home/niksite/lib/site-python/django/utils/html.py", line 158, in clean_html
    text = hard_coded_bullets_re.sub(replace_p_tags, text)
  File "/home/niksite/lib/site-python/django/utils/html.py", line 156, in replace_p_tags
    s = s.replace('<p>%s' % d, '<li>')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 3: ordinal not in range(128)
>>>

Attachments (4)

clean_html_unicode_r7601.patch (1.3 KB ) - added by George Vilches 17 years ago.: Patch against r7601 to fix crash error with clean_html.
7267.patch (2.3 KB ) - added by Aymeric Augustin 15 years ago.
7267.2.patch (2.3 KB ) - added by Aymeric Augustin 15 years ago.
7267.3.patch (2.3 KB ) - added by Aymeric Augustin 15 years ago.

Download all attachments as: .zip

Change History (18)

comment:1 by George Vilches, 17 years ago

Has patch:	set

I've generated a patch for this ticket which prevents the crash error, although the output of your input looks strange to me.

>>> import django
>>> django.utils.html.clean_html(u"<p>* * *</p><p>.</p>")
u'<ul>\n<li> * *</li><p>.</li>\n</ul>'

However, I'm comfortable with saying that if it is invalid, that should be treated as a separate bug.

Also, the only thing that might be questionable with this patch is the change to utils/html.py, line 16:

DOTS = [u'&middot;', u'*', u'\xe2\x80\xa2', u'&#149;', u'&bull;', u'&#8226;']

The 3rd entry might not be a safe Unicode conversion. However, all the unit tests pass. Can someone comment?

by George Vilches, 17 years ago

Attachment:	clean_html_unicode_r7601.patch added

Patch against r7601 to fix crash error with clean_html.

comment:2 by Simon Greenhill, 17 years ago

Triage Stage:	Unreviewed → Ready for checkin

comment:3 by Adrian Holovaty, 17 years ago

Needs tests:	set
Triage Stage:	Ready for checkin → Accepted

Needs tests!

comment:4 by Adrian Holovaty, 17 years ago

Component:	Tools → Core framework

comment:5 by Peter Baumgartner, 15 years ago

Severity:	→ Normal
Type:	→ Bug

by Aymeric Augustin, 15 years ago

Attachment:	7267.patch added

comment:6 by Aymeric Augustin, 15 years ago

Cc:	aymeric.augustin@… added
Needs tests:	unset

comment:7 by Preston Timmons, 15 years ago

Cc:	Preston Timmons added
Easy pickings:	unset
Patch needs improvement:	set

Thanks for updating the patch. The approach looks good but when I apply it I get the following test failure:

........................................................................F.............................
======================================================================
FAIL: test_clean_html (regressiontests.utils.html.TestUtilsHtml)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/pycon2011/djangoenv/src/django/tests/regressiontests/utils/html.py", line 135, in test_clean_html
    self.check_output(f, value, output)
  File "/root/pycon2011/djangoenv/src/django/tests/regressiontests/utils/html.py", line 14, in check_output
    self.assertEqual(function(value), output)
AssertionError: u'<p>I kill<br />whitespace</p>' != u'<p>I kill whitespace</p>'

comment:8 by Aymeric Augustin, 15 years ago

Oops - new patch fixes the test failure.

comment:9 by Aymeric Augustin, 15 years ago

Patch needs improvement:	unset

by Aymeric Augustin, 15 years ago

Attachment:	7267.2.patch added

comment:10 by Preston Timmons, 15 years ago

Triage Stage:	Accepted → Ready for checkin

The new patch looks good. I verified it addresses the bug and passes tests. It applies cleanly against r16094.

comment:11 by Luke Plant, 15 years ago

Patch needs improvement:	set
Triage Stage:	Ready for checkin → Accepted

The issue regarding '\xe2\x80\xa2' does need to be addressed. This is a UTF-8 bytestring sequence for Unicode character 2022. The correct way to convert it to a unicode object is:

    '\xe2\x80\xa2'.decode('utf-8')

which results in u'\u2022', not u'\xe2\x80\xa2'

Last edited 15 years ago by Luke Plant (previous) (diff)

by Aymeric Augustin, 15 years ago

Attachment:	7267.3.patch added

comment:12 by Aymeric Augustin, 15 years ago

Patch needs improvement:	unset

This new patch addresses lukeplant's comment.

comment:13 by Luke Plant, 15 years ago

Triage Stage:	Accepted → Ready for checkin

comment:14 by Luke Plant, 15 years ago

Resolution:	→ fixed
Status:	new → closed

In [16118]:

Fixed #7267 - UnicodeDecodeError in clean_html

Thanks to Nikolay for the report, and gav and aaugustin for the patch.

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#7267 closed Bug (fixed)

clean_html() bug

Description

Attachments (4)

Change History (18)

comment:1 by George Vilches, 17 years ago

by George Vilches, 17 years ago

comment:2 by Simon Greenhill, 17 years ago

comment:3 by Adrian Holovaty, 17 years ago

comment:4 by Adrian Holovaty, 17 years ago

comment:5 by Peter Baumgartner, 15 years ago

by Aymeric Augustin, 15 years ago

comment:6 by Aymeric Augustin, 15 years ago

comment:7 by Preston Timmons, 15 years ago

comment:8 by Aymeric Augustin, 15 years ago

comment:9 by Aymeric Augustin, 15 years ago

by Aymeric Augustin, 15 years ago

comment:10 by Preston Timmons, 15 years ago

comment:11 by Luke Plant, 15 years ago

by Aymeric Augustin, 15 years ago

comment:12 by Aymeric Augustin, 15 years ago

comment:13 by Luke Plant, 15 years ago

comment:14 by Luke Plant, 15 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us