Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#24001 closed Cleanup/optimization (needsinfo)

Add a regression test for strip_tags, html encoding and unicode MemoryError

Reported by: twig
Owned by: mhall1
Component: Template system
Version: dev
Severity: Normal
Keywords:
Cc: mhall1
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

We noticed some processes were using up to 22 GB of memory and throwing MemoryError exceptions.

Here's some sample code to run in a Django/Python shell:

from django.template.defaultfilters import striptags

value = """<p class="storybody"><h2>Images and Text Do Not Mix</h2><br><br>This PowerPoint <a href="http://www.slideshare.net/anilkr123/car-and-technology" target="_blank">presentation on cars</a> (we know it\u2019s about cars because an introductory slide consists of the word "CARS" in huge, garish orange-and-blue letters) puts all of its images in the background (after applying a little tasteful fading), with <a href="http://www.pcworld.com/article/7774/make_a_bold_statement_with_text_in_powerpoint.html" target="_blank">paragraphs of text</a> overlaid on them. This accomplishes the difficult feat of making the images hard to look at <i>and</i> the text hard to read. Perfect&#8212a lose-lose situation! <br><br>The presenter could have consolidated the text in one part of the image, using the image\u2019s horizontal guiding lines; but that didn\u2019t happen, so the slide manages to look sloppy as well as unreadable. Bonus points for misspelling \u201ccarburetor.\u201d</p>"""

striptags(value)

Removing the "&#8212" after "Perfect" fixes the problem. The character is an em dash, most likely copy-pasted from Microsoft Word.

Tested with v1.6.8 and v1.7.1
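
In the sample above, the "&#8212" reference appears without a terminating semicolon. This ticket does not establish whether the missing semicolon is the trigger, but for narrowing down suspicious inputs, a minimal sketch along these lines may help. The helper name `find_unterminated_charrefs` and its regular expression are illustrative assumptions, not part of Django or Python:

```python
import re

# Hypothetical helper pattern: a numeric character reference (decimal or hex)
# that is NOT followed by ";". The lookaheads also reject further digits so
# that a well-formed "&#8212;" cannot match as the shorter prefix "&#821".
_UNTERMINATED_CHARREF = re.compile(
    r"&#(?:[xX][0-9a-fA-F]+(?![0-9a-fA-F;])|[0-9]+(?![0-9;]))"
)


def find_unterminated_charrefs(text):
    """Return every numeric character reference in `text` that lacks a ';'."""
    return _UNTERMINATED_CHARREF.findall(text)
```

Running this over the sample `value` should flag only the reference after "Perfect"; well-formed references such as "&#8212;" are not reported.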

Change History (7)

comment:1 by mhall1, 10 years ago

I've verified that the problem is fixed on 1.6.9 alpha and 1.7.2 alpha.


comment:2 by mhall1, 10 years ago

I have a regression test in the works to make sure this doesn't come up again. Assigning to myself for now.

comment:3 by mhall1, 10 years ago

Cc: mhall1 added
Owner: changed from nobody to mhall1
Status: new → assigned
Triage Stage: Unreviewed → Accepted

comment:4 by Tim Graham, 10 years ago

Component: Uncategorized → Template system
Summary: strip_tags, html encoding and unicode usage causes MemoryError on short string → Add a regression test for strip_tags, html encoding and unicode MemoryError
Type: Bug → Cleanup/optimization
Version: 1.6 → master

comment:5 by mhall1, 10 years ago

I tried to reproduce this again on 1.6.8, 1.6.9 alpha, 1.7.1, and 1.7.2 alpha just to be sure, and I haven't had any success. The unit test needed to check for this is a bit resource-intensive so I'd like to pin down the issue first.

@twig, if you could provide any other info, such as the Python version, database backend, etc., I'd really appreciate it. I'm moving this to "needsinfo" for now.
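
One way to keep such a resource-intensive check from exhausting the test machine is to run the suspect call in a child interpreter with a capped address space. The following is only a sketch under stated assumptions: it targets Python 3 on a POSIX system where `RLIMIT_AS` is enforced (for example Linux), and the name `run_with_memory_cap` and its defaults are hypothetical, not something proposed in this ticket:

```python
import resource
import subprocess
import sys


def run_with_memory_cap(code, max_bytes=1 << 30, timeout=30):
    """Run `code` in a fresh interpreter with a capped address space.

    Returns the child's exit status. A MemoryError raised in the child
    shows up as a non-zero status instead of exhausting the host.
    subprocess.TimeoutExpired is raised if the child runs too long.
    """
    def cap_address_space():
        # Executed in the child just before exec (POSIX only).
        # Lower only the soft limit, and never above the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            new_soft = max_bytes
        else:
            new_soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=cap_address_space,
        timeout=timeout,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode
```

With Django installed, `code` could import `striptags` from `django.template.defaultfilters` and call it on the sample string; a regression test would then assert that the returned status is 0.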

comment:6 by mhall1, 10 years ago

Resolution: needsinfo
Status: assigned → closed

comment:7 by ttyS15, 10 years ago (in reply to: 5)

Replying to mhall1:

> I tried to reproduce this again on 1.6.8, 1.6.9 alpha, 1.7.1, and 1.7.2 alpha just to be sure, and I haven't had any success. The unit test needed to check for this is a bit resource-intensive so I'd like to pin down the issue first.
>
> @twig, if you could provide any other info, such as the Python version, database backend, etc., I'd really appreciate it. I'm moving this to "needsinfo" for now.

This is a bug in Python <= 2.7.8: http://bugs.python.org/issue20288. It is fixed in 2.7.9 and later.
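
To check whether a given interpreter falls in the range named above, a small version guard can be sketched as follows. It takes the comment literally (anything below 2.7.9 counts as affected); the ticket says nothing about Python 3 release lines, so this sketch reports them as unaffected by assumption. The names `FIXED_IN` and `is_affected` are illustrative:

```python
import sys

# First release that contains the fix, according to the comment above.
FIXED_IN = (2, 7, 9)


def is_affected(version_info=None):
    """Return True if the interpreter version is older than FIXED_IN.

    Defaults to the running interpreter. Python 3 versions compare as
    newer and are therefore reported as unaffected (an assumption; the
    ticket only discusses Python <= 2.7.8).
    """
    version = tuple(version_info or sys.version_info)[:3]
    return version < FIXED_IN
```

A regression test could use this to skip itself or to flag the environment, for example by checking `is_affected()` before calling `striptags` on the sample input.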
