Code

Opened 8 years ago

Last modified 6 months ago

#2594 new Bug

Template system should handle whitespace better

Reported by: Gary Wilson <gary.wilson@…> Owned by: jshedd
Component: Template system Version: master
Severity: Normal Keywords:
Cc: mccutchen@…, gary.wilson@…, cgrady@…, mrmachine, kmike84@…, andrep, Twidi, Florian.Sening@…, waldir, ldiqual@…, python@…, ua_django_bugzilla@…, s.shanabrook@…, chipx86@…, tomasbabej@…, trbs@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

So, I think that the template system should not leave blank lines
behind if the tag(s) on that line evaulated to the empty string. To
make sure we are all on the same page, here is an example:

This template code:

<h1>My list</h1>

<ul>
{% for item in items %}
    <li>{{ item }}</li>
{% endfor %}
</ul>

Would normally evaluate to:

<h1>My list</h1>

<ul>

    <li>item 1</li>

    <li>item 2</li>

    <li>item 3</li>

</ul>

And with the {% spaceless %} tag evaluates to:

<h1>My list</h1> <ul> <li>item 1</li> <li>item 2</li> <li>item 3</li>
</ul>

And with the StripWhitespaceMiddleware evaluates to

<h1>My list</h1>
<ul>
    <li>item 1</li>
    <li>item 2</li>
    <li>item 3</li>
</ul>

But what I think should evaluate to:

<h1>My list</h1>

<ul>
    <li>item 1</li>
    <li>item 2</li>
    <li>item 3</li>
</ul>

In other words, leave lines that I add, but remove the empty lines that
had template tags on them.

Attachments (13)

template_whitespace-2006-08-22.diff (5.3 KB) - added by Gary Wilson <gary.wilson@…> 8 years ago.
template_whitespace.patch (11.9 KB) - added by SmileyChris 8 years ago.
Alternate version
template_whitespace_2.patch (10.8 KB) - added by SmileyChris 7 years ago.
Optimized version of my alternate patch
template_whitespace_2.2.patch (10.8 KB) - added by SmileyChris 7 years ago.
oops, this one i meant :P
template_whitespace_ticket_2594.diff (3.3 KB) - added by jshedd 4 years ago.
Causes template tag blocks to collapse the whitespace around them when they are on their own line.
template_whitespace_ticket_2594_tests.diff (2.1 KB) - added by jshedd 4 years ago.
Tests for whitespace collapsing.
template_whitespace_ticket_2594_with_tests.diff (5.5 KB) - added by jshedd 4 years ago.
Everything in one diff.
whitespace_benchmarks.zip (2.4 KB) - added by jshedd 4 years ago.
Benchmarks from unladen-swallow for the template system.
2594.diff (14.2 KB) - added by robhudson 3 years ago.
Patched against trunk (and template shuffling), removed trailing white space, and adjusted to add comments and make pass with steveire's tests.
whitespace.patch (24.8 KB) - added by steveire 3 years ago.
0001-Commiting-whitespace-patch-from-django-ticket-2594.patch (53.0 KB) - added by fahhem 3 years ago.
Modifies DebugLexer too, not just Lexer
0001-Commiting-whitespace-patch-from-django-ticket-2594.2.patch (53.0 KB) - added by fahhem 3 years ago.
Same as previous, with fixed whitespace issue
ticket-2594-template-strip-leading-whitespace-2012-06-28.diff (26.8 KB) - added by leonov 22 months ago.
Update patch against 1.5-dev

Download all attachments as: .zip

Change History (68)

Changed 8 years ago by Gary Wilson <gary.wilson@…>

comment:2 Changed 8 years ago by mccutchen@…

  • Cc mccutchen@… added
  • Summary changed from template system should handle whitespace better to [patch] template system should handle whitespace better

(just indicating that this ticket has a patch)

comment:3 Changed 8 years ago by anonymous

  • Cc gary.wilson@… added

comment:4 Changed 8 years ago by Gary Wilson <gary.wilson@…>

Although, this patch does not work as desired in all cases. See the google group link above.

comment:5 Changed 8 years ago by anonymous

  • Summary changed from [patch] template system should handle whitespace better to Template system should handle whitespace better

The speed issue is the killer here. We can't have a 100% performance hit on template rendering times -- speed in that area is one of Django's selling points. I'm going to remove the "patch" keyword for now, Gary, since I don't think we can take this patch. Leaving the ticket open, though, since the idea is a good one and if we can fix it, we should.

I'll try to work up something a bit faster; I had something that was almost right a while back, how hard can it be? :-(

comment:6 Changed 8 years ago by Gary Wilson <gary.wilson@…>

Here are some performance measures I got when I tested a few weeks ago:

Using the command:
$ python timeit.py -n 50 -s "from test.views import my_render" "my_render()"

without patch (r3735):
50 loops, best of 3: 6.34 msec per loop
50 loops, best of 3: 6.23 msec per loop
50 loops, best of 3: 6.24 msec per loop

with patch:
50 loops, best of 3: 8.67 msec per loop
50 loops, best of 3: 8.46 msec per loop
50 loops, best of 3: 8.46 msec per loop

So about 33-39% slower. And it doesn't work in all cases. :(

The test code looked like:

from django.template import Context, Template

def my_render():
    s = """\
<html>
<body>
    <div id="content">
    {% if items %}
    <ul>
        <li>Single item:</li>
        <li>
        {{ item.0.name }}
        </li>
    {% for item in items %}
    
        {% if item.name %}
        <li>{{ item.name }}</li>
        {% endif %}
        {% if not item.name %}
        <li>No name item</li>
        {% endif %}
        {% for item in fooitems %}
        <li>{{ item.name }}</li>
        {% endfor %}
        {% if item.des %}
        <li>{{ item.des }}</li>
        {% endif %}
    {% endfor %}
    {% if items %}
        <li>Spaceless items to follow:</li>
        {% spaceless %}
        {% for item in items %}
        
        <li>{{ item.name }}</li>
        {% endfor %}
        {% endspaceless %}
    {% endif %}
    </ul>
    {% endif %}
    {% if items %}
    <ul>
    {% for item in items %}
        {{ item.name }}
    {% endfor %}
    {% for foo in foos %}
    
        {{ foo }}
    {% endfor %}
    {% for item in items %}
        <li>huh{{ foo }}</li>
    {% endfor %}
    </ul>
    {% endif %}
    {% if items %}<p>We have items.</p>{% endif %}
    {% if items %}
    <p>We have items.</p>
    {% endif %}
    {% if foo %}<p>We have foo.</p>{% endif %}
    {% if foo %}
    <p>We have foo.</p>
    {% endif %}
    </div>
</body>
</html>
"""
    class Item:
        pass
    items = []
    for x in range(4):
        item = Item()
        item.name = 'item' + str(x)
        items.append(item)
    t = Template(s)
    c = Context({'items': items})
    return t.render(c)

Changed 8 years ago by SmileyChris

Alternate version

comment:7 Changed 8 years ago by SmileyChris

I've had a long ongoing discussion with Gary and came up with a completely different version to solve the same problem (actually, a couple of different versions, but this one is the best).

We did some profiling of this one too, and it turned out to be about 50% slower. If someone wants to have a look at optimization, I'd be most appreciative.

comment:8 Changed 8 years ago by SmileyChris

I'm happy to talk about my patch with anyone interested - smileychris at gmail.

How it works:

  1. When a tag is found on its own line, the lexer adds a token before and after containing the relevant whitespace to hold on to.
  2. The parser turns these into nodes.
  3. The nodelist works out at render time whether or not the whitespace nodes should be used.
    • If an EndNode is reached before a StartNode is found, the EndNode is discarded.
    • If an EndNode is reached and the previous rendered block ended in a new line, both nodes are discarded.
    • Otherwise, the whitespace nodes are rendered.

comment:9 Changed 8 years ago by mtredinnick

  • Owner changed from adrian to mtredinnick

Changed 7 years ago by SmileyChris

Optimized version of my alternate patch

comment:10 Changed 7 years ago by SmileyChris

This one doesn't cause anywhere near as huge of a performance hit as the last one. And it's patched versus some recent changes.

Changed 7 years ago by SmileyChris

oops, this one i meant :P

comment:11 Changed 7 years ago by Simon G. <dev@…>

  • Has patch set
  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Design decision needed

comment:12 Changed 7 years ago by Collin Grady <cgrady@…>

  • Cc cgrady@… added

+1 to cleaner template output :)

comment:13 Changed 6 years ago by jacob

  • Triage Stage changed from Design decision needed to Someday/Maybe

I like the idea, but there's no way we can make the template engine 50% slower for this.

comment:14 Changed 6 years ago by korpios

  • Cc korpios@… added

comment:15 Changed 6 years ago by korpios

  • Cc korpios@… removed

comment:16 Changed 4 years ago by jshedd

  • Owner changed from nobody to jshedd

Changed 4 years ago by jshedd

Causes template tag blocks to collapse the whitespace around them when they are on their own line.

Changed 4 years ago by jshedd

Tests for whitespace collapsing.

Changed 4 years ago by jshedd

Everything in one diff.

Changed 4 years ago by jshedd

Benchmarks from unladen-swallow for the template system.

comment:17 Changed 4 years ago by jshedd

  • Patch needs improvement unset

For the patch, I ran a copy of unladen-swallow's django template benchmark, attached as a zip to the ticket.

RunDjango 1.1.1Django TrunkDjango Trunk with patch
11.05046296121.208662986761.20275402069
21.070580959321.209710121151.20344805717
31.14760208131.20852494241.20287799835
41.065352916721.210731983181.20276403427
51.054219961171.211869001391.20313000679

Best I can tell, my patch is actually slightly faster, so yay!

comment:18 Changed 4 years ago by jshedd

As requested by Alex_Gaynor, running Gary Wilson's benchmark:

python -m timeit -n 50 -s "from views import my_render" "my_render()"

Django 1.1.1:
50 loops, best of 3: 3.64 msec per loop
50 loops, best of 3: 3.66 msec per loop
50 loops, best of 3: 3.65 msec per loop

Trunk, without patch:
50 loops, best of 3: 4.12 msec per loop
50 loops, best of 3: 4.11 msec per loop
50 loops, best of 3: 4.14 msec per loop

Trunk, with patch:
50 loops, best of 3: 4.13 msec per loop
50 loops, best of 3: 4.11 msec per loop
50 loops, best of 3: 4.13 msec per loop

Script looked like:

from django.template import Context, Template
from django.conf import settings
settings.configure()

def my_render():
    s = """\
<html>
<body>
    <div id="content">
    {% if items %}
    <ul>
        <li>Single item:</li>
        <li>
        {{ item.0.name }}
        </li>
    {% for item in items %}
    
        {% if item.name %}
        <li>{{ item.name }}</li>
        {% endif %}
        {% if not item.name %}
        <li>No name item</li>
        {% endif %}
        {% for item in fooitems %}
        <li>{{ item.name }}</li>
        {% endfor %}
        {% if item.des %}
        <li>{{ item.des }}</li>
        {% endif %}
    {% endfor %}
    {% if items %}
        <li>Spaceless items to follow:</li>
        {% spaceless %}
        {% for item in items %}
        
        <li>{{ item.name }}</li>
        {% endfor %}
        {% endspaceless %}
    {% endif %}
    </ul>
    {% endif %}
    {% if items %}
    <ul>
    {% for item in items %}
        {{ item.name }}
    {% endfor %}
    {% for foo in foos %}
    
        {{ foo }}
    {% endfor %}
    {% for item in items %}
        <li>huh{{ foo }}</li>
    {% endfor %}
    </ul>
    {% endif %}
    {% if items %}<p>We have items.</p>{% endif %}
    {% if items %}
    <p>We have items.</p>
    {% endif %}
    {% if foo %}<p>We have foo.</p>{% endif %}
    {% if foo %}
    <p>We have foo.</p>
    {% endif %}
    </div>
</body>
</html>
"""
    class Item:
        pass
    items = []
    for x in range(4):
        item = Item()
        item.name = 'item' + str(x)
        items.append(item)
    t = Template(s)
    c = Context({'items': items})
    return t.render(c)

comment:19 Changed 4 years ago by jshedd

  • Triage Stage changed from Someday/Maybe to Unreviewed

comment:20 Changed 4 years ago by russellm

  • Triage Stage changed from Unreviewed to Accepted

The idea has already been notionally accepted by Jacob - patch now needs review and triage.

comment:21 Changed 4 years ago by russellm

  • Needs documentation set

On first inspection it looks reasonably good. The performance stats are promising, too.

I can see a couple of minor areas for improvement:

  • Trailing whitespace - a lot of the diff is actually just your editor adding spaces to the end of lines.
  • Edge cases in tests - futzing with regular expressions makes me nervous, so I'd like to see a lot more tests to verify that nothing weird is going on. In particular, the test cases are fairly minimal - most of them start with the newline that is going to be stripped. I'd be interested in seeing more verification that if there is text/newlines *before* the newline that is to be stripped, that text won't be affected adversely.
  • Documentation. This behaviour needs to be documented -- nothing major; a quick note in the description of the template language should do.

The other thing holding up this ticket is the start of feature development for 1.3. We can't apply this until be branch for 1.2 support.

comment:22 Changed 4 years ago by russellm

  • Patch needs improvement set

comment:23 Changed 3 years ago by sneakywombat

Could you just do something like this?

t = Template(s)
c = Context({'items': items})
result = t.render(c)
return os.linesep.join([s for s in result.splitlines() if s.strip()])

comment:24 Changed 3 years ago by steveire

I wrote some more tests for this and it turned out that the presence of a tag would strip out all whitespace occurring before it, not just one line. I don't know if that was intentional, but I thought it surprising.

Another thing I noticed is that trailing whitespace is not stripped. I don't know how relevant that is, but I mimiced that behavior in this patch.

BEFORE PATCH:
kde-devel@bishop:~/Random/djtbenchmarks$ python -m timeit -n 50 -s "from bm import my_render" "my_render()"
50 loops, best of 3: 3.29 msec per loop

AFTER PATCH:
kde-devel@bishop:~/Random/djtbenchmarks$ python -m timeit -n 50 -s "from bm import my_render" "my_render()"
50 loops, best of 3: 3.51 msec per loop

comment:25 Changed 3 years ago by mrmachine

  • Cc mrmachine added

comment:26 Changed 3 years ago by robhudson

The patch attached brings in jshedd's work and steveire's tests. I updated to trunk, removed trailing white space, and tweaked regex slightly.

Pre regex tweaking had failures on templatetag-whitespace numbers 14-17, 21-22, 25-30, 32, 34.
Tweaked regex fails now on 26-30, 32, 34.

I plan on breaking out the regular expression with comments to better understand it, and try to fix the remaining failures.

Changed 3 years ago by robhudson

Patched against trunk (and template shuffling), removed trailing white space, and adjusted to add comments and make pass with steveire's tests.

comment:27 Changed 3 years ago by robhudson

My local testing shows:

python -m timeit -n 50 -s "from views import my_render" "my_render()"

Django 1.2.3
50 loops, best of 3: 3.75 msec per loop
50 loops, best of 3: 3.74 msec per loop
50 loops, best of 3: 3.76 msec per loop

Django trunk without patch
50 loops, best of 3: 3.72 msec per loop
50 loops, best of 3: 3.73 msec per loop
50 loops, best of 3: 3.73 msec per loop

Django trunk with patch
50 loops, best of 3: 3.73 msec per loop
50 loops, best of 3: 3.73 msec per loop
50 loops, best of 3: 3.72 msec per loop

comment:28 Changed 3 years ago by mtredinnick

I seriously doubt anybody has used vertical tabs (\v) or form feeds (\f) in Django templates ever. There's no need to put them in the reg-exp.

Also, realise that this is backwards incompatible and not in a trivial way. Anybody using templates to generate output in a format where whitespace is always significant (plain text email is one very big case) would have their output broken by this (unclear if Jacob and Russell both thought this was an acceptable sacrifice -- I definitely don't -- or if it was overlooked). It has to be optional and off by default in the rendering process. (The other case where I can immediately think of this breaking something is Django's debug page's "paste traceback" button -- I remember Erik Karulf spending ages to make sure the whitespace in what we POST'ed to dpaste was sensible.

comment:29 Changed 3 years ago by carljm

I agree with Malcolm that outright changing this is a backwards-compatibility issue. Unfortunately.

On the other hand, I think the new behavior is clearly superior in pretty much every case, and a major win, so I think it's worth finding a way to do it via an optional flag (an argument to Template.init? Would then need to be added to several shortcuts as well...), and then consider (later?) possible deprecation paths to migrate towards making it the default.

comment:30 Changed 3 years ago by mrmachine

I'd like to see this enabled with a template tag, as it is template authors who will be needing to use it most. I'd also like to see a deprecation path so that this can become the default behaviour.

Similar to when autoescaping was introduced, we could add a template tag {% stripwhitespace on|off %}{% endstripwhitespace %} that people can put in their base templates to preserve the existing behaviour when the default changes or enable the new behaviour before it becomes the default. We could raise a deprecation warning whenever a template is rendered without this tag.

comment:31 Changed 3 years ago by steveire

I'm not very familiar with the behavior of the blocktrans tag, but I guess there would need to be some handling of the whitespace at the end of a msgid being present in the po file, but not in the compiled template. I don't know if gettext is that strict. I guess the message extractor script should handle template syntax on its own line in the same way that the latest patch here does.

If the django message extractor script already uses the same classes, then there may be no problem, but it's something to look into.

This is related to #5849.

Changed 3 years ago by steveire

comment:32 Changed 3 years ago by steveire

  • Needs documentation unset
  • Patch needs improvement unset

Changed 3 years ago by fahhem

Modifies DebugLexer too, not just Lexer

Changed 3 years ago by fahhem

Same as previous, with fixed whitespace issue

comment:33 Changed 3 years ago by lrekucki

  • Type changed from defect to Bug

comment:34 Changed 3 years ago by lrekucki

  • Severity changed from normal to Normal

comment:35 Changed 3 years ago by patchhammer

  • Easy pickings unset
  • Patch needs improvement set

0001-Commiting-whitespace-patch-from-django-ticket-2594.2.patch fails to apply cleanly on to trunk

comment:36 Changed 2 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:37 Changed 2 years ago by tobia

I just started a thread about an alternative solution to this problem.

Newline stripping in templates: the dnl way

http://groups.google.com/group/django-developers/browse_thread/thread/8d6b6a3aae82ee03

comment:38 Changed 2 years ago by kmike

  • Cc kmike84@… added

comment:39 Changed 2 years ago by andrep

  • Cc andrep added

comment:40 Changed 2 years ago by Twidi

  • Cc Twidi added

comment:41 Changed 2 years ago by anonymous

Please someone from django core sacrifice a day, and test and ad this to 1.4.x. Thanks a lot!

comment:42 Changed 2 years ago by Semmel

  • Cc Florian.Sening@… added

comment:43 Changed 2 years ago by waldir

  • Cc waldir added

comment:44 Changed 22 months ago by ldiqual@…

  • Cc ldiqual@… added


comment:45 Changed 22 months ago by leonov

  • Cc python@… added

Changed 22 months ago by leonov

Update patch against 1.5-dev

comment:46 Changed 22 months ago by leonov

I've spent the morning updating the patch against the latest development version.

It was fairly straightforward, except for the Lexer.create_token() method, which took some time. It was already fairly hairy in 1.4, and the addition of the verbatim tag in 1.5-dev had made it even more so. I've added comments to make it easier for the next person to unravel the logic.

All tests pass, pre- and post-patch, and the results of Gary Wilson's benchmark remain good.

python -m timeit -n500 -s 'from bm import my_render' 'my_render()'

Without patch:
500 loops, best of 3: 1.96 msec per loop
500 loops, best of 3: 1.95 msec per loop
500 loops, best of 3: 1.96 msec per loop

With patch, TEMPLATE_STRIP_LEADING_WHITESPACE = True:
500 loops, best of 3: 1.96 msec per loop
500 loops, best of 3: 1.96 msec per loop
500 loops, best of 3: 1.97 msec per loop

With patch, TEMPLATE_STRIP_LEADING_WHITESPACE = False:
500 loops, best of 3: 1.99 msec per loop
500 loops, best of 3: 2 msec per loop
500 loops, best of 3: 2.01 msec per loop

comment:47 Changed 17 months ago by BinaryKhaos

  • Cc ua_django_bugzilla@… added

comment:48 Changed 15 months ago by s.shanabrook@…

  • Cc s.shanabrook@… added

What needs to be done to get this merged still?

comment:49 Changed 15 months ago by mrmachine

You could try reading through the ticket and django-developers group history so that you're familiar with all the options and opinions that have been presented. Then outline a proposal on the django-developers group to move the ticket forward by choosing a preferred option (probably the one which had the most support), summarise why it has been chosen over any others, update any patches so that they apply cleanly and ask for a code review and any final objections that would delay this being committed.

comment:50 Changed 14 months ago by anonymous

Why not implement the {%- tag -%} support in Jinja2, and let people choose themselves?

comment:51 Changed 9 months ago by chipx86

  • Cc chipx86@… added

comment:52 Changed 8 months ago by tomasbabej@…

  • Cc tomasbabej@… added

comment:53 Changed 8 months ago by FunkyBob

There's no way I'd want to have _only_ a global flag to enable this behaviour. Certainly, there are cases where I'd rather avoid leading spaces and blank lines, but equally there are cases where it's important.

Not all the world is HTML. Django templates are frequently used for content types where the leading whitespace may be important, in the same projects where it's used for HTML, where it's less important.

Typically to avoid these problems I've started using constructs like:

<h1>The title</h1>
<ul>{% for item in somelist %}
    <li>{{ item }}</li>{% endfor %}
</ul>

Perhaps not as obvious with the template tags, but clearer with the HTML.

comment:54 Changed 8 months ago by Semmel

Just wanted to note that this is exactly the problem I'm facing. Not everything is HTML and sometimes whitespace matters. The templates for these cases are very hard to read. :/

comment:55 Changed 6 months ago by trbs

  • Cc trbs@… added

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as new
The owner will be changed from jshedd to anonymous. Next status will be 'assigned'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.