Opened 8 years ago

Closed 4 years ago

Last modified 3 years ago

#7806 closed Cleanup/optimization (fixed)

django.template refactoring

Reported by: emulbreh Owned by: aaugustin
Component: Template system Version: master
Severity: Normal Keywords: tplrf
Cc: carljm, research@…, alexkoshelev Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no


Why should django.template be refactored?

Filter and Variable parsing is inconsistent. Many ad hoc parsers in defaulttags are fragile.
Whitespace handling is ungraceful.

The patch provided splits into separate modules and introduces a TokenStream
class that allows parsing of literals, vars (called lookups) and filter expressions. No docs yet,
have a look at the source (it all happens in ~150 loc).

The patch only touches django.template and django.templatetags.
Tests are separate so you can run them with the old code.

Modules in django.template

  • context (no dependencies)

  • expressions (no dependencies)
    Contains Lookup, Literal, and FilterExpression. All are subclasses of a marker class Expression. These replace Variable and the old FilterExpression class. A Variable equivalent is provided in compat.Variable.

  • nodes (depends on expressions)
    Contains Node, NodeList, TextNode, and ExpressionNode. The latter is the old VariableNode renamed. A VariableNode alias is provided in compat.

  • compiler (depends on context, expressions, nodes)
    Contains Parser, Lexer, Token, TemplateSyntaxError, Origin, StringOrgin, and compile_string(). Those are mostly untouched. Additionally there are TokenStream, and TokenSyntaxError. Those provide a safe way to parse Tokens containing expressions.

  • library (depends on compiler, nodes, context)
    Contains InvalidTemplateLibrary, Library, get_library(), and add_to_builtins(). Mostly untouched. Should be called libraries for consistency, but that would clash with the legacy dict.

  • loader
    Untouched. Moved TemplateDoesNotExist here.

  • utils
    ConditionalNode, base class for IfNode, IfEqualNode. EmptyNode. Helper functions: parse_conditional_nodelists(parser, name), parse_as(bits) parse_args_and_kwargs(bits), resolve_args_and_kwargs(args, kwargs)

  • compat
    Provides backwards compatibility for Variable, VariableNode, TokenParser, and resolve_variable()

  • defaulttags, loader_tags
    Mostly refactored to use TokenStream where appropriate.


  • #4746 - allow whitespace before and after filter separator
    fixed. tests are there (filter-syntax03, filter-syntax04) but expect TemplateSyntaxError.

  • #5270 - empty string literals
    fixed. tests included.

  • #5756 - accept filters (almost) everywhere
    fixed. a few tests included.

  • #5971 - token parser bug
    TokenParser will be deprecated.

  • #6271 - filter arguments with spaces
    fixed. test included. (seems to be fixed in trunk already)

  • #6296 - expressions for ifequal
    fixed. tests included (dup of #5756 ?)

  • #6510 - get_nodes_by_type() problem
    probably fixed.

  • #6535 - negative numeric literals
    fixed. tests included.

  • #7295 - quotes, escaping and translation of string literals handled inconsistently in templates


Comparing templates measures, the refactored version is a bit slower.


{% if not %} is currently valid. The refactored implementation is greedy and reports a syntax error.
Could be fixed, not sure it's worth it. Broken tests are: if-tag-not02, if-tag-not03. DDN.

Numeric literals ending with "." are now considered a syntax error.
Broken test: ifequal-numeric07. DDN.

{% url %} currently accepts an arbitrary unquoted unicode string for the view name.
The refactored implementation only accepts bare python identifiers or quoted string literals.
Broken test: url05. DDN.

TemplateSyntaxError messages need some work. Eventually shortcut methods on TokenStream.

Find better names for TokenSyntaxError and TokenStream.

Provide a decorator that handles the boilerplate bits=parser.token_stream(token)
and bits.assert_consumed() calls.

Investigate performance impact.

Docs and more tests.

Attachments (5)

tplrf.diff (109.6 KB) - added by emulbreh 8 years ago.
tplrf-tests.diff (6.3 KB) - added by emulbreh 8 years ago.
tplrf.2.diff (116.9 KB) - added by emulbreh 8 years ago.
tplrf-r8970.diff (129.2 KB) - added by emulbreh 8 years ago.
with tests
7806.tplrf-r11593.diff (129.1 KB) - added by emulbreh 7 years ago.

Download all attachments as: .zip

Change History (29)

Changed 8 years ago by emulbreh

Changed 8 years ago by emulbreh

comment:1 Changed 8 years ago by emulbreh

  • Has patch set
  • Needs documentation set
  • Needs tests set
  • Patch needs improvement set

I'm going to add a tplrf-fixed keyword to #4746, #5270, #5756, #5862, #6271, #6296, #6510, #6535, and #7295 - unless someone tells me that's a bad idea.

Changed 8 years ago by emulbreh

comment:2 Changed 8 years ago by emulbreh

  • made decorators 2.3 compatible
  • added a (clusily named) decorator for parser functions using TokenStream: token_stream_parsed.
  • moved some common syntax error handling to TokenStream.
  • converted some remaining compile_filter() calls

Notes: {% cycle %} is still missing. The docstring/code ratio in defaultags starts to get ugly. A DebugTokenStream should capture the exact token positions for nicer error messages.

comment:3 Changed 8 years ago by Simon Greenhill

  • milestone set to post-1.0
  • Triage Stage changed from Unreviewed to Accepted

Accepted as post-1.0 following discussion on #7799.

comment:4 Changed 8 years ago by emulbreh

This would also fix #7377.

Changed 8 years ago by emulbreh

with tests

comment:5 Changed 8 years ago by carljm

  • Cc carl@… added

comment:6 Changed 8 years ago by anonymous

  • Cc research@… added

comment:7 Changed 8 years ago by SmileyChris

This is all incredibly impressive, emulbreh.

I've got a tag token parser which has a different tact - you're more than welcome to a look. Send me an email if you're interested - [me]

comment:8 Changed 8 years ago by emulbreh

This would also fix #9315.

comment:9 Changed 8 years ago by jacob

This looks really good! I've got a few questions:

  • Have you done any benchmarking? I'd be interested in any speed implications.
  • The @uses_token_stream decorator seems a bit clumsy (as you note), but I see why you did it. Any thoughts on a more elegant way of getting the new functionality in in a backwards-compatible way? What about making Token.__iter__ return a TokenStream? That'd allow a pretty natural for x in token: approach in custom tags.
  • There's no documentation of the new TokenStream class, either as docs or docstrings. This needs to happen, if only so I can figure out what the hell pop_lexm means.

I'm sure I'll have more questions later, but for now that'll do. Thanks!

comment:10 Changed 8 years ago by emulbreh

I haven't done any real benchmarking. Comparing time spent in the "templates" test, it's about 1.18 times slower.

There's a tiny bit of documentation in this thread.
Hopefully enough to dicuss the API. I'll add docstrings in the next patch.

Token.__iter__ would seldomly be useful as you'd typically use the higher level parser functions in custom tags. Perhaps TokenStream should be named TokenParser.
Simon Willison proposed @register.tag(token_stream=True).

comment:11 Changed 8 years ago by carljm

  • Cc carljm added; carl@… removed

comment:12 Changed 8 years ago by anonymous

  • milestone post-1.0 deleted

Milestone post-1.0 deleted

comment:13 Changed 8 years ago by alexkoshelev

  • Cc alexkoshelev added

comment:14 Changed 7 years ago by emulbreh

Changed 7 years ago by emulbreh

comment:15 Changed 7 years ago by emulbreh

  • Resolution set to wontfix
  • Status changed from new to closed

For the sake of completeness: The current (outdated) version of my git branch.
I'm also closing this ticket as wontfix as I will do no further work on it.

comment:16 Changed 7 years ago by alexkoshelev

  • Resolution wontfix deleted
  • Status changed from closed to reopened

I think it must be completed anyway.

comment:17 Changed 5 years ago by lukeplant

  • Severity set to Normal
  • Type set to Cleanup/optimization

comment:18 Changed 5 years ago by aaugustin

  • Easy pickings unset
  • Owner changed from nobody to aaugustin
  • Status changed from reopened to new
  • UI/UX unset

comment:19 follow-up: Changed 4 years ago by aaugustin

  • Resolution set to fixed
  • Status changed from new to closed

All tickets mentioned in the summary have been fixed.

Last week-end, during the sprint, Baptiste Mispelon wrote patches to use Token.split_contents() consistently — that was my goal when assigning this ticket to myself.

I think most of the goals of this ticket have been achieved — well, it's difficult to tell for sure because the patch is large and has been out of date for a few years.

If I've missed something, please re-open a new ticket describing remaining problems.

comment:20 in reply to: ↑ 19 Changed 4 years ago by emulbreh

My original motivation for this refactoring was the poor API for custom tag parsers. As a side effect, the new parser API fixed a bunch of open bugs with builtin tags.
While the primary goal for this ticket hasn't been achieved, I agree that it should be closed. A better API for custom tags should have its own ticket (if it doesn't already).

comment:21 follow-up: Changed 4 years ago by aaugustin

Isn't @register.simple_tag the "better API" for custom tags?

comment:22 in reply to: ↑ 21 Changed 4 years ago by anonymous

Almost. But I believe that the API should be powerful enough to parse all builtin tags.

comment:23 Changed 4 years ago by aaugustin

Or maybe the syntax of the builtin tags should be normalized so they can all be parsed with the generic API ;)

comment:24 Changed 3 years ago by jonathanslenders

@aaugustin and @emulbreh: Could some of you have a look at ticket? #20434

I finished the refactoring of a better API of custom tags. See patch2 in that ticket about how I used the "normalized syntax" to rewrite all of the builtin tags.

Note: See TracTickets for help on using tickets.
Back to Top