Opened 17 years ago

Closed 12 years ago

Last modified 12 years ago

#7806 closed Cleanup/optimization (fixed)

django.template refactoring

Reported by: Johannes Dollinger Owned by: Aymeric Augustin
Component: Template system Version: dev
Severity: Normal Keywords: tplrf
Cc: Carl Meyer, research@…, Alexander Koshelev Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Why should django.template be refactored?

Filter and Variable parsing is inconsistent. Many ad hoc parsers in defaulttags are fragile.
Whitespace handling is ungraceful.

The patch provided splits __init__.py into separate modules and introduces a TokenStream
class that allows parsing of literals, vars (called lookups) and filter expressions. No docs yet,
have a look at the source (it all happens in ~150 loc).

The patch only touches django.template and django.templatetags.
Tests are separate so you can run them with the old code.

Modules in django.template

  • context (no dependencies)
    Untouched.


  • expressions (no dependencies)
    Contains Lookup, Literal, and FilterExpression. All are subclasses of a marker class Expression. These replace Variable and the old FilterExpression class. A Variable equivalent is provided in compat.Variable.


  • nodes (depends on expressions)
    Contains Node, NodeList, TextNode, and ExpressionNode. The latter is the old VariableNode renamed. A VariableNode alias is provided in compat.


  • compiler (depends on context, expressions, nodes)
    Contains Parser, Lexer, Token, TemplateSyntaxError, Origin, StringOrgin, and compile_string(). Those are mostly untouched. Additionally there are TokenStream, and TokenSyntaxError. Those provide a safe way to parse Tokens containing expressions.


  • library (depends on compiler, nodes, context)
    Contains InvalidTemplateLibrary, Library, get_library(), and add_to_builtins(). Mostly untouched. Should be called libraries for consistency, but that would clash with the legacy dict.


  • loader
    Untouched. Moved TemplateDoesNotExist here.


  • utils
    ConditionalNode, base class for IfNode, IfEqualNode. EmptyNode. Helper functions: parse_conditional_nodelists(parser, name), parse_as(bits) parse_args_and_kwargs(bits), resolve_args_and_kwargs(args, kwargs)



  • compat
    Provides backwards compatibility for Variable, VariableNode, TokenParser, and resolve_variable()


  • defaulttags, loader_tags
    Mostly refactored to use TokenStream where appropriate.


Tickets

  • #4746 - allow whitespace before and after filter separator
    fixed. tests are there (filter-syntax03, filter-syntax04) but expect TemplateSyntaxError.


  • #5270 - empty string literals
    fixed. tests included.


  • #5756 - accept filters (almost) everywhere
    fixed. a few tests included.



  • #5971 - token parser bug
    TokenParser will be deprecated.


  • #6271 - filter arguments with spaces
    fixed. test included. (seems to be fixed in trunk already)


  • #6296 - expressions for ifequal
    fixed. tests included (dup of #5756 ?)


  • #6510 - get_nodes_by_type() problem
    probably fixed.


  • #6535 - negative numeric literals
    fixed. tests included.


  • #7295 - quotes, escaping and translation of string literals handled inconsistently in templates
    fixed.


Performance

Comparing runtests.py templates measures, the refactored version is a bit slower.

Todo

{% if not %} is currently valid. The refactored implementation is greedy and reports a syntax error.
Could be fixed, not sure it's worth it. Broken tests are: if-tag-not02, if-tag-not03. DDN.

Numeric literals ending with "." are now considered a syntax error.
Broken test: ifequal-numeric07. DDN.

{% url %} currently accepts an arbitrary unquoted unicode string for the view name.
The refactored implementation only accepts bare python identifiers or quoted string literals.
Broken test: url05. DDN.

TemplateSyntaxError messages need some work. Eventually shortcut methods on TokenStream.

Find better names for TokenSyntaxError and TokenStream.

Provide a decorator that handles the boilerplate bits=parser.token_stream(token)
and bits.assert_consumed() calls.

Investigate performance impact.

Docs and more tests.

Attachments (5)

tplrf.diff (109.6 KB ) - added by Johannes Dollinger 17 years ago.
tplrf-tests.diff (6.3 KB ) - added by Johannes Dollinger 17 years ago.
tplrf.2.diff (116.9 KB ) - added by Johannes Dollinger 17 years ago.
tplrf-r8970.diff (129.2 KB ) - added by Johannes Dollinger 16 years ago.
with tests
7806.tplrf-r11593.diff (129.1 KB ) - added by Johannes Dollinger 15 years ago.

Download all attachments as: .zip

Change History (29)

by Johannes Dollinger, 17 years ago

Attachment: tplrf.diff added

by Johannes Dollinger, 17 years ago

Attachment: tplrf-tests.diff added

comment:1 by Johannes Dollinger, 17 years ago

Has patch: set
Needs documentation: set
Needs tests: set
Patch needs improvement: set

I'm going to add a tplrf-fixed keyword to #4746, #5270, #5756, #5862, #6271, #6296, #6510, #6535, and #7295 - unless someone tells me that's a bad idea.

by Johannes Dollinger, 17 years ago

Attachment: tplrf.2.diff added

comment:2 by Johannes Dollinger, 17 years ago

  • made decorators 2.3 compatible
  • added a (clusily named) decorator for parser functions using TokenStream: token_stream_parsed.
  • moved some common syntax error handling to TokenStream.
  • converted some remaining compile_filter() calls

Notes: {% cycle %} is still missing. The docstring/code ratio in defaultags starts to get ugly. A DebugTokenStream should capture the exact token positions for nicer error messages.

comment:3 by Simon Greenhill, 17 years ago

milestone: post-1.0
Triage Stage: UnreviewedAccepted

Accepted as post-1.0 following discussion on #7799.

comment:4 by Johannes Dollinger, 17 years ago

This would also fix #7377.

by Johannes Dollinger, 16 years ago

Attachment: tplrf-r8970.diff added

with tests

comment:5 by Carl Meyer, 16 years ago

Cc: carl@… added

comment:6 by anonymous, 16 years ago

Cc: research@… added

comment:7 by Chris Beaven, 16 years ago

This is all incredibly impressive, emulbreh.

I've got a tag token parser which has a different tact - you're more than welcome to a look. Send me an email if you're interested - [me]@gmail.com

comment:8 by Johannes Dollinger, 16 years ago

This would also fix #9315.

comment:9 by Jacob, 16 years ago

This looks really good! I've got a few questions:

  • Have you done any benchmarking? I'd be interested in any speed implications.
  • The @uses_token_stream decorator seems a bit clumsy (as you note), but I see why you did it. Any thoughts on a more elegant way of getting the new functionality in in a backwards-compatible way? What about making Token.__iter__ return a TokenStream? That'd allow a pretty natural for x in token: approach in custom tags.
  • There's no documentation of the new TokenStream class, either as docs or docstrings. This needs to happen, if only so I can figure out what the hell pop_lexm means.

I'm sure I'll have more questions later, but for now that'll do. Thanks!

comment:10 by Johannes Dollinger, 16 years ago

I haven't done any real benchmarking. Comparing time spent in the "templates" test, it's about 1.18 times slower.

There's a tiny bit of documentation in this thread.
Hopefully enough to dicuss the API. I'll add docstrings in the next patch.

Token.__iter__ would seldomly be useful as you'd typically use the higher level parser functions in custom tags. Perhaps TokenStream should be named TokenParser.
Simon Willison proposed @register.tag(token_stream=True).

comment:11 by Carl Meyer, 16 years ago

Cc: Carl Meyer added; carl@… removed

comment:12 by (none), 16 years ago

milestone: post-1.0

Milestone post-1.0 deleted

comment:13 by Alexander Koshelev, 16 years ago

Cc: Alexander Koshelev added

comment:14 by Johannes Dollinger, 15 years ago

by Johannes Dollinger, 15 years ago

Attachment: 7806.tplrf-r11593.diff added

comment:15 by Johannes Dollinger, 15 years ago

Resolution: wontfix
Status: newclosed

For the sake of completeness: The current (outdated) version of my git branch.
I'm also closing this ticket as wontfix as I will do no further work on it.

comment:16 by Alexander Koshelev, 15 years ago

Resolution: wontfix
Status: closedreopened

I think it must be completed anyway.

comment:17 by Luke Plant, 14 years ago

Severity: Normal
Type: Cleanup/optimization

comment:18 by Aymeric Augustin, 13 years ago

Easy pickings: unset
Owner: changed from nobody to Aymeric Augustin
Status: reopenednew
UI/UX: unset

comment:19 by Aymeric Augustin, 12 years ago

Resolution: fixed
Status: newclosed

All tickets mentioned in the summary have been fixed.

Last week-end, during the sprint, Baptiste Mispelon wrote patches to use Token.split_contents() consistently — that was my goal when assigning this ticket to myself.

I think most of the goals of this ticket have been achieved — well, it's difficult to tell for sure because the patch is large and has been out of date for a few years.

If I've missed something, please re-open a new ticket describing remaining problems.

in reply to:  19 comment:20 by Johannes Dollinger, 12 years ago

My original motivation for this refactoring was the poor API for custom tag parsers. As a side effect, the new parser API fixed a bunch of open bugs with builtin tags.
While the primary goal for this ticket hasn't been achieved, I agree that it should be closed. A better API for custom tags should have its own ticket (if it doesn't already).

comment:21 by Aymeric Augustin, 12 years ago

Isn't @register.simple_tag the "better API" for custom tags?

in reply to:  21 comment:22 by anonymous, 12 years ago

Almost. But I believe that the API should be powerful enough to parse all builtin tags.

comment:23 by Aymeric Augustin, 12 years ago

Or maybe the syntax of the builtin tags should be normalized so they can all be parsed with the generic API ;)

comment:24 by jonathanslenders, 12 years ago

@aaugustin and @emulbreh: Could some of you have a look at ticket? #20434

I finished the refactoring of a better API of custom tags. See patch2 in that ticket about how I used the "normalized syntax" to rewrite all of the builtin tags.

Note: See TracTickets for help on using tickets.
Back to Top