#7806 closed Cleanup/optimization (fixed)
django.template refactoring
| Reported by: | Johannes Dollinger | Owned by: | Aymeric Augustin | 
|---|---|---|---|
| Component: | Template system | Version: | dev | 
| Severity: | Normal | Keywords: | tplrf | 
| Cc: | Carl Meyer, research@…, Alexander Koshelev | Triage Stage: | Accepted | 
| Has patch: | yes | Needs documentation: | yes | 
| Needs tests: | yes | Patch needs improvement: | yes | 
| Easy pickings: | no | UI/UX: | no | 
Description
Why should django.template be refactored?
Filter and Variable parsing is inconsistent. Many ad hoc parsers in defaulttags are fragile.
Whitespace handling is ungraceful. 
The patch provided splits __init__.py into separate modules and introduces a TokenStream
class that allows parsing of literals, vars (called lookups) and filter expressions. No docs yet,
have a look at the source (it all happens in ~150 loc).
The patch only touches django.template and django.templatetags.
Tests are separate so you can run them with the old code.
Modules in django.template
- context(no dependencies)
 Untouched.
          
- expressions(no dependencies)
 Contains- Lookup,- Literal, and- FilterExpression. All are subclasses of a marker class- Expression. These replace- Variableand the old- FilterExpressionclass. A- Variableequivalent is provided in- compat.Variable.
      
- nodes(depends on expressions)
 Contains- Node,- NodeList,- TextNode, and- ExpressionNode. The latter is the old- VariableNoderenamed. A- VariableNodealias is provided in- compat.
    
- compiler(depends on context, expressions, nodes)
 Contains- Parser,- Lexer,- Token,- TemplateSyntaxError,- Origin,- StringOrgin, and- compile_string(). Those are mostly untouched. Additionally there are- TokenStream, and- TokenSyntaxError. Those provide a safe way to parse- Tokenscontaining expressions.
    
- library(depends on compiler, nodes, context)
 Contains- InvalidTemplateLibrary,- Library,- get_library(), and- add_to_builtins(). Mostly untouched. Should be called- librariesfor consistency, but that would clash with the legacy dict.
      
- loader
 Untouched. Moved- TemplateDoesNotExisthere.
    
- utils
 - ConditionalNode, base class for- IfNode,- IfEqualNode.- EmptyNode. Helper functions:- parse_conditional_nodelists(parser, name),- parse_as(bits)- parse_args_and_kwargs(bits),- resolve_args_and_kwargs(args, kwargs)
      
    
- compat
 Provides backwards compatibility for- Variable,- VariableNode,- TokenParser, and- resolve_variable()
    
- defaulttags,- loader_tags
 Mostly refactored to use- TokenStreamwhere appropriate.
    
Tickets
- #4746 - allow whitespace before and after filter separator
 fixed. tests are there (filter-syntax03, filter-syntax04) but expect TemplateSyntaxError.
    
- #5270 - empty string literals
 fixed. tests included.
    
- #5756 - accept filters (almost) everywhere
 fixed. a few tests included.
    
    
- #5971 - token parser bug
 TokenParserwill be deprecated.
    
- #6271 - filter arguments with spaces
 fixed. test included. (seems to be fixed in trunk already)
    
    
- #6510 - get_nodes_by_type()problem
 probably fixed.
    
- #6535 - negative numeric literals
 fixed. tests included.
      
- #7295 - quotes, escaping and translation of string literals handled inconsistently in templates
 fixed.
    
Performance
Comparing runtests.py templates measures, the refactored version is a bit slower.
Todo
{% if not %} is currently valid. The refactored implementation is greedy and reports a syntax error.
Could be fixed, not sure it's worth it. Broken tests are: if-tag-not02, if-tag-not03. DDN.
Numeric literals ending with "." are now considered a syntax error.
Broken test: ifequal-numeric07. DDN.
{% url %} currently accepts an arbitrary unquoted unicode string for the view name.
The refactored implementation only accepts bare python identifiers or quoted string literals.
Broken test: url05. DDN.
TemplateSyntaxError messages need some work. Eventually shortcut methods on TokenStream.
Find better names for TokenSyntaxError and TokenStream.
Provide a decorator that handles the boilerplate bits=parser.token_stream(token)
and bits.assert_consumed() calls.
Investigate performance impact.
Docs and more tests.
Attachments (5)
Change History (29)
by , 17 years ago
| Attachment: | tplrf.diff added | 
|---|
by , 17 years ago
| Attachment: | tplrf-tests.diff added | 
|---|
comment:1 by , 17 years ago
| Has patch: | set | 
|---|---|
| Needs documentation: | set | 
| Needs tests: | set | 
| Patch needs improvement: | set | 
by , 17 years ago
| Attachment: | tplrf.2.diff added | 
|---|
comment:2 by , 17 years ago
- made decorators 2.3 compatible
- added a (clusily named) decorator for parser functions using TokenStream:token_stream_parsed.
- moved some common syntax error handling to TokenStream.
- converted some remaining compile_filter()calls
Notes: {% cycle %} is still missing. The docstring/code ratio in defaultags starts to get ugly. A DebugTokenStream should capture the exact token positions for nicer error messages.
 
comment:3 by , 17 years ago
| milestone: | → post-1.0 | 
|---|---|
| Triage Stage: | Unreviewed → Accepted | 
Accepted as post-1.0 following discussion on #7799.
comment:5 by , 17 years ago
| Cc: | added | 
|---|
comment:6 by , 17 years ago
| Cc: | added | 
|---|
comment:7 by , 17 years ago
This is all incredibly impressive, emulbreh.
I've got a tag token parser which has a different tact - you're more than welcome to a look. Send me an email if you're interested - [me]@gmail.com
comment:9 by , 17 years ago
This looks really good! I've got a few questions:
- Have you done any benchmarking? I'd be interested in any speed implications.
- The @uses_token_streamdecorator seems a bit clumsy (as you note), but I see why you did it. Any thoughts on a more elegant way of getting the new functionality in in a backwards-compatible way? What about makingToken.__iter__return aTokenStream? That'd allow a pretty naturalfor x in token:approach in custom tags.
- There's no documentation of the new TokenStreamclass, either as docs or docstrings. This needs to happen, if only so I can figure out what the hellpop_lexmmeans.
I'm sure I'll have more questions later, but for now that'll do. Thanks!
comment:10 by , 17 years ago
I haven't done any real benchmarking. Comparing time spent in the "templates" test, it's about 1.18 times slower.
There's a tiny bit of documentation in this thread.
Hopefully enough to dicuss the API. I'll add docstrings in the next patch.
Token.__iter__ would seldomly be useful as you'd typically use the higher level parser functions in custom tags. Perhaps TokenStream should be named TokenParser.
Simon Willison proposed @register.tag(token_stream=True).
comment:11 by , 17 years ago
| Cc: | added; removed | 
|---|
comment:13 by , 17 years ago
| Cc: | added | 
|---|
by , 16 years ago
| Attachment: | 7806.tplrf-r11593.diff added | 
|---|
comment:15 by , 16 years ago
| Resolution: | → wontfix | 
|---|---|
| Status: | new → closed | 
For the sake of completeness: The current (outdated) version of my git branch.
I'm also closing this ticket as wontfix as I will do no further work on it.
comment:16 by , 16 years ago
| Resolution: | wontfix | 
|---|---|
| Status: | closed → reopened | 
I think it must be completed anyway.
comment:17 by , 15 years ago
| Severity: | → Normal | 
|---|---|
| Type: | → Cleanup/optimization | 
comment:18 by , 14 years ago
| Easy pickings: | unset | 
|---|---|
| Owner: | changed from to | 
| Status: | reopened → new | 
| UI/UX: | unset | 
follow-up: 20 comment:19 by , 13 years ago
| Resolution: | → fixed | 
|---|---|
| Status: | new → closed | 
All tickets mentioned in the summary have been fixed.
Last week-end, during the sprint, Baptiste Mispelon wrote patches to use Token.split_contents() consistently — that was my goal when assigning this ticket to myself.
I think most of the goals of this ticket have been achieved — well, it's difficult to tell for sure because the patch is large and has been out of date for a few years.
If I've missed something, please re-open a new ticket describing remaining problems.
comment:20 by , 13 years ago
My original motivation for this refactoring was the poor API for custom tag parsers. As a side effect, the new parser API fixed a bunch of open bugs with builtin tags. 
While the primary goal for this ticket hasn't been achieved, I agree that it should be closed. A better API for custom tags should have its own ticket (if it doesn't already).
follow-up: 22 comment:21 by , 13 years ago
Isn't @register.simple_tag the "better API" for custom tags?
comment:22 by , 13 years ago
Almost. But I believe that the API should be powerful enough to parse all builtin tags.
comment:23 by , 13 years ago
Or maybe the syntax of the builtin tags should be normalized so they can all be parsed with the generic API ;)
comment:24 by , 12 years ago
@aaugustin and @emulbreh: Could some of you have a look at ticket? #20434
I finished the refactoring of a better API of custom tags. See patch2 in that ticket about how I used the "normalized syntax" to rewrite all of the builtin tags.
I'm going to add a tplrf-fixed keyword to #4746, #5270, #5756, #5862, #6271, #6296, #6510, #6535, and #7295 - unless someone tells me that's a bad idea.