Code

Opened 6 years ago

Closed 14 months ago

Last modified 11 months ago

#7806 closed Cleanup/optimization (fixed)

django.template refactoring

Reported by: emulbreh Owned by: aaugustin
Component: Template system Version: master
Severity: Normal Keywords: tplrf
Cc: carljm, research@…, alexkoshelev Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Why should django.template be refactored?

Filter and Variable parsing is inconsistent. Many ad hoc parsers in defaulttags are fragile.
Whitespace handling is ungraceful.

The patch provided splits __init__.py into separate modules and introduces a TokenStream
class that allows parsing of literals, vars (called lookups) and filter expressions. No docs yet,
have a look at the source (it all happens in ~150 loc).

The patch only touches django.template and django.templatetags.
Tests are separate so you can run them with the old code.

Modules in django.template

  • context (no dependencies)
    Untouched.


  • expressions (no dependencies)
    Contains Lookup, Literal, and FilterExpression. All are subclasses of a marker class Expression. These replace Variable and the old FilterExpression class. A Variable equivalent is provided in compat.Variable.


  • nodes (depends on expressions)
    Contains Node, NodeList, TextNode, and ExpressionNode. The latter is the old VariableNode renamed. A VariableNode alias is provided in compat.


  • compiler (depends on context, expressions, nodes)
    Contains Parser, Lexer, Token, TemplateSyntaxError, Origin, StringOrgin, and compile_string(). Those are mostly untouched. Additionally there are TokenStream, and TokenSyntaxError. Those provide a safe way to parse Tokens containing expressions.


  • library (depends on compiler, nodes, context)
    Contains InvalidTemplateLibrary, Library, get_library(), and add_to_builtins(). Mostly untouched. Should be called libraries for consistency, but that would clash with the legacy dict.


  • loader
    Untouched. Moved TemplateDoesNotExist here.


  • utils
    ConditionalNode, base class for IfNode, IfEqualNode. EmptyNode. Helper functions: parse_conditional_nodelists(parser, name), parse_as(bits) parse_args_and_kwargs(bits), resolve_args_and_kwargs(args, kwargs)



  • compat
    Provides backwards compatibility for Variable, VariableNode, TokenParser, and resolve_variable()


  • defaulttags, loader_tags
    Mostly refactored to use TokenStream where appropriate.


Tickets

  • #4746 - allow whitespace before and after filter separator
    fixed. tests are there (filter-syntax03, filter-syntax04) but expect TemplateSyntaxError.


  • #5270 - empty string literals
    fixed. tests included.


  • #5756 - accept filters (almost) everywhere
    fixed. a few tests included.



  • #5971 - token parser bug
    TokenParser will be deprecated.


  • #6271 - filter arguments with spaces
    fixed. test included. (seems to be fixed in trunk already)


  • #6296 - expressions for ifequal
    fixed. tests included (dup of #5756 ?)


  • #6510 - get_nodes_by_type() problem
    probably fixed.


  • #6535 - negative numeric literals
    fixed. tests included.


  • #7295 - quotes, escaping and translation of string literals handled inconsistently in templates
    fixed.


Performance

Comparing runtests.py templates measures, the refactored version is a bit slower.

Todo

{% if not %} is currently valid. The refactored implementation is greedy and reports a syntax error.
Could be fixed, not sure it's worth it. Broken tests are: if-tag-not02, if-tag-not03. DDN.

Numeric literals ending with "." are now considered a syntax error.
Broken test: ifequal-numeric07. DDN.

{% url %} currently accepts an arbitrary unquoted unicode string for the view name.
The refactored implementation only accepts bare python identifiers or quoted string literals.
Broken test: url05. DDN.

TemplateSyntaxError messages need some work. Eventually shortcut methods on TokenStream.

Find better names for TokenSyntaxError and TokenStream.

Provide a decorator that handles the boilerplate bits=parser.token_stream(token)
and bits.assert_consumed() calls.

Investigate performance impact.

Docs and more tests.

Attachments (5)

tplrf.diff (109.6 KB) - added by emulbreh 6 years ago.
tplrf-tests.diff (6.3 KB) - added by emulbreh 6 years ago.
tplrf.2.diff (116.9 KB) - added by emulbreh 6 years ago.
tplrf-r8970.diff (129.2 KB) - added by emulbreh 6 years ago.
with tests
7806.tplrf-r11593.diff (129.1 KB) - added by emulbreh 4 years ago.

Download all attachments as: .zip

Change History (29)

Changed 6 years ago by emulbreh

Changed 6 years ago by emulbreh

comment:1 Changed 6 years ago by emulbreh

  • Has patch set
  • Needs documentation set
  • Needs tests set
  • Patch needs improvement set

I'm going to add a tplrf-fixed keyword to #4746, #5270, #5756, #5862, #6271, #6296, #6510, #6535, and #7295 - unless someone tells me that's a bad idea.

Changed 6 years ago by emulbreh

comment:2 Changed 6 years ago by emulbreh

  • made decorators 2.3 compatible
  • added a (clusily named) decorator for parser functions using TokenStream: token_stream_parsed.
  • moved some common syntax error handling to TokenStream.
  • converted some remaining compile_filter() calls

Notes: {% cycle %} is still missing. The docstring/code ratio in defaultags starts to get ugly. A DebugTokenStream should capture the exact token positions for nicer error messages.

comment:3 Changed 6 years ago by Simon Greenhill

  • milestone set to post-1.0
  • Triage Stage changed from Unreviewed to Accepted

Accepted as post-1.0 following discussion on #7799.

comment:4 Changed 6 years ago by emulbreh

This would also fix #7377.

Changed 6 years ago by emulbreh

with tests

comment:5 Changed 6 years ago by carljm

  • Cc carl@… added

comment:6 Changed 6 years ago by anonymous

  • Cc research@… added

comment:7 Changed 6 years ago by SmileyChris

This is all incredibly impressive, emulbreh.

I've got a tag token parser which has a different tact - you're more than welcome to a look. Send me an email if you're interested - [me]@gmail.com

comment:8 Changed 5 years ago by emulbreh

This would also fix #9315.

comment:9 Changed 5 years ago by jacob

This looks really good! I've got a few questions:

  • Have you done any benchmarking? I'd be interested in any speed implications.
  • The @uses_token_stream decorator seems a bit clumsy (as you note), but I see why you did it. Any thoughts on a more elegant way of getting the new functionality in in a backwards-compatible way? What about making Token.__iter__ return a TokenStream? That'd allow a pretty natural for x in token: approach in custom tags.
  • There's no documentation of the new TokenStream class, either as docs or docstrings. This needs to happen, if only so I can figure out what the hell pop_lexm means.

I'm sure I'll have more questions later, but for now that'll do. Thanks!

comment:10 Changed 5 years ago by emulbreh

I haven't done any real benchmarking. Comparing time spent in the "templates" test, it's about 1.18 times slower.

There's a tiny bit of documentation in this thread.
Hopefully enough to dicuss the API. I'll add docstrings in the next patch.

Token.__iter__ would seldomly be useful as you'd typically use the higher level parser functions in custom tags. Perhaps TokenStream should be named TokenParser.
Simon Willison proposed @register.tag(token_stream=True).

comment:11 Changed 5 years ago by carljm

  • Cc carljm added; carl@… removed

comment:12 Changed 5 years ago by anonymous

  • milestone post-1.0 deleted

Milestone post-1.0 deleted

comment:13 Changed 5 years ago by alexkoshelev

  • Cc alexkoshelev added

comment:14 Changed 5 years ago by emulbreh

Changed 4 years ago by emulbreh

comment:15 Changed 4 years ago by emulbreh

  • Resolution set to wontfix
  • Status changed from new to closed

For the sake of completeness: The current (outdated) version of my git branch.
I'm also closing this ticket as wontfix as I will do no further work on it.

comment:16 Changed 4 years ago by alexkoshelev

  • Resolution wontfix deleted
  • Status changed from closed to reopened

I think it must be completed anyway.

comment:17 Changed 3 years ago by lukeplant

  • Severity set to Normal
  • Type set to Cleanup/optimization

comment:18 Changed 2 years ago by aaugustin

  • Easy pickings unset
  • Owner changed from nobody to aaugustin
  • Status changed from reopened to new
  • UI/UX unset

comment:19 follow-up: Changed 14 months ago by aaugustin

  • Resolution set to fixed
  • Status changed from new to closed

All tickets mentioned in the summary have been fixed.

Last week-end, during the sprint, Baptiste Mispelon wrote patches to use Token.split_contents() consistently — that was my goal when assigning this ticket to myself.

I think most of the goals of this ticket have been achieved — well, it's difficult to tell for sure because the patch is large and has been out of date for a few years.

If I've missed something, please re-open a new ticket describing remaining problems.

comment:20 in reply to: ↑ 19 Changed 14 months ago by emulbreh

My original motivation for this refactoring was the poor API for custom tag parsers. As a side effect, the new parser API fixed a bunch of open bugs with builtin tags.
While the primary goal for this ticket hasn't been achieved, I agree that it should be closed. A better API for custom tags should have its own ticket (if it doesn't already).

comment:21 follow-up: Changed 14 months ago by aaugustin

Isn't @register.simple_tag the "better API" for custom tags?

comment:22 in reply to: ↑ 21 Changed 14 months ago by anonymous

Almost. But I believe that the API should be powerful enough to parse all builtin tags.

comment:23 Changed 14 months ago by aaugustin

Or maybe the syntax of the builtin tags should be normalized so they can all be parsed with the generic API ;)

comment:24 Changed 11 months ago by jonathanslenders

@aaugustin and @emulbreh: Could some of you have a look at ticket? #20434

I finished the refactoring of a better API of custom tags. See patch2 in that ticket about how I used the "normalized syntax" to rewrite all of the builtin tags.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.