Opened 17 years ago

Closed 17 years ago

#4164 closed (fixed)

Inaccurate tags recognition

Reported by: tonnzor <tonn81@…> Owned by: anonymous
Component: Template system Version: dev
Severity: Keywords:
Cc: tom@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

Template lexer's check for tags is inaccurate - it checks only the beginning of the tag, not the ending and / or content.

Sample template:

{{ "some text" #}

This will result:

some text

But it should produce:

{{ "some text" #}

Attachments (1)

template_lexer.diff (1.1 KB ) - added by tonnzor <tonn81@…> 17 years ago.

Download all attachments as: .zip

Change History (8)

by tonnzor <tonn81@…>, 17 years ago

Attachment: template_lexer.diff added

comment:1 by Thomas Steinacher <tom@…>, 17 years ago

Cc: tom@… added

Shouldn't your sample give a syntax error, as you're mixing {{ and #}?

comment:2 by Malcolm Tredinnick, 17 years ago

Has patch: unset
Triage Stage: UnreviewedAccepted

There is an interesting problem going on here, but it's a little bit bigger than this patch. The patch only works if the broken template is the whole "token_string" contents. Having any extra trailing characters would land you right back in the initial problem. Removing the "has patch" designator for that reason.

The bug itself is only triggered if the very first characters of a string of characters that do not contain a real tag match could be the beginning of a match. Again, putting any random character at the start of the string hides the problem. There is also a problem with typos like

{{ foo #} {{ blah }}

That will try to resolve a variable called "foo #} {{ blah", rather than just spitting out the first bit unchanged and then treating the {{ blah }} portion as a well-formed tag.

I'm working on a slight change to the matching expression that should fix all of these problems without any real performance impact.

Thomas: it's a bit of a coin toss whether this should be treated as a syntax error or not. One problem with doing so is that it would make passing the (somewhat improbable) string "}}" as an argument to a tag problematic. At the moment we display any malformed input like this literally in the resulting template, so it's easy to pick up by eye. I'm not completely unhappy with that behaviour.

comment:3 by tonnzor <tonn81@…>, 17 years ago

Owner: changed from Adrian Holovaty to anonymous
Status: newassigned

@Thomas Steinacher:

No. This template will be processed:

  1. Splitted by tag_re regular expression (result: ['{{ "some text" #}',]
  2. Parsed by Lexer.tokenize (result: {{ "some text")
  3. Parsed by Lexer.create_token

Code for step 3:

        if token_string.startswith(VARIABLE_TAG_START):
            token = Token(TOKEN_VAR, token_string[len(VARIABLE_TAG_START):-len(VARIABLE_TAG_END)].strip())
        elif token_string.startswith(BLOCK_TAG_START):
            token = Token(TOKEN_BLOCK, token_string[len(BLOCK_TAG_START):-len(BLOCK_TAG_END)].strip())
        elif token_string.startswith(COMMENT_TAG_START):
            token = Token(TOKEN_COMMENT, '')
        else:
            token = Token(TOKEN_TEXT, token_string)

As you see - only the beginning is checked, and tag's end is cut (any 2 chars).

Provided example is edge case, but this bug can cause very strange errors. I can provide move examples if needed.

comment:4 by tonnzor <tonn81@…>, 17 years ago

@mtredinnick:

Yes, in your example "foo #} {{ blah" is tried to resolve as a variable, but the code related to this resilving should handle it and generate an error.

comment:5 by Malcolm Tredinnick, 17 years ago

Yes, it does handle that case. It would be nice to less greedy, though. However, may not be possible. Fixing the initial problem is the real goal.

comment:6 by tonnzor <tonn81@…>, 17 years ago

Before making a patch I thought about better solution, but its implementation takes a lot of code change (update lexer, tags evaluating, etc.) that can cause new bugs, so I made a simple patch that covers the initial defect.

Anyway, the better solution is better ;-)

comment:7 by Malcolm Tredinnick, 17 years ago

Resolution: fixed
Status: assignedclosed

(In [5104]) Fixed #4164, #4171 -- Reworked some of the template lexer logic to ensure we
don't get caught out by a couple of corner cases.

Note: See TracTickets for help on using tickets.
Back to Top