Opened 6 years ago

Closed 6 years ago

Last modified 5 years ago

#13275 closed (fixed)

url_backwards_re catastrophic backtracking (?) problem

Reported by: kmtracey Owned by: SmileyChris
Component: Template system Version: master
Severity: Keywords:
Cc: gremmie, lrekucki, perrygeo Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:


As noted here: the new url_backwards_re appears to hang in some circumstances. When we've seen this before it has been due to catastrophic backtracking, so I'm guessing that's the cause here, though I can't say with complete certainty that that is the the issue.

Attachments (2)

t13275.tests.diff (4.5 KB) - added by russellm 6 years ago.
Tests for some regressions introduced by r12889
13275.diff (7.1 KB) - added by SmileyChris 6 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 Changed 6 years ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted
import re

Changed 6 years ago by russellm

Tests for some regressions introduced by r12889

comment:2 Changed 6 years ago by gremmie

  • Cc gremmie added

comment:3 Changed 6 years ago by lrekucki

  • Cc lrekucki added

The core of this problem is:

re.match(r'''(([^,]+)=?)+$''', ",")

Inner group matches everything up to the comma. Then it fails and the backtracking starts. I played a bit and came up with something like this (positive match instead of negative).

KWARG_OLD_STYLE = r"""(?:(?:\w+=)?(?:'[^']*'|"[^"]*"|[^,\s]+))""" 
re.match("^(%s,)*%s$" % (KWARG_OLD_STYLE, KWARG_OLD_STYLE), "citation_key=reference.citation_key,database=database")

Then you're left only with problem of spliting arguments with possible quoted commas, like 'citiation="Jon, Doe"'.

I don't want to sound like a critic, but this could be solved with a simple 3 state parser that could be reused in other tags also. As it would have to be written in pure python, it could be slower in average case then regular expressions. If anyone thinks this is a good idea, I can write a patch for this.

comment:4 Changed 6 years ago by perrygeo

  • Cc perrygeo added

Confirming that this url_backwards_re hangup is a huge problem - seeing slowdowns of 2 orders of magnitude on many of our templates containing old-style (comma-seperated) url arguments.

comment:5 Changed 6 years ago by SmileyChris

  • Owner changed from nobody to SmileyChris
  • Status changed from new to assigned

Changed 6 years ago by SmileyChris

comment:6 Changed 6 years ago by SmileyChris

  • Has patch set
  • Triage Stage changed from Accepted to Ready for checkin

comment:7 Changed 6 years ago by SmileyChris

I solved the problem by not bothering with creating another regular expression at all.

The new method is to:

  1. Do a quick check to see if the arguments could be the old format (all argument bits except the last [but always the first] contain a comma)
  1. If it could be the old format, try to compile the first argument as a variable
  1. If compiling fails, assume it's the old format. Arguments are concatenated (they were a comma with a space) and then split by commas.

comment:8 Changed 6 years ago by russellm

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [12943]) Fixed #13275 -- Modified the parsing logic of the {% url %} tag to avoid catastrophic backtracking. Thanks to SmileyChris? for the patch.

comment:9 Changed 5 years ago by jacob

  • milestone 1.2 deleted

Milestone 1.2 deleted

Note: See TracTickets for help on using tickets.
Back to Top