Opened 14 years ago

Closed 14 years ago

Last modified 13 years ago

#13275 closed (fixed)

url_backwards_re catastrophic backtracking (?) problem

Reported by: Karen Tracey Owned by: Chris Beaven
Component: Template system Version: dev
Severity: Keywords:
Cc: Brian Neal, Łukasz Rekucki, perrygeo Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

As noted here: http://code.djangoproject.com/ticket/12945#comment:20 the new url_backwards_re appears to hang in some circumstances. When we've seen this before it has been due to catastrophic backtracking, so I'm guessing that's the cause here, though I can't say with complete certainty that that is the the issue.

Attachments (2)

t13275.tests.diff (4.5 KB ) - added by Russell Keith-Magee 14 years ago.
Tests for some regressions introduced by r12889
13275.diff (7.1 KB ) - added by Chris Beaven 14 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 by Russell Keith-Magee, 14 years ago

Triage Stage: UnreviewedAccepted
import re
re.match(r'''(('[^']*'|"[^"]*"|[^,]+)=?)+$''',
"citation_key=reference.citation_key,database=database")

by Russell Keith-Magee, 14 years ago

Attachment: t13275.tests.diff added

Tests for some regressions introduced by r12889

comment:2 by Brian Neal, 14 years ago

Cc: Brian Neal added

comment:3 by Łukasz Rekucki, 14 years ago

Cc: Łukasz Rekucki added

The core of this problem is:

re.match(r'''(([^,]+)=?)+$''', "any.long.name.with.comma.at.the.end,")

Inner group matches everything up to the comma. Then it fails and the backtracking starts. I played a bit and came up with something like this (positive match instead of negative).

KWARG_OLD_STYLE = r"""(?:(?:\w+=)?(?:'[^']*'|"[^"]*"|[^,\s]+))""" 
re.match("^(%s,)*%s$" % (KWARG_OLD_STYLE, KWARG_OLD_STYLE), "citation_key=reference.citation_key,database=database")

Then you're left only with problem of spliting arguments with possible quoted commas, like 'citiation="Jon, Doe"'.

I don't want to sound like a critic, but this could be solved with a simple 3 state parser that could be reused in other tags also. As it would have to be written in pure python, it could be slower in average case then regular expressions. If anyone thinks this is a good idea, I can write a patch for this.

comment:4 by perrygeo, 14 years ago

Cc: perrygeo added

Confirming that this url_backwards_re hangup is a huge problem - seeing slowdowns of 2 orders of magnitude on many of our templates containing old-style (comma-seperated) url arguments.

comment:5 by Chris Beaven, 14 years ago

Owner: changed from nobody to Chris Beaven
Status: newassigned

by Chris Beaven, 14 years ago

Attachment: 13275.diff added

comment:6 by Chris Beaven, 14 years ago

Has patch: set
Triage Stage: AcceptedReady for checkin

comment:7 by Chris Beaven, 14 years ago

I solved the problem by not bothering with creating another regular expression at all.

The new method is to:

  1. Do a quick check to see if the arguments could be the old format (all argument bits except the last [but always the first] contain a comma)
  1. If it could be the old format, try to compile the first argument as a variable
  1. If compiling fails, assume it's the old format. Arguments are concatenated (they were a comma with a space) and then split by commas.

comment:8 by Russell Keith-Magee, 14 years ago

Resolution: fixed
Status: assignedclosed

(In [12943]) Fixed #13275 -- Modified the parsing logic of the {% url %} tag to avoid catastrophic backtracking. Thanks to SmileyChris? for the patch.

comment:9 by Jacob, 13 years ago

milestone: 1.2

Milestone 1.2 deleted

Note: See TracTickets for help on using tickets.
Back to Top