Opened 7 years ago

Closed 7 years ago

Last modified 5 years ago

#13275 closed (fixed)

url_backwards_re catastrophic backtracking (?) problem

Reported by: Karen Tracey Owned by: Chris Beaven
Component: Template system Version: master
Severity: Keywords:
Cc: Brian Neal, Łukasz Rekucki, perrygeo Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

As noted here: http://code.djangoproject.com/ticket/12945#comment:20 the new url_backwards_re appears to hang in some circumstances. When we've seen this before it has been due to catastrophic backtracking, so I'm guessing that's the cause here, though I can't say with complete certainty that that is the the issue.

Attachments (2)

t13275.tests.diff (4.5 KB) - added by Russell Keith-Magee 7 years ago.
Tests for some regressions introduced by r12889
13275.diff (7.1 KB) - added by Chris Beaven 7 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 Changed 7 years ago by Russell Keith-Magee

Triage Stage: UnreviewedAccepted
import re
re.match(r'''(('[^']*'|"[^"]*"|[^,]+)=?)+$''',
"citation_key=reference.citation_key,database=database")

Changed 7 years ago by Russell Keith-Magee

Attachment: t13275.tests.diff added

Tests for some regressions introduced by r12889

comment:2 Changed 7 years ago by Brian Neal

Cc: Brian Neal added

comment:3 Changed 7 years ago by Łukasz Rekucki

Cc: Łukasz Rekucki added

The core of this problem is:

re.match(r'''(([^,]+)=?)+$''', "any.long.name.with.comma.at.the.end,")

Inner group matches everything up to the comma. Then it fails and the backtracking starts. I played a bit and came up with something like this (positive match instead of negative).

KWARG_OLD_STYLE = r"""(?:(?:\w+=)?(?:'[^']*'|"[^"]*"|[^,\s]+))""" 
re.match("^(%s,)*%s$" % (KWARG_OLD_STYLE, KWARG_OLD_STYLE), "citation_key=reference.citation_key,database=database")

Then you're left only with problem of spliting arguments with possible quoted commas, like 'citiation="Jon, Doe"'.

I don't want to sound like a critic, but this could be solved with a simple 3 state parser that could be reused in other tags also. As it would have to be written in pure python, it could be slower in average case then regular expressions. If anyone thinks this is a good idea, I can write a patch for this.

comment:4 Changed 7 years ago by perrygeo

Cc: perrygeo added

Confirming that this url_backwards_re hangup is a huge problem - seeing slowdowns of 2 orders of magnitude on many of our templates containing old-style (comma-seperated) url arguments.

comment:5 Changed 7 years ago by Chris Beaven

Owner: changed from nobody to Chris Beaven
Status: newassigned

Changed 7 years ago by Chris Beaven

Attachment: 13275.diff added

comment:6 Changed 7 years ago by Chris Beaven

Has patch: set
Triage Stage: AcceptedReady for checkin

comment:7 Changed 7 years ago by Chris Beaven

I solved the problem by not bothering with creating another regular expression at all.

The new method is to:

  1. Do a quick check to see if the arguments could be the old format (all argument bits except the last [but always the first] contain a comma)
  1. If it could be the old format, try to compile the first argument as a variable
  1. If compiling fails, assume it's the old format. Arguments are concatenated (they were a comma with a space) and then split by commas.

comment:8 Changed 7 years ago by Russell Keith-Magee

Resolution: fixed
Status: assignedclosed

(In [12943]) Fixed #13275 -- Modified the parsing logic of the {% url %} tag to avoid catastrophic backtracking. Thanks to SmileyChris? for the patch.

comment:9 Changed 5 years ago by Jacob

milestone: 1.2

Milestone 1.2 deleted

Note: See TracTickets for help on using tickets.
Back to Top