#13275 closed (fixed)
url_backwards_re catastrophic backtracking (?) problem
Reported by: | Karen Tracey | Owned by: | Chris Beaven |
---|---|---|---|
Component: | Template system | Version: | dev |
Severity: | Keywords: | ||
Cc: | Brian Neal, Łukasz Rekucki, perrygeo | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
As noted here: http://code.djangoproject.com/ticket/12945#comment:20 the new url_backwards_re appears to hang in some circumstances. When we've seen this before it has been due to catastrophic backtracking, so I'm guessing that's the cause here, though I can't say with complete certainty that that is the the issue.
Attachments (2)
Change History (11)
by , 15 years ago
Attachment: | t13275.tests.diff added |
---|
Tests for some regressions introduced by r12889
comment:2 by , 15 years ago
Cc: | added |
---|
comment:3 by , 15 years ago
Cc: | added |
---|
The core of this problem is:
re.match(r'''(([^,]+)=?)+$''', "any.long.name.with.comma.at.the.end,")
Inner group matches everything up to the comma. Then it fails and the backtracking starts. I played a bit and came up with something like this (positive match instead of negative).
KWARG_OLD_STYLE = r"""(?:(?:\w+=)?(?:'[^']*'|"[^"]*"|[^,\s]+))""" re.match("^(%s,)*%s$" % (KWARG_OLD_STYLE, KWARG_OLD_STYLE), "citation_key=reference.citation_key,database=database")
Then you're left only with problem of spliting arguments with possible quoted commas, like 'citiation="Jon, Doe"'.
I don't want to sound like a critic, but this could be solved with a simple 3 state parser that could be reused in other tags also. As it would have to be written in pure python, it could be slower in average case then regular expressions. If anyone thinks this is a good idea, I can write a patch for this.
comment:4 by , 15 years ago
Cc: | added |
---|
Confirming that this url_backwards_re hangup is a huge problem - seeing slowdowns of 2 orders of magnitude on many of our templates containing old-style (comma-seperated) url arguments.
comment:5 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
by , 15 years ago
Attachment: | 13275.diff added |
---|
comment:6 by , 15 years ago
Has patch: | set |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:7 by , 15 years ago
I solved the problem by not bothering with creating another regular expression at all.
The new method is to:
- Do a quick check to see if the arguments could be the old format (all argument bits except the last [but always the first] contain a comma)
- If it could be the old format, try to compile the first argument as a variable
- If compiling fails, assume it's the old format. Arguments are concatenated (they were a comma with a space) and then split by commas.
comment:8 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |