#13275 closed (fixed)
url_backwards_re catastrophic backtracking (?) problem
Reported by: | Karen Tracey | Owned by: | Chris Beaven |
---|---|---|---|
Component: | Template system | Version: | dev |
Severity: | Keywords: | ||
Cc: | Brian Neal, Łukasz Rekucki, perrygeo | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
As noted here: http://code.djangoproject.com/ticket/12945#comment:20 the new url_backwards_re appears to hang in some circumstances. When we've seen this before it has been due to catastrophic backtracking, so I'm guessing that's the cause here, though I can't say with complete certainty that that is the the issue.
Attachments (2)
Change History (11)
Changed 14 years ago by
Attachment: | t13275.tests.diff added |
---|
Tests for some regressions introduced by r12889
comment:2 Changed 14 years ago by
Cc: | Brian Neal added |
---|
comment:3 Changed 14 years ago by
Cc: | Łukasz Rekucki added |
---|
The core of this problem is:
re.match(r'''(([^,]+)=?)+$''', "any.long.name.with.comma.at.the.end,")
Inner group matches everything up to the comma. Then it fails and the backtracking starts. I played a bit and came up with something like this (positive match instead of negative).
KWARG_OLD_STYLE = r"""(?:(?:\w+=)?(?:'[^']*'|"[^"]*"|[^,\s]+))""" re.match("^(%s,)*%s$" % (KWARG_OLD_STYLE, KWARG_OLD_STYLE), "citation_key=reference.citation_key,database=database")
Then you're left only with problem of spliting arguments with possible quoted commas, like 'citiation="Jon, Doe"'.
I don't want to sound like a critic, but this could be solved with a simple 3 state parser that could be reused in other tags also. As it would have to be written in pure python, it could be slower in average case then regular expressions. If anyone thinks this is a good idea, I can write a patch for this.
comment:4 Changed 14 years ago by
Cc: | perrygeo added |
---|
Confirming that this url_backwards_re hangup is a huge problem - seeing slowdowns of 2 orders of magnitude on many of our templates containing old-style (comma-seperated) url arguments.
comment:5 Changed 14 years ago by
Owner: | changed from nobody to Chris Beaven |
---|---|
Status: | new → assigned |
Changed 14 years ago by
Attachment: | 13275.diff added |
---|
comment:6 Changed 14 years ago by
Has patch: | set |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:7 Changed 14 years ago by
I solved the problem by not bothering with creating another regular expression at all.
The new method is to:
- Do a quick check to see if the arguments could be the old format (all argument bits except the last [but always the first] contain a comma)
- If it could be the old format, try to compile the first argument as a variable
- If compiling fails, assume it's the old format. Arguments are concatenated (they were a comma with a space) and then split by commas.
comment:8 Changed 14 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |