Opened 6 years ago

Closed 4 years ago

#28936 closed Bug (invalid)

simplify_regex should remove redundant escape sequences outside groups

Reported by: Cristi Vîjdea Owned by:
Component: contrib.admindocs Version: 2.0
Severity: Normal Keywords: simplify_regex path
Cc: ChillarAnand, Carlton Gibson Triage Stage: Ready for checkin
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: yes

Description (last modified by Cristi Vîjdea)

django.contrib.admindocs.views.simplify_urls should clean up escapes found outside path parameters. Otherwise, broken URLs with backslashes can be generated and displayed.

This is readily apparent with Django 2's path(), which aggresively escapes everything outside a <parameter> specifier, resulting in a urlpattern with backslash-escaped forward slashes:

>>> simplify_regex(r"^(?P<sport_slug>\w+)/athletes/(?P<athlete_slug>\w+)/$")
'/<sport_slug>/athletes/<athlete_slug>/'
>>> simplify_regex(r"^(?P<sport_slug>\w+)\/athletes\/(?P<athlete_slug>\w+)\/$")
'/<sport_slug>\\/athletes\\/<athlete_slug>\\/'

The second example is what path() would generate in urlpatterns.

You can, for example, see this issue affecting django-rest-framework here.

Change History (8)

comment:1 by Cristi Vîjdea, 6 years ago

Description: modified (diff)

comment:2 by ChillarAnand, 6 years ago

Cc: ChillarAnand added
Triage Stage: UnreviewedAccepted

comment:3 by Oliver Cleary, 5 years ago

Owner: changed from nobody to Oliver Cleary
Status: newassigned

comment:4 by Oliver Cleary, 5 years ago

I have a PR for this ticket, however I am not sure it really needs to be fixed.

The referenced DRF ticket was resolved by changing the usage of the simplify_regex function to match the usage by Django, which is to pass in the paths route string directly, rather than using it's generated regex pattern.

Additionally, the given example does not seem to be correct as path does not escape forward slashes.

>>> path('<slug:sports_slug>/athletes/<slug:athletes_slug>/', lambda: None).pattern.regex.pattern
'^(?P<sports_slug>[-a-zA-Z0-9_]+)/athletes/(?P<athletes_slug>[-a-zA-Z0-9_]+)/$'

Testing the example in the DRF ticket however does exhibit the issue:

>>> path('^api/token-auth/', lambda: None).pattern.regex.pattern
'^\\^api/token\\-auth/$'
>>> simplify_regex(r'^\^api/token\-auth/$')
'/\\api/token\\-auth/'

With the fix in the PR the special characters are unescaped, and the ^?$ are only stripped if not escaped.

PR https://github.com/django/django/pull/11352

Version 1, edited 5 years ago by Oliver Cleary (previous) (next) (diff)

comment:5 by Oliver Cleary, 4 years ago

Owner: Oliver Cleary removed
Status: assignednew

comment:6 by Mariusz Felisiak, 4 years ago

Cc: Carlton Gibson added

comment:7 by Johannes Maron, 4 years ago

Triage Stage: AcceptedReady for checkin

comment:8 by Carlton Gibson, 4 years ago

Resolution: invalid
Status: newclosed

Thanks for the patch Oliver.

... I am not sure it really needs to be fixed.

I think this is right.

As with the DRF issue, calling str() on path.pattern before passing to simplify_regex() is the correct usage, and it is what `admindocs` does.

Here using the original example from the DRF issue:

>>> from django.urls import path
>>> from django.contrib.admindocs.views import simplify_regex
>>> p = path('^api/token-auth/', lambda: None)  # Not sure about the `^`, but either way…
>>> p.pattern
<django.urls.resolvers.RoutePattern object at 0x10acd49d0>
>>> p.pattern.regex.pattern
'^\\^api/token\\-auth/$'
>>> simplify_regex(p.pattern.regex.pattern)  # Wrong usage. 
'/\\api/token\\-auth/'
>>> str(p.pattern)
'^api/token-auth/'
>>> simplify_regex(str(p.pattern))  # Correct usage. 
'/api/token-auth/'
Note: See TracTickets for help on using tickets.
Back to Top