Opened 11 years ago
Closed 11 years ago
#22378 closed Cleanup/optimization (fixed)
\d in urls examples in documentation is missleading.
Reported by: | tomwys | Owned by: | chris cauley |
---|---|---|---|
Component: | Documentation | Version: | 1.6 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
\d with re.UNICODE flag allows to use digits like "१". This flag is used when parsing Django urls.
For example, regexp like that (https://docs.djangoproject.com/en/dev/topics/http/urls/#example):
url(r'^articles/(\d{4})/$', 'news.views.year_archive'),
will accept "articles/११११/". This is missleading for developer. It could also be harmful. If developer will use string parsed like that in other way than parsing to int(), it could lead to error or even security issue.
Change History (8)
comment:1 by , 11 years ago
follow-up: 3 comment:2 by , 11 years ago
I think the suggestion is simply to use [0-9]
which is unambiguous even in an unicode world.
comment:3 by , 11 years ago
Yes, using [0-9]
is fine. It would be great to change all \d
to [0-9]
in documentation and tests related to urls.
ie. developer without knowledge about \d
and re.UNICODE
used in urls can expect that following code ensures that digit is written to /tmp/otuput
.
from django.conf.urls import patterns, include, url from django.contrib import admin from django.http import HttpResponse def view(request, arg): open("/tmp/output", 'w').write(arg.encode('utf-8')) return HttpResponse(arg) urlpatterns = patterns('', url(r'^(\d)/$', view), )
This is not true for localhost:8000/१/
(tested on cPython 2.7.5 and Django master). If developer assume that file (or other storage like radius, cache etc.) contains digit, but this is not true, it could cause errors (or even security issues in extreme cases) in the future.
Also, even if it is passed to int(...)
and converted to integer that way, it creates multiple valid urls for single resource, without developer knowing about it. This is minor, but unwanted behavior.
And it is not easy to discover, most of Django developers are puzzled when I talk with them about this problem.
comment:4 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Cleanup/optimization |
comment:5 by , 11 years ago
A related, but more complex issue is the matter of \w
with re.UNICODE
. I can see that some projects may wish to accept characters like è
in URLs, others may be unwittingly doing so. I believe we had this discussion recently regarding username validation.
I think a specific note about \w
and \d
in the URL documentation would be advisable. In particular, the "correct" regex for capturing a slug field should also be mentioned.
This also relates to some of my comments on https://code.djangoproject.com/ticket/22218#comment:2 regarding providing a non-regex based approach to URL resolving in core where novice developers are not required to think about these problems.
comment:6 by , 11 years ago
Hurl approach is interesting, but it is not resolving issues with re.UNICODE (but could after some modifications).
Also [0-9]+
is not solving all problems, because you can add leading zeros "0001". Correct regex would be ([1-9][0-9]*)
or ([1-9][0-9]*)|0
if you want to support 0
. But no one would like to use regexps like that in all of their urls.
Other approach would be to introduce more advanced parsing mechanism that not only validates data, but also converts it to objects and back to strings (something like to_python in models). There are several examples of such parsers that would make sense:
- integer
- float
- decimal
- date
- time
- datetime
Also some of users may want to add more advanced custom parsers like:
- date range (where first date must be earlier)
- integer range (same as above)
- GPS coordinate
- list of slugs (ie. for tags)
Advantages of this approach:
- more robust - it is guaranteed to receive assumed value without complicated regexps,
- more elastic - you can write your views and templates without assuming in what format date will be stored in url,
- removes from views repetitive tasks like parsing date from url.
Example syntax:
url(r'^articles/(\d{4})/$', 'news.views.year_archive'), # becomes: url([r'^articles/', year, '/$'], 'news.views.year_archive')), url(r'^(?P<question_id>\d+)/$', views.detail, name='detail'), # becomes: url(r'^', integer('question_id'), r'^/$'], views.detail, name='detail'), url(r'^weeklyreports/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d+)/$', 'weeklyreports', name='weeklyreports'), # becomes: url([r'^weeklyreports/', date('report_date, 'YYYY/MM/DD'), r'/$'], 'weeklyreports', name='weeklyreports'),
This could be combined with Hurl approach.
comment:7 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:8 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Could you elaborate a bit on the problem and what change you think we should make? When I try your example with something like
articles/१/
, the value passed to the view is "1". If there are security implications you should report them privately per https://docs.djangoproject.com/en/1.6/internals/security/#reporting-security-issues. Thanks.