Opened 16 years ago
Closed 13 years ago
#9202 closed Bug (wontfix)
forms.field.URLField regexp for validating URL does not follow the RFC
Reported by: | niccl | Owned by: | nobody |
---|---|---|---|
Component: | Forms | Version: | 1.0 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Design decision needed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
URLField validation in forms.fields checks that the supplied URL is well-formed before accepting it. The regexp used does not allow URLS of the form http://django/, where the assumption is that djanog is a host name opn a local server wihtout domain name. The relevant RFC (RFC3986) gives a regex that does allow this form of URL.
Attachments (1)
Change History (5)
by , 16 years ago
Attachment: | field_url_re.diff added |
---|
comment:1 by , 16 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:2 by , 15 years ago
I don't see the problem here. If you want valid URLs, set verify_exists and let it fetch the URLs to test them.
comment:3 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:4 by , 13 years ago
Easy pickings: | unset |
---|---|
Resolution: | → wontfix |
Status: | new → closed |
UI/UX: | unset |
This regexp is found in Annex B of RFC 3986, which is called "Parsing a URI Reference with a Regular Expression". It's intended to _interpret_ any text as an URI, like the urlparse
module.
On the other hand, the goal of the URLField is to _validate_ that the input has a decent chance of working if you stick it in a template like this:
<a href="{{ obj.url }}">{{ obj }}</a>
Here are some examples that are happily accepted by this regexp, but arguable aren't URLs:
>>> import re >>> url_re = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$') # NB: I added the trailing $ >>> url_re.match('abc') <_sre.SRE_Match object at 0x1030329e0> >>> url_re.match('A/B') <_sre.SRE_Match object at 0x103032ad8> >>> url_re.match('?#') <_sre.SRE_Match object at 0x1030329e0> >>> url_re.match('irc://irc.freenode.net/django-dev') <_sre.SRE_Match object at 0x103032ad8> >>> url_re.match('\\server\share') <_sre.SRE_Match object at 0x1030329e0> >>> url_re.match('this has nothing to do with an URL') <_sre.SRE_Match object at 0x103032ad8> >>> url_re.match('') <_sre.SRE_Match object at 0x103032ad8> >>> url_re.match('#@!?') <_sre.SRE_Match object at 0x1030329e0>
The regexp can be decomposed as <optional stuff>([^?#]*)<more optional stuff>
, which makes it extremely laxist.
To be honest, I can't find a single example that won't match the regexp. I thought the last one would fail because of the #
before the ?
, but somehow it's accepted.
Finally, verify_exists
is deprecated, so comment 2 no longer applies.
So this regexp, while technically correct, isn't appropriate to validate the contents of URLField; it's too permissive.
I'm not sure we want to be so permissive for
URLField
when aRegexField
with a custom regex does the same thing. The point of theURLField
is to validate "common" URLs; for the vast majority of usershttp://django/
is a typo.