Opened 17 years ago
Closed 14 years ago
#9202 closed Bug (wontfix)
forms.field.URLField regexp for validating URL does not follow the RFC
| Reported by: | niccl | Owned by: | nobody |
|---|---|---|---|
| Component: | Forms | Version: | 1.0 |
| Severity: | Normal | Keywords: | |
| Cc: | | Triage Stage: | Design decision needed |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
URLField validation in forms.fields checks that the supplied URL is well-formed before accepting it. The regexp used does not allow URLs of the form http://django/, where django is assumed to be a host name on a local server without a domain name. The relevant RFC (RFC 3986) gives a regex that does allow this form of URL.
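For illustration, the following sketch applies the RFC 3986 Appendix B regex (with a trailing $ added, as in comment 4) to http://django/. The helper name split_uri is hypothetical, and this is not the attached field_url_re.diff, whose contents aren't shown here. It only shows that the pattern accepts the URL and reports django as the authority.

```python
import re

# RFC 3986, Appendix B: a regex for *splitting* a URI reference into parts.
# A trailing $ is added here so the whole string must be consumed.
RFC3986_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$')


def split_uri(text):
    """Hypothetical helper: return (scheme, authority, path, query, fragment)."""
    m = RFC3986_RE.match(text)
    if m is None:
        return None
    # Groups 2, 4, 5, 7 and 9 hold the five components named in the RFC.
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


print(split_uri('http://django/'))
# ('http', 'django', '/', None, None)
```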
Attachments (1)
Change History (5)
by , 17 years ago
| Attachment: | field_url_re.diff added |
|---|---|
comment:1 by , 17 years ago
| Triage Stage: | Unreviewed → Design decision needed |
|---|---|
comment:2 by , 16 years ago
I don't see the problem here. If you want valid URLs, set verify_exists and let it fetch the URLs to test them.
comment:3 by , 15 years ago
| Severity: | → Normal |
|---|---|
| Type: | → Bug |
comment:4 by , 14 years ago
| Easy pickings: | unset |
|---|---|
| Resolution: | → wontfix |
| Status: | new → closed |
| UI/UX: | unset |
This regexp is found in Appendix B of RFC 3986, which is called "Parsing a URI Reference with a Regular Expression". It's intended to _interpret_ any text as a URI, like the urlparse module.
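In Python 3 the urlparse module's functions live in urllib.parse. A short sketch of the interpret-versus-validate distinction: urlsplit splits arbitrary text into components and raises no error for the inputs below, so on its own it cannot serve as a validator.

```python
from urllib.parse import urlsplit

# urlsplit interprets text as a URI reference; it does not validate it.
# Each of these inputs is split into components rather than rejected.
for text in ['abc', 'A/B', 'this has nothing to do with an URL', 'http://django/']:
    parts = urlsplit(text)
    print(repr(text), '->', parts)
```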
On the other hand, the goal of the URLField is to _validate_ that the input has a decent chance of working if you stick it in a template like this:
<a href="{{ obj.url }}">{{ obj }}</a>
Here are some examples that are happily accepted by this regexp, but arguably aren't URLs:
>>> import re
>>> url_re = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$') # NB: I added the trailing $
>>> url_re.match('abc')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('A/B')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('?#')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('irc://irc.freenode.net/django-dev')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match(r'\\server\share')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('this has nothing to do with an URL')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('#@!?')
<_sre.SRE_Match object at 0x1030329e0>
The regexp can be decomposed as <optional stuff>([^?#]*)<more optional stuff>; since the middle part may also match the empty string, it is extremely lax.
To be honest, I can't find a single example that won't match the regexp. I thought the last one would fail because of the # before the ?, but somehow it's accepted.
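The sketch below (Python 3, same regex as the session above) prints the groups for a few of those inputs. It suggests why '#@!?' is accepted: group 5, the path, is the only top-level group that isn't optional, and it can match nothing, while group 8, (#(.*)), takes everything from the first # to the end of the string, so the ? lands in the fragment.

```python
import re

# The RFC 3986 Appendix B regex, with the trailing $ added in the session above.
url_re = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$')

for text in ['#@!?', '', '?#']:
    m = url_re.match(text)
    # Group 5 (path) may be empty; group 7 is the query, group 9 the fragment.
    # Group 8, (#(.*)), consumes everything after the first '#', '?' included.
    print(repr(text), f'path={m.group(5)!r} query={m.group(7)!r} fragment={m.group(9)!r}')
```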
Finally, verify_exists is deprecated, so comment 2 no longer applies.
So this regexp, while technically correct, isn't appropriate to validate the contents of URLField; it's too permissive.
I'm not sure we want to be so permissive for URLField when a RegexField with a custom regex does the same thing. The point of the URLField is to validate "common" URLs; for the vast majority of users http://django/ is a typo.
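One hypothetical way to express "common" URLs, sketched with urllib.parse rather than a regexp: accept a short list of schemes and require a host that is either localhost or contains a dot. The function name, the scheme list and the dot rule are all assumptions made for illustration; this is not Django's URLField validation.

```python
from urllib.parse import urlsplit

# Hypothetical stricter check, NOT Django's URLField implementation:
# accept only a few common schemes and a host that is either 'localhost'
# or contains at least one dot (so single-label hosts such as 'django' fail).
COMMON_SCHEMES = {'http', 'https', 'ftp', 'ftps'}


def looks_like_common_url(text):
    try:
        parts = urlsplit(text)  # scheme is returned lower-cased
    except ValueError:  # e.g. malformed bracketed IPv6 host
        return False
    host = parts.hostname  # lower-cased; None when there is no authority
    if parts.scheme not in COMMON_SCHEMES or not host:
        return False
    return host == 'localhost' or '.' in host
```

Under this rule http://django/ is rejected while http://localhost:8000/ is accepted; a project that really needs single-label hosts could pass its own pattern to a RegexField instead, as the comment above suggests.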