URLField validation in forms.fields checks that the supplied URL is well-formed before accepting it. The regexp used does not allow URLs of the form http://django/, where the assumption is that django is a host name on a local server without a domain name. The relevant RFC (RFC 3986) gives a regex that does allow this form of URL.

comment:1 Changed 8 years ago by Jacob

I'm not sure we want to be so permissive for URLField when a RegexField with a custom regex does the same thing. The point of the URLField is to validate "common" URLs; for the vast majority of users http://django/ is a typo.

comment:2 Changed 7 years ago by rlaager

I don't see the problem here. If you want valid URLs, set verify_exists and let it fetch the URLs to test them.

comment:3 Changed 6 years ago by Luke Plant

comment:4 Changed 5 years ago by Aymeric Augustin

This regexp is found in Appendix B of RFC 3986, which is called "Parsing a URI Reference with a Regular Expression". It's intended to _interpret_ any text as a URI, like the urlparse module.
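
To illustrate the parsing role, here is a small sketch in Python 3 that reads the components out of the regexp's groups and compares them with urllib.parse.urlsplit (the Python 3 home of urlparse). The group numbers follow the RFC's own description; the names URI_RE and split_uri are chosen for this example only.

```python
import re
from urllib.parse import urlsplit

# Regexp from RFC 3986, Appendix B (no trailing $, as printed in the RFC).
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')


def split_uri(text):
    """Return (scheme, authority, path, query, fragment) from the RFC's groups."""
    m = URI_RE.match(text)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


print(split_uri('http://django/'))        # ('http', 'django', '/', None, None)
print(tuple(urlsplit('http://django/')))  # ('http', 'django', '/', '', '')
```

Both split the text into the same five components; neither says anything about whether the result is a usable link.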

On the other hand, the goal of the URLField is to _validate_ that the input has a decent chance of working if you stick it in a template like this:

<a href="{{ obj.url }}">{{ obj }}</a>

Here are some examples that are happily accepted by this regexp, but arguably aren't URLs:

>>> import re
>>> url_re = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$')     # NB: I added the trailing $
>>> url_re.match('abc')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('A/B')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('?#')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('irc://')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('\\server\share')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('this has nothing to do with an URL')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('#@!?')
<_sre.SRE_Match object at 0x1030329e0>

The regexp can be decomposed as <optional stuff>([^?#]*)<more optional stuff>, which makes it extremely lax.

To be honest, I can't find a single example that won't match the regexp. I thought the last one would fail because of the # before the ?, but somehow it's accepted.
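
The last case can be explained by looking at the captured groups: the path group matches the empty string, the query group is skipped, and the fragment group (#(.*))? takes everything after the #, including the ?. A short check, written for Python 3:

```python
import re

# Same pattern as in the session above, including the added trailing $.
url_re = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$')

m = url_re.match('#@!?')
print(repr(m.group(5)))  # path: ''
print(repr(m.group(7)))  # query: None (the optional group did not take part)
print(repr(m.group(9)))  # fragment: '@!?'
```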

Finally, verify_exists is deprecated, so comment 2 no longer applies.

So this regexp, while technically correct, isn't appropriate to validate the contents of URLField; it's too permissive.
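
For contrast, here is a rough sketch of what a validating check could look like, assuming the goal is "has a recognised scheme and a non-empty host". It is not Django's validator; looks_like_link and ALLOWED_SCHEMES are hypothetical names for this example, and the scheme list is an arbitrary choice. Note that this sketch accepts http://django/; whether dotless host names should pass is exactly the judgment call discussed in this ticket.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; adjust to the schemes a project wants to accept.
ALLOWED_SCHEMES = {'http', 'https', 'ftp', 'ftps'}


def looks_like_link(value):
    """Rough check: recognised scheme plus a non-empty host name."""
    try:
        parts = urlsplit(value)
        host = parts.hostname
    except ValueError:  # e.g. a malformed IPv6 literal in the host part
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(host)


# The inputs from the session above that the RFC regexp accepts.
samples = ['abc', 'A/B', '?#', 'irc://', r'\\server\share',
           'this has nothing to do with an URL', '', '#@!?']
print([s for s in samples if looks_like_link(s)])  # []
print(looks_like_link('http://example.com/'))      # True
```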

