Opened 7 years ago

Closed 4 years ago

#9202 closed Bug (wontfix)

forms.field.URLField regexp for validating URL does not follow the RFC

Reported by: niccl Owned by: nobody
Component: Forms Version: 1.0
Severity: Normal Keywords:
Cc: Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


URLField validation in forms.fields checks that the supplied URL is well-formed before accepting it. The regexp used does not allow URLS of the form http://django/, where the assumption is that djanog is a host name opn a local server wihtout domain name. The relevant RFC (RFC3986) gives a regex that does allow this form of URL.

Attachments (1)

field_url_re.diff (642 bytes) - added by niccl 7 years ago.

Download all attachments as: .zip

Change History (5)

Changed 7 years ago by niccl

comment:1 Changed 7 years ago by jacob

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

I'm not sure we want to be so permissive for URLField when a RegexField with a custom regex does the same thing. The point of the URLField is to validate "common" URLs; for the vast majority of users http://django/ is a typo.

comment:2 Changed 6 years ago by rlaager

I don't see the problem here. If you want valid URLs, set verify_exists and let it fetch the URLs to test them.

comment:3 Changed 4 years ago by lukeplant

  • Severity set to Normal
  • Type set to Bug

comment:4 Changed 4 years ago by aaugustin

  • Easy pickings unset
  • Resolution set to wontfix
  • Status changed from new to closed
  • UI/UX unset

This regexp is found in Annex B of RFC 3986, which is called "Parsing a URI Reference with a Regular Expression". It's intended to _interpret_ any text as an URI, like the urlparse module.

On the other hand, the goal of the URLField is to _validate_ that the input has a decent chance of working if you stick it in a template like this:

<a href="{{ obj.url }}">{{ obj }}</a>

Here are some examples that are happily accepted by this regexp, but arguable aren't URLs:

>>> import re
>>> url_re = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$')     # NB: I added the trailing $
>>> url_re.match('abc')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('A/B')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('?#')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('irc://')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('\\server\share')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('this has nothing to do with an URL')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('#@!?')
<_sre.SRE_Match object at 0x1030329e0>

The regexp can be decomposed as <optional stuff>([^?#]*)<more optional stuff>, which makes it extremely laxist.

To be honest, I can't find a single example that won't match the regexp. I thought the last one would fail because of the # before the ?, but somehow it's accepted.

Finally, verify_exists is deprecated, so comment 2 no longer applies.

So this regexp, while technically correct, isn't appropriate to validate the contents of URLField; it's too permissive.

Note: See TracTickets for help on using tickets.
Back to Top