Opened 3 years ago

Closed 3 years ago

Last modified 14 months ago

#18517 closed Bug (wontfix)

URLField does not support url with underscore

Reported by: guoqiao
Owned by: nobody
Component: Core (URLs)
Version: 1.6
Severity: Normal
Keywords: URLFiled
Cc: guoqiao
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: yes
UI/UX: no

Description (last modified by claudep)

If your URL contains an underscore ('_'), like this:

URLField will complain that it is not a valid URL. The problem lies in the '_' character. I found the following in the source code:

import re

from django.core.validators import RegexValidator


class URLValidator(RegexValidator):
    regex = re.compile(
        r'^(?:http|ftp)s?://'  # http(s):// or ftp(s)://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

The relevant part is this line:

r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'

Clearly, '_' is not included in the pattern.
I don't think this is reasonable, since many URLs contain a '_'.
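To reproduce the behaviour without Django, the pattern quoted above can be compiled on its own with Python's `re` module and tried against a few sample URLs. This is only a sketch: the `matches` helper and the example.com URLs are made up for illustration. A host label containing '_' fails to match, while an underscore in the path is accepted by the trailing `[/?]\S+` part of the pattern.

```python
import re

# The URLValidator pattern as quoted in this ticket, compiled standalone
# (no Django required).
URL_RE = re.compile(
    r'^(?:http|ftp)s?://'  # scheme: http, https, ftp or ftps
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or IPv4 address
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def matches(url):
    """Return True if the quoted pattern accepts the URL (hypothetical helper)."""
    return URL_RE.match(url) is not None


for url in ("http://www.example.com/",
            "http://my_site.example.com/",      # underscore in a host label
            "http://www.example.com/my_page"):  # underscore in the path
    print(url, matches(url))
```

Only the second URL is rejected, which shows that the pattern objects to underscores in the hostname, not to underscores anywhere in the URL.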

Change History (6)

comment:1 Changed 3 years ago by guoqiao

  • Cc guoqiao added
  • Component changed from Uncategorized to Core (URLs)
  • Easy pickings set
  • Keywords URLFiled added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Type changed from Uncategorized to Bug

comment:2 Changed 3 years ago by claudep

  • Description modified

Reformatted; please use Preview before posting a ticket.

comment:3 Changed 3 years ago by claudep

  • Resolution set to wontfix
  • Status changed from new to closed

Django's URLValidator aims to check whether URLs are valid according to the official rules (RFC 1034/1035), which forbid underscores in hostnames (see also http://www.quora.com/Domain-Name-System-DNS/Why-are-underscores-not-allowed-in-DNS-host-names).

If you want to relax the rules for your particular use case, just subclass the fields/validators and adapt them. But I don't think we should change the default validator to accommodate existing broken URLs.
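As a minimal sketch of that advice, a project could relax only the host-label part of the quoted pattern. The name `RELAXED_URL_RE` and the helper `relaxed_matches` are hypothetical, and the choice to allow '_' anywhere inside a host label (including first and last position) is an assumption; the TLD, localhost, IPv4, port and path parts are copied unchanged. This is plain `re` for illustration, not a vetted validator.

```python
import re

# Hypothetical relaxed variant of the quoted URLValidator pattern:
# '_' is accepted anywhere inside host labels; everything else is unchanged.
RELAXED_URL_RE = re.compile(
    r'^(?:http|ftp)s?://'  # scheme
    r'(?:(?:[A-Z0-9_](?:[A-Z0-9_-]{0,61}[A-Z0-9_])?\.)+'  # host labels, '_' allowed
    r'(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # top-level domain (unchanged)
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or IPv4 address
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def relaxed_matches(url):
    """Return True if the relaxed pattern accepts the URL."""
    return RELAXED_URL_RE.match(url) is not None


print(relaxed_matches("http://my_site.example.com/"))  # True: '_' in a host label
print(relaxed_matches("http://my_site/"))              # False: no dot, so no TLD
```

In a Django project, one option would be to assign such a compiled pattern to the `regex` attribute of a `URLValidator` subclass and use that validator in a custom field instead of the default one. The exact hook for swapping a field's validators has changed between Django versions, so treat this wiring as an assumption and check the field's source for the version in use.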

About the example you provided, see https://github.com/rtfd/readthedocs.org/issues/148

comment:4 Changed 3 years ago by aaugustin

To be honest, the URLValidator doesn't aim for strict RFC conformance. Rather, it attempts to catch typos in URLs entered manually in the admin.

Of course, it follows the RFC whenever possible, but it's also written with real-life use cases in mind.

I agree with Claude -- but please don't write tickets about edge cases where the validator diverges from the RFC :)

Last edited 3 years ago by aaugustin

comment:5 follow-up: Changed 18 months ago by slaughninja

  • Version changed from 1.4 to 1.6

If the Django URLValidator aims to meet RFC 1034/1035, then it should be named DomainNameValidator. The RFC for URLs would be RFC 1738, which kinda, sorta allows underscores to be used unquoted (it is a reserved character).

Also, the reality is that there are hosts out there that use underscores in their names, especially in subdomains. Even if this is not standard or recommended, such hosts still exist (often outside the developer's control), and the current behaviour makes URLField unusable for them in production.

comment:6 in reply to: ↑ 5 Changed 14 months ago by shai

Replying to slaughninja:

If the Django URLValidator aims to meet RFC 1034/1035, then it should be named DomainNameValidator. The RFC for URLs would be RFC 1738, which kinda, sorta allows underscores to be used unquoted (it is a reserved character).

RFC1738 may allow underscores in URLs generally, but not in the hostname.
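The distinction drawn here (underscores tolerated elsewhere in a URL, but not in the hostname) can be sketched by splitting the URL first and applying a letters-digits-hyphens rule to the host only. The function name `host_labels_ok` is hypothetical, and the exact label rule (1 to 63 letters, digits or hyphens, not starting or ending with a hyphen) is an assumption in the spirit of RFC 1034/1035 rather than a full implementation of either RFC.

```python
import re
from urllib.parse import urlsplit

# Assumed label rule: letters, digits and hyphens only; 1-63 characters;
# no leading or trailing hyphen.
LABEL_RE = re.compile(r'^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$')


def host_labels_ok(url):
    """Check only the hostname of `url`; path, query and fragment are ignored."""
    host = urlsplit(url).hostname  # lower-cased, without port or userinfo
    if not host:
        return False
    labels = host.rstrip('.').split('.')  # tolerate a trailing root dot
    return all(LABEL_RE.match(label) for label in labels)


print(host_labels_ok("http://www.example.com/some_page?q=a_b"))  # True
print(host_labels_ok("http://my_site.example.com/"))             # False
```

Underscores in the path and query string pass untouched, while the same character in a host label is rejected.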

Also, the reality is that there are hosts out there that use underscores in their names, especially in subdomains. Even if this is not standard or recommended, such hosts still exist (often outside the developer's control), and the current behaviour makes URLField unusable for them in production.

What Claude said; if you disagree, the place to argue about it is the DevelopersMailingList.
