
Opened 22 months ago

Closed 22 months ago

Last modified 3 months ago

#18517 closed Bug (wontfix)

URLField does not support url with underscore

Reported by: guoqiao Owned by: nobody
Component: Core (URLs) Version: 1.6
Severity: Normal Keywords: URLFiled
Cc: guoqiao Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description (last modified by claudep)

If your URL has a '_' in it, like this:

the URLField will complain that it is not a valid URL. The problem lies in the '_' symbol. I found the following in the source code:

import re

from django.core.validators import RegexValidator

class URLValidator(RegexValidator):
    regex = re.compile(
        r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

The relevant part is this line:

r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'

It's clear that '_' is not included in the pattern.
I think this is not reasonable, because a lot of URLs have a '_' in them.
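
To make the behaviour concrete, here is a minimal sketch that runs the pattern quoted above with only the standard library `re` module, outside Django. The sample URLs are hypothetical and chosen only for illustration. It suggests that the restriction applies to the hostname part: an underscore in the path is accepted, while an underscore in a host label is rejected.

```python
import re

# Pattern copied from the URLValidator snippet quoted in this ticket.
URL_RE = re.compile(
    r'^(?:http|ftp)s?://'
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)

# Hypothetical URLs, used only to exercise the pattern.
samples = [
    "http://my-project.example.com/",  # hyphen in a host label
    "http://example.com/some_page/",   # underscore in the path
    "http://my_project.example.com/",  # underscore in a host label
]

# Map each URL to whether the pattern matches it.
results = {url: bool(URL_RE.match(url)) for url in samples}

for url, ok in results.items():
    print(url, "->", "accepted" if ok else "rejected")
```

Only the last sample fails to match, because each host label is limited to letters, digits and hyphens.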

Attachments (0)

Change History (5)

comment:1 Changed 22 months ago by guoqiao

  • Cc guoqiao added
  • Component changed from Uncategorized to Core (URLs)
  • Easy pickings set
  • Keywords URLFiled added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Type changed from Uncategorized to Bug

comment:2 Changed 22 months ago by claudep

  • Description modified (diff)

Reformatted. Please use the preview before posting a ticket.

comment:3 Changed 22 months ago by claudep

  • Resolution set to wontfix
  • Status changed from new to closed

The Django URLValidator is aiming to check if URLs are valid according to official rules (RFC 1034/1035) which forbid the usage of underscores in hostnames (read also http://www.quora.com/Domain-Name-System-DNS/Why-are-underscores-not-allowed-in-DNS-host-names).

If you want to relax rules for your particular usage, just subclass fields/validators and adapt them. But I don't think we should change the default validator for broken existing URLs.
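
As a rough sketch of what relaxing the rules could look like, the pattern below is the one quoted in the description with `_` added to the characters allowed inside a host label. The names `RELAXED_URL_RE`, `is_relaxed_url` and `UnderscoreURLValidator` are hypothetical, and the Django wiring is shown only as a comment: it assumes a Django version where `URLValidator` exposes a `regex` class attribute, as in the quoted snippet.

```python
import re

# Same pattern as in the ticket, except that '_' is also allowed in the
# middle of a host label (labels must still start and end with a letter
# or digit). This is an illustrative relaxation, not Django's own regex.
RELAXED_URL_RE = re.compile(
    r'^(?:http|ftp)s?://'
    r'(?:(?:[A-Z0-9](?:[A-Z0-9_-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_relaxed_url(value):
    """Return True if value matches the relaxed pattern."""
    return RELAXED_URL_RE.match(value) is not None


# Possible wiring in a Django project (an assumption, not exercised here):
#
#     from django.core.validators import URLValidator
#
#     class UnderscoreURLValidator(URLValidator):
#         regex = RELAXED_URL_RE

print(is_relaxed_url("http://my_project.example.com/"))  # hypothetical host
```

A form or model field would then need to use the subclassed validator instead of the default one. How to do that depends on the Django version in use.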

About the example you provided, see https://github.com/rtfd/readthedocs.org/issues/148

comment:4 Changed 22 months ago by aaugustin

To be honest, the URLValidator doesn't aim for strict conformance to RFC. Rather, it attempts to catch typos in URLs entered manually in the admin.

Of course, it follows the RFC whenever possible, but it's also written with real-life use cases in mind.

I agree with Claude -- but please don't write tickets about edge cases where the validator diverges from the RFC :)

Last edited 22 months ago by aaugustin (previous) (diff)

comment:5 Changed 3 months ago by slaughninja

  • Version changed from 1.4 to 1.6

If the Django URLValidator is aiming to meet the RFC1034/1035 then it should be named DomainNameValidator. The RFC for URLs would be RFC1738, which kinda, sorta allows for underscores to be used unquoted (it is a reserved character).

Also, the reality is that there are hosts out there that use underscores in their names, especially in subdomains. Even if it is not standard or recommended, hosts like that are still out there (and often outside the control of the dev), and the current behaviour renders the validator unusable in production.
