Opened 11 years ago

Closed 9 years ago

#20264 closed Bug (invalid)

URLValidator should allow underscores in local hostname

Reported by: Arthur Debert
Owned by: nobody
Component: Core (Other)
Version: dev
Severity: Normal
Keywords:
Cc: bmispelon@…, Simon Charette
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: yes
UI/UX: no

Description

Underscores are valid in local hostnames, as per RFC 1178.

The current validator is too strict. A local hostname can include (and also start with) an underscore.

Change History (10)

comment:1 by Arthur Debert, 11 years ago

Has patch: set

comment:2 by Baptiste Mispelon, 11 years ago

Cc: bmispelon@… added
Resolution: invalid
Status: new → closed

Hi,

RFC 1178 says "Don't use non-alphanumeric characters in a name" and according to its first paragraph, it's merely a set of guidelines.

For more technical details, the wikipedia article on hostnames has some good references: https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names

The valid characters according to RFC 952 are [a-z0-9-], which is what Django uses.
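
For illustration only, a check restricted to that character set could be sketched as below. This is not Django's actual regex; the names `LDH_LABEL` and `is_ldh_hostname` are hypothetical, and the 63-character label limit and the rule against leading or trailing hyphens are assumptions added to make the example complete.

```python
import re

# Hypothetical pattern for one dot-separated label: letters, digits and
# hyphens only, 1 to 63 characters, no hyphen at either end (assumed rules).
LDH_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)", re.IGNORECASE)


def is_ldh_hostname(hostname: str) -> bool:
    """Return True if every label uses only letters, digits and hyphens."""
    # Tolerate a single trailing dot (fully qualified form).
    labels = hostname.rstrip(".").split(".")
    return all(LDH_LABEL.fullmatch(label) is not None for label in labels)


print(is_ldh_hostname("example.com"))
print(is_ldh_hostname("my_host.example.com"))
```

Under these rules a name containing an underscore is rejected, which is the behaviour this ticket is about.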

comment:3 by Arthur Debert, 11 years ago

Hi

I ran into this because of URLs we are seeing in the wild. A great number of tools (popular browsers, curl, wget, BIND) allow such addresses (an underscore in the local part). I wonder whether a de facto standard could be in place here?
This Stack Overflow answer gives a different reading, drawing on RFC 2181: http://stackoverflow.com/a/2183140/65490

Thank you

comment:4 by Baptiste Mispelon, 11 years ago

Resolution: invalid
Status: closed → new

My apologies, it seems I confused host names with domain names, which have different sets of allowed characters (domain names do allow underscores, as noted in the Wikipedia article that I linked to).

I'm reopening this but leaving it as Unreviewed for now.

You seem to have a good case that the current regex is incorrect, but I want to double-check.

Thanks.

comment:5 by Florian Apolloner, 11 years ago

Resolution: invalid
Status: new → closed

An underscore is not valid as per RFC 952 (we are talking about hostnames in URLs, not DNS records; I am aware that DNS records such as SRV require or allow underscores). RFC 1123 section 2.1 later relaxed that but still did not allow an underscore. RFC 2821 increased the length but, again, did not add the underscore.

To answer:

I ran into this because of URLs we are seeing in the wild. A great number of tools (popular browsers, curl, wget, BIND) allow such addresses (an underscore in the local part).

Browsers do whatever they think is best; from their point of view it obviously makes no sense to disallow underscores. BIND has no reason to disallow them either, since they are valid in DNS records (yes, most likely even in A records, which still doesn't make them valid with regard to the HTTP RFC and related ones).

comment:6 by anonymous, 10 years ago

FWIW

per http://networkadminkb.com/KB/a156/windows-2003-dns-and-the-underscore.aspx

"Is the underscore a supported character for DNS hostnames (my_hostname.domain.com)?
Yes, RFC 2181 added support for the underscore and other non-English characters. Prior to RFC 2181, RFC 1035 explicitly limited the character for use in hostnames to English letters, numbers, and a hyphen."

comment:7 by Simon Charette, 10 years ago

Cc: Simon Charette added

comment:8 by anonymous, 10 years ago

Did you read RFC 2181? That link appears to be an opportunistic MSFT interpretation of RFC 2181 to address NETBIOS naming. See http://technet.microsoft.com/en-us/library/cc959336.aspx for more of that justification. RFC 2181 doesn't specifically say anything about changing what is valid in a hostname; it just talks about what can be in a DNS entry and completely punts on whether clients will or should accept it. Compare that to RFC 1123, which is very specific on its subject; it doesn't look to me like 2181 can reasonably be interpreted as overriding 1123 in general. OTOH, 1123 is completely clear that it updates 952.

comment:9 by Hasan Alayli, 9 years ago

Has patch: unset
Resolution: invalid
Status: closed → new

Since all major browsers support underscores, and DNS allows underscores in domain names, URLValidator is bound to fail unexpectedly, from the user's perspective, when it is used to validate a user's input.

Strictly abiding by the RFC with complete disregard for pragmatism makes no sense. The least this highly opinionated class can offer is a strict flag.
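
A rough sketch of what such a flag might look like follows. It uses a hypothetical standalone function, `host_is_valid`, rather than Django's URLValidator API, and the label patterns are assumptions chosen for the example, not the validator's real regex.

```python
import re
from urllib.parse import urlsplit

# Assumed label rules: letters, digits and hyphens in strict mode,
# with underscores additionally accepted in lenient mode.
STRICT_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)", re.IGNORECASE)
LENIENT_LABEL = re.compile(r"(?!-)[a-z0-9_-]{1,63}(?<!-)", re.IGNORECASE)


def host_is_valid(url: str, strict: bool = True) -> bool:
    """Check only the host part of a URL; `strict` is a hypothetical flag."""
    host = urlsplit(url).hostname
    if not host:
        # No network location could be parsed from the input.
        return False
    pattern = STRICT_LABEL if strict else LENIENT_LABEL
    labels = host.rstrip(".").split(".")
    return all(pattern.fullmatch(label) is not None for label in labels)


print(host_is_valid("http://my_host.local/"))                # strict mode
print(host_is_valid("http://my_host.local/", strict=False))  # lenient mode
```

Defaulting `strict` to True would keep the existing behaviour for current callers while letting applications that accept user-supplied URLs opt in to underscores.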

comment:10 by Tim Graham, 9 years ago

Resolution: invalid
Status: new → closed