Opened 23 months ago

Closed 23 months ago

Last modified 16 months ago

#20264 closed Bug (invalid)

URLValidator should allow underscores in local hostname

Reported by: arthurdebert Owned by: nobody
Component: Core (Other) Version: master
Severity: Normal Keywords:
Cc: bmispelon@…, charettes Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

Underscores are valid in local hostnames, as per RFC 1178.

The current validator is too strict. A local hostname can include (and also start with) an underscore.

Change History (8)

comment:1 Changed 23 months ago by arthurdebert

  • Has patch set
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 23 months ago by bmispelon

  • Cc bmispelon@… added
  • Resolution set to invalid
  • Status changed from new to closed

Hi,

RFC 1178 says "Don't use non-alphanumeric characters in a name" and according to its first paragraph, it's merely a set of guidelines.

For more technical details, the Wikipedia article on hostnames has some good references: https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names

The valid characters according to RFC 952 are [a-z0-9-], which is what Django uses.
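
As a rough illustration of that character set, here is a minimal sketch of a per-label check. It is not Django's actual regex; the names `LDH_LABEL` and `is_ldh_hostname` are hypothetical, and the pattern assumes the RFC 952 letters-digits-hyphen rule with case-insensitive matching.

```python
import re

# Hypothetical illustration only, not the regex used by Django's URLValidator.
# One label: starts with a letter, ends with a letter or digit,
# and may contain hyphens only in between (the RFC 952 rule, roughly).
LDH_LABEL = re.compile(r"[a-z](?:[a-z0-9-]*[a-z0-9])?", re.IGNORECASE)


def is_ldh_hostname(hostname: str) -> bool:
    """Return True if every dot-separated label follows the LDH rule."""
    return all(LDH_LABEL.fullmatch(label) for label in hostname.split("."))


print(is_ldh_hostname("my-host.example.com"))  # True
print(is_ldh_hostname("my_host.example.com"))  # False: underscore is rejected
```

Under this reading, an underscore anywhere in a label fails the check, which matches the behaviour reported in this ticket.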

comment:3 Changed 23 months ago by arthurdebert

Hi

I've run into this with URLs we are seeing in the wild. A great number of tools (popular browsers, curl, wget, BIND) allow such addresses (underscore in the local part). I wonder if a de facto standard could be in place here?
This Stack Overflow answer gives a different reading, based on RFC 2181: http://stackoverflow.com/a/2183140/65490 .

Thank you

comment:4 Changed 23 months ago by bmispelon

  • Resolution invalid deleted
  • Status changed from closed to new

My apologies, it seems I got confused between host names and domain names, which have different sets of allowed characters (domain names do allow underscores, as noted in the Wikipedia article that I linked to).

I'm reopening this but leaving it as Unreviewed for now.

You seem to have a good case that the current regex is incorrect, but I want to double-check.

Thanks.

comment:5 Changed 23 months ago by apollo13

  • Resolution set to invalid
  • Status changed from new to closed

An underscore is not valid as per RFC 952 (we are talking about hostnames in URLs and not DNS records; I am aware that DNS records like SRV require/allow underscores). RFC 1123 section 2.1 later relaxed that but still didn't allow an underscore. RFC 2821 increased the length but, again, did not add the underscore.

To answer:

I've run into this with URLs we are seeing in the wild. A great number of tools (popular browsers, curl, wget, BIND) allow such addresses (underscore in the local part).

Browsers do whatever they think is best; from their point of view it obviously makes no sense to disallow underscores. BIND has no reason to disallow it either, since it's valid for DNS records (yes, most likely even for A records, which still doesn't make them valid with regard to the HTTP RFC and related ones).
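
To illustrate the gap between what lenient tools accept and what a strict check accepts, here is a small sketch. It uses the standard library's `urllib.parse.urlsplit`, which parses a URL without checking the characters in the host, next to a hypothetical strict check. `STRICT_LABEL` and `host_is_strictly_valid` are assumptions made for illustration, not Django's code; the pattern follows the RFC 1123 relaxation that lets a label start with a digit and caps it at 63 characters.

```python
import re
from urllib.parse import urlsplit

# Hypothetical strict label pattern (letters, digits, inner hyphens; max 63 chars).
STRICT_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)


def host_is_strictly_valid(url: str) -> bool:
    """Return True if every label of the URL's host matches STRICT_LABEL."""
    host = urlsplit(url).hostname or ""
    return all(STRICT_LABEL.fullmatch(label) for label in host.split("."))


url = "http://my_host.example.com/path"
print(urlsplit(url).hostname)       # my_host.example.com (parsed without complaint)
print(host_is_strictly_valid(url))  # False (the underscore fails the strict check)
```

The parser happily returns the host, so whether such a URL "works" in a given tool says little about whether it is valid under the hostname RFCs.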

comment:6 Changed 16 months ago by anonymous

FWIW

per http://networkadminkb.com/KB/a156/windows-2003-dns-and-the-underscore.aspx

"Is the underscore a supported character for DNS hostnames (my_hostname.domain.com)?
Yes, RFC 2181 added support for the underscore and other non-English characters. Prior to RFC 2181, RFC 1035 explicitly limited the character for use in hostnames to English letters, numbers, and a hyphen."

comment:7 Changed 16 months ago by charettes

  • Cc charettes added

comment:8 Changed 16 months ago by anonymous

Did you read RFC 2181? That link appears to be an opportunistic MSFT interpretation of RFC 2181 to address NetBIOS naming. See http://technet.microsoft.com/en-us/library/cc959336.aspx for more of that justification. RFC 2181 doesn't specifically say anything about changing what is valid in a hostname; it just talks about what can be in a DNS entry and completely punts on whether clients will or should accept it. Compare that to RFC 1123, which is very specific on its subject. It doesn't look to me like 2181 can reasonably be interpreted as overriding 1123 in general. OTOH, 1123 is completely clear that it updates 952.
