#36452 closed Bug (invalid)
DomainNameValidator forbids digits in TLDs
Reported by: | Shai Berger | Owned by: | |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Normal | Keywords: | validation domain |
Cc: | Shai Berger, Claude Paroz, Mike Edmunds | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
I think there's a small bug in the domain validator that has been lurking quietly for years and is now biting me a little. The issue is digits in top-level domains, e.g. email.com1. As far as I can read the definition in RFC 1035 (page 8), this is a perfectly valid domain name, but our regex, as I write this, allows only letters. This is the regex for i18n-supporting domains; there's an "ascii_only_tld" regex right next to it, which does allow digits. This makes me quite certain that it's a bug.
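A minimal sketch of the difference being described, using two simplified TLD patterns. Both are assumptions modeled on the behavior reported here, not copies of Django's regexes: LETTERS_ONLY_TLD accepts letters (plus a rough non-ASCII range) or an xn-- punycode label, while ALNUM_TLD also accepts digits, like the ascii_only_tld regex.

```python
import re

# Hypothetical, simplified TLD patterns; NOT Django's exact regexes.
NON_ASCII = "\u00a1-\uffff"  # rough "non-ASCII letter" range (not a raw string)

# Letters only, or a punycode label: the behaviour reported for the i18n regex.
LETTERS_ONLY_TLD = re.compile(
    r"\.(?!-)(?:[a-z" + NON_ASCII + r"-]{2,63}|xn--[a-z0-9]{1,59})(?<!-)\.?\Z",
    re.IGNORECASE,
)

# Letters, digits and hyphens: the behaviour reported for ascii_only_tld.
ALNUM_TLD = re.compile(r"\.(?!-)[a-z0-9-]{2,63}(?<!-)\.?\Z", re.IGNORECASE)


def tld_ok(pattern: re.Pattern, domain: str) -> bool:
    """True if the final label of `domain` (after the last dot) matches `pattern`."""
    return pattern.search(domain) is not None


print(tld_ok(LETTERS_ONLY_TLD, "email.com1"))        # False: digit in the TLD
print(tld_ok(ALNUM_TLD, "email.com1"))               # True
print(tld_ok(LETTERS_ONLY_TLD, "example.xn--p1ai"))  # True: punycode label
```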
Of note: the class DomainNameValidator is relatively new (it was added only about a year ago), but it inherits the regex from the older URLValidator, which, it seems, has forbidden digits in TLDs since at least Django 2.x. Since EmailValidator now also uses the regexes from DomainNameValidator, it is affected too.
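Since the email check embeds the same domain regex, the TLD rule propagates to it automatically. The sketch below illustrates that composition under stated assumptions: LABEL, build_validators and the local-part pattern are hypothetical and much looser than Django's real EmailValidator.

```python
import re

# Hypothetical building blocks (simplified; not Django's source).
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
TLD_LETTERS_ONLY = r"(?:[a-z]{2,63}|xn--[a-z0-9]{1,59})"
TLD_WITH_DIGITS = r"(?:[a-z0-9]{2,63}|xn--[a-z0-9]{1,59})"


def build_validators(tld_pattern: str):
    """Build a domain regex and an email regex that share one domain fragment."""
    domain = LABEL + r"(?:\." + LABEL + r")*\." + tld_pattern
    domain_re = re.compile(r"^" + domain + r"\Z", re.IGNORECASE)
    # Very loose local part, just enough to demonstrate reuse of `domain`.
    email_re = re.compile(r"^[a-z0-9._%+-]+@" + domain + r"\Z", re.IGNORECASE)
    return domain_re, email_re


for tld in (TLD_LETTERS_ONLY, TLD_WITH_DIGITS):
    domain_re, email_re = build_validators(tld)
    print(bool(domain_re.match("email.com1")), bool(email_re.match("user@email.com1")))
# False False
# True True
```

Swapping the single TLD fragment changes both checks at once, which is presumably why a change to the domain regexes would also reach email validation.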
Change History (3)
comment:1 by , 3 months ago
follow-up: 3 comment:2 by , 3 months ago
Cc: | added |
---|---|
Resolution: | → needsinfo |
Status: | new → closed |
I was trying to check if email.com1 should be valid.
Looking at this list of top-level domains (https://www.icann.org/en/contracted-parties/registry-operators/resources/list-of-top-level-domains), the only top-level domains containing digits are the ones prefixed with XN--. These are allowed by our validator.
Looking at the RFC, I think I agree that the definition isn't this strict and that digits are allowed without hyphens:
<domain> ::= <subdomain> | " "
<subdomain> ::= <label> | <subdomain> "." <label>
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>
<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
<digit> ::= any one of the ten digits 0 through 9
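The <label> rule above can be transcribed almost mechanically into a regular expression. This is a sketch that assumes only the quoted grammar plus RFC 1035's 63-octet label limit; it ignores later relaxations such as RFC 1123 allowing a host name to start with a digit. It suggests that com1 is a well-formed label under this grammar, while 1com is not.

```python
import re

# Direct transcription of the RFC 1035 <label> rule:
#   <label>   ::= <letter> [ [ <ldh-str> ] <let-dig> ]
#   <letter>  -> [A-Za-z]; <let-dig> -> [A-Za-z0-9]; <ldh-str> -> [A-Za-z0-9-]+
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1035_label(label: str) -> bool:
    # RFC 1035 also caps each label at 63 octets.
    return len(label) <= 63 and LABEL_RE.fullmatch(label) is not None


def is_rfc1035_domain(name: str) -> bool:
    # <subdomain> ::= <label> | <subdomain> "." <label>
    # (The 255-octet total-length limit is not checked in this sketch.)
    return all(is_rfc1035_label(label) for label in name.split("."))


print(is_rfc1035_domain("email.com1"))  # True: digits allowed after the first letter
print(is_rfc1035_label("1com"))         # False: a label must start with a letter
```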
Before we continue, I think we should get confirmation that TLDs like com1 are valid.
comment:3 by , 3 months ago
Resolution: | needsinfo → invalid |
---|
Replying to Sarah Boyce:
Before we continue, I think we should get confirmation that tlds like com1 are valid
I believe com1 is not a valid TLD under current ICANN rules. Since ICANN decides what's a valid gTLD, their policies override whatever RFC 1035 may seem to allow.
There's a pretty thorough review here: https://stackoverflow.com/questions/9071279/number-in-the-top-level-domain/53875771.
That said, I haven't personally reviewed RFC 1035 and all 29(!) RFCs that modify it. The ICANN gTLD policies are from 2012; there's a new gTLD policy in draft form now, and I haven't reviewed that either. So if someone finds a newer policy that would allow digits in TLDs—or better yet, real-world evidence of a (non-IDNA) TLD containing digits—then we should revisit this.
The exception, as Sarah noted, is an IDNA-encoded TLD starting with xn--. ICANN allows those, and so does Django's DomainNameValidator.
(Also, I suppose there could be internal-use-only TLDs containing digits, which might be valid under the RFCs but wouldn't be usable on the public Internet. That seems pretty niche, and anyone having that use case could subclass Django's DomainNameValidator to cover it.)
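For the internal-use case, one option is to extend the TLD alternation with an explicit allowlist in a subclass. The sketch below is a standalone illustration of that idea, not Django code: PublicDomainCheck, InternalDomainCheck and is_valid are hypothetical names, the base class stands in for DomainNameValidator with a simplified pattern, ValueError stands in for ValidationError, and corp1 is a made-up internal TLD.

```python
import re

# Hypothetical, simplified fragments; NOT Django's actual regexes or API.
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
HOST = LABEL + r"(?:\." + LABEL + r")*"
PUBLIC_TLD = r"[a-z]{2,63}|xn--[a-z0-9]{1,59}"


class PublicDomainCheck:
    """Stand-in for a strict validator: letters-only TLD or an xn-- label."""

    tld_alternatives = PUBLIC_TLD

    def __init__(self):
        self.pattern = re.compile(
            "^" + HOST + r"\.(?:" + self.tld_alternatives + r")\Z", re.IGNORECASE
        )

    def __call__(self, value: str) -> None:
        # ValueError stands in for Django's ValidationError here.
        if not self.pattern.match(value):
            raise ValueError(f"Enter a valid domain name: {value!r}")


class InternalDomainCheck(PublicDomainCheck):
    """Subclass that also accepts an explicit allowlist of internal TLDs."""

    def __init__(self, internal_tlds):
        extra = [re.escape(tld) for tld in internal_tlds]
        # Extend the TLD alternation; an empty allowlist leaves it unchanged.
        self.tld_alternatives = "|".join([PUBLIC_TLD, *extra])
        super().__init__()


def is_valid(check, value: str) -> bool:
    try:
        check(value)
    except ValueError:
        return False
    return True


print(is_valid(PublicDomainCheck(), "build.corp1"))             # False
print(is_valid(InternalDomainCheck(["corp1"]), "build.corp1"))  # True
print(is_valid(InternalDomainCheck(["corp1"]), "email.com1"))   # False
```

Only the listed TLDs gain digits; everything else still goes through the strict public rule, which keeps the relaxation narrow.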
I suppose it's technically incorrect; however, I didn't see any registered TLDs with digits, and the folks who were involved with the recent update were hesitant to touch the existing regex for fear of breaking something.
I'm just wondering "what would Carlton decide here" lmao