﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
34169	Regex bug in EmailValidator class allows top domain label of an email address's domain_part to start with a hyphen	Niko	nobody	"We found a possible bug with the email validation regex for the 
domain part of the email address. This regex is defined in the EmailValidator class in 
the django/django/core/validators.py file. We are referring to the domain_regex variable 
on line 187.

Short description of the bug:

In short, the current domain_regex accepts an email address whose top-domain starts with a hyphen. However, according to section 4.1.2 of RFC 5321, a sub-domain in the domain part of an email address may not start or end with a hyphen. Django's EmailValidator should therefore reject an email address whose top-domain starts with a hyphen.
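
A minimal sketch of the failure mode, assuming a top-domain pattern that only guards against a trailing hyphen. The pattern and the name `permissive_top_label` are illustrative assumptions, not a quote of Django's domain_regex:

```python
import re

# Hypothetical top-domain pattern (an assumption for illustration only):
# 2 to 63 letters, digits or hyphens, with a negative lookbehind that
# rules out a trailing hyphen but says nothing about the first character.
permissive_top_label = re.compile(r'[A-Z0-9-]{2,63}(?<!-)\Z', re.IGNORECASE)

for label in ('com', '-com', 'com-'):
    accepted = permissive_top_label.fullmatch(label) is not None
    print(label, accepted)
```

With a pattern of this shape, `com` and `-com` are both accepted and only `com-` is rejected, which is the asymmetry this report describes.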

Long description of the bug:

To be on the same page, we will define some nomenclature to describe the bug in more detail.

Let's use this as an example email address `xyz@a.b.com`. 
1. The ""xyz"" part is the local part of the email address and is not of interest in this bug report. 
2. The ""a.b.com"" is the domain part of the email address. 
3. ""a"", ""b"" and ""com"" are the sub-domains, and ""com"" is specifically the top-domain.
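
The nomenclature above can be sketched in a few lines of Python. The helper name `split_address` is hypothetical and exists only to illustrate the terms used in this report:

```python
def split_address(address):
    """Split an email address into the parts named in this report."""
    # Split on the last '@': everything after it is the domain part.
    local_part, _, domain_part = address.rpartition('@')
    # Each dot-separated piece of the domain part is a sub-domain.
    sub_domains = domain_part.split('.')
    # The last sub-domain is the top-domain.
    top_domain = sub_domains[-1]
    return local_part, domain_part, sub_domains, top_domain


print(split_address('xyz@a.b.com'))
```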

We will reference section 4.1.2 of the RFC 5321 Simple Mail Transfer Protocol specification, as
that document is the authoritative definition of how an email address should be structured.

The ABNF grammar for the domain part of the email address is given below.


`Domain         = sub-domain *(""."" sub-domain)`

`sub-domain     = Let-dig [Ldh-str]`

`Let-dig        = ALPHA / DIGIT`

`Ldh-str        = *( ALPHA / DIGIT / ""-"" ) Let-dig`


We believe the implementation of the domain checker is incorrect based on the above ABNF representation.
The `Domain` component of an email address, as defined above, consists of a `sub-domain` followed by zero or more groups, each made up of a period ""."" and another `sub-domain`. A `sub-domain` is a `Let-dig` followed by an optional `Ldh-str` (the square brackets in `[Ldh-str]` mark it as optional). `Let-dig` is restricted to alphanumeric characters, as given by `ALPHA / DIGIT`. `Ldh-str` is zero or more alphanumeric characters or hyphens, followed by a `Let-dig`. So an `Ldh-str`, when present, must end with an alphanumeric character.

Since a sub-domain has a strict ordering in which `Let-dig` comes first and the optional `Ldh-str` comes next, we can infer that the first and last characters of each sub-domain must be alphanumeric (a-zA-Z0-9). Thus, the fact that Django's email checker allows the sub-domain ""-com"", for example, violates the RFC's specification, because the hyphen is the first character.
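
As a rough sketch, the ABNF above can be translated rule by rule into a Python regular expression. The names `LET_DIG`, `LDH_STR`, `SUB_DOMAIN`, `DOMAIN` and `is_rfc5321_domain` are our own, and the sketch covers only the grammar quoted here (no length limits, no address literals), so it is not a proposed patch:

```python
import re

# Let-dig = ALPHA / DIGIT
LET_DIG = r'[A-Za-z0-9]'
# Ldh-str = *( ALPHA / DIGIT / '-' ) Let-dig
LDH_STR = r'(?:[A-Za-z0-9-]*' + LET_DIG + r')'
# sub-domain = Let-dig [Ldh-str]   (the Ldh-str is optional)
SUB_DOMAIN = LET_DIG + LDH_STR + r'?'
# Domain = sub-domain *('.' sub-domain)
DOMAIN = re.compile(SUB_DOMAIN + r'(?:\.' + SUB_DOMAIN + r')*')


def is_rfc5321_domain(domain):
    """Return True if the whole string matches the Domain rule above."""
    return DOMAIN.fullmatch(domain) is not None


for candidate in ('a.b.com', 'a-b.com', 'a.b.-com', 'a.b.com-'):
    print(candidate, is_rfc5321_domain(candidate))
```

Under this translation, `a.b.com` and `a-b.com` match, while `a.b.-com` and `a.b.com-` do not, because every sub-domain has to begin and end with a `Let-dig`.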

Reference: https://www.rfc-editor.org/rfc/rfc5321.html#section-4.1.2"	Bug	closed	Core (Mail)	4.1	Normal	duplicate	Email, EmailValidator, core, regex	rohandeshpande832@… norton@…	Unreviewed	0	0	0	0	0	0
