#34169 closed Bug (duplicate)

Regex bug in EmailValidator class allows top domain label of an email address's domain_part to start with a hyphen

Reported by: Niko
Owned by: nobody
Component: Core (Mail)
Version: 4.1
Severity: Normal
Keywords: Email, EmailValidator, core, regex
Cc: rohandeshpande832@…, norton@…
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

We found a possible bug with the email validation regex for the
domain part of the email address. This regex exists in the EmailValidator class in
the django/django/core/validators.py file. We are referencing the domain_regex variable
in line 187.

Short description of the bug:

In short, the current domain_regex accepts an email address as valid when the top-domain part of the address starts with a hyphen. However, according to RFC 5321, no sub-domain in the domain part of an email address may start or end with a hyphen. Django's EmailValidator should therefore reject any email address whose top-domain starts with a hyphen.

Long description of the bug:

To be on the same page, we will define some nomenclature to describe the bug in more detail.

Let's use xyz@a.b.com as an example email address.

  1. "xyz" is the local part of the email address; it is not of interest in this bug report.
  2. "a.b.com" is the domain part of the email address.
  3. "a", "b" and "com" are the sub-domains, and "com" is specifically the top-domain.

We reference section 4.1.2 of the RFC 5321 (Simple Mail Transfer Protocol) specification, as
that document is the authoritative definition of how an email address should be structured.

The ABNF grammar for the domain part of the email address is as follows.

Domain = sub-domain *("." sub-domain)

sub-domain = Let-dig [Ldh-str]

Let-dig = ALPHA / DIGIT

Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig

We believe the implementation of the domain checker is incorrect based on the above ABNF representation.
The Domain component of an email address, as defined above, consists of a sub-domain followed by zero or more tokens, each made up of a period "." and another sub-domain. A sub-domain is a Let-dig followed by an optional Ldh-str (the square brackets in [Ldh-str] mark it as optional). Let-dig is restricted to alphanumeric characters, as given by 'ALPHA / DIGIT'. An Ldh-str is zero or more alphanumeric characters or hyphens, followed by a Let-dig, so an Ldh-str must end with a strictly alphanumeric character.

Since a sub-domain has a strict ordering in which the Let-dig comes first and the optional Ldh-str comes next, the first and last characters of each sub-domain can only be alphanumeric (a-zA-Z0-9). Django's email checker accepts the sub-domain "-com", for example, which violates the RFC's specification because the hyphen is the first character.
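
To make the grammar concrete, here is a minimal sketch that translates the ABNF productions above into a Python regular expression, one production per variable. It is not Django's domain_regex and not a proposed patch. The names LET_DIG, LDH_STR, SUB_DOMAIN, DOMAIN_RE and is_rfc5321_domain are hypothetical and exist only for this illustration. The sketch also assumes ASCII-only domains and ignores label length limits and address literals.

```python
import re

# One pattern fragment per ABNF production from RFC 5321 section 4.1.2.
# All names here are hypothetical and used for illustration only.
LET_DIG = r"[A-Za-z0-9]"                  # Let-dig = ALPHA / DIGIT
LDH_STR = rf"[A-Za-z0-9-]*{LET_DIG}"      # Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig
SUB_DOMAIN = rf"{LET_DIG}(?:{LDH_STR})?"  # sub-domain = Let-dig [Ldh-str]
DOMAIN_RE = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")  # Domain


def is_rfc5321_domain(domain: str) -> bool:
    """Return True if the whole string matches the Domain production."""
    return DOMAIN_RE.fullmatch(domain) is not None


if __name__ == "__main__":
    # "a.b.-com" and "a.b.com-" are rejected because a sub-domain
    # must begin and end with a Let-dig.
    for candidate in ("a.b.com", "a-b.c0m", "a.b.-com", "a.b.com-"):
        print(candidate, is_rfc5321_domain(candidate))
```

Under this grammar the domain part of xyz@a.b.-com is rejected, which is the behaviour this report argues EmailValidator should have.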

Reference: https://www.rfc-editor.org/rfc/rfc5321.html#section-4.1.2

Attachments (1)

django_validator_test.py (811 bytes) - added by Niko 17 months ago.
Python script which uses the bugged function to demonstrate our point


Change History (2)

by Niko, 17 months ago

Attachment: django_validator_test.py added

Python script which uses the bugged function to demonstrate our point

comment:1 by Mariusz Felisiak, 17 months ago

Resolution: duplicate
Status: new → closed

Duplicate of #25452.
