Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#31053 closed Bug (invalid)

EmailValidator should not accept soft hyphen in email addresses.

Reported by: Mogoh Viol | Owned by: nobody
Component: Core (Mail) | Version: dev
Severity: Normal | Keywords:
Cc: Joachim Jablon | Triage Stage: Unreviewed
Has patch: no | Needs documentation: no
Needs tests: no | Patch needs improvement: no
Easy pickings: no | UI/UX: no

Description

This email address contains an invisible soft hyphen (U+00AD) between the two occurrences of "example": test@example­example.com

Django's EmailValidator accepts it, but should not:

from django.core import validators
validators.validate_email('test@example­example.com')

Python's formataddr does not accept it:

from email.utils import formataddr
formataddr(('','test@example­example.com'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/utils.py", line 91, in formataddr
    address.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xad' in position 12: ordinal not in range(128)

Django's EmailValidator should not accept soft hyphens.
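
Because the soft hyphen is invisible in the report, it helps to write it as the escape '\xad'. The following is a minimal sketch using only the standard library; the helper name is_ascii_address is made up for illustration and is not part of Django or the email package. It only reproduces the formataddr behaviour shown in the traceback above.

```python
from email.utils import formataddr

# U+00AD SOFT HYPHEN, written as an escape so it is visible in the source.
address = "test@example\xadexample.com"


def is_ascii_address(addr):
    """Return True if formataddr accepts addr, i.e. addr is pure ASCII.

    Hypothetical helper for illustration only.
    """
    try:
        formataddr(("", addr))
    except UnicodeEncodeError:
        return False
    return True


print(address.index("\xad"))                 # position reported in the traceback
print(is_ascii_address(address))             # soft hyphen is not ASCII
print(is_ascii_address("test@example.com"))
```

Note that this check rejects every non-ASCII address, not only those containing a soft hyphen.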

Change History (7)

comment:1 by Mariusz Felisiak, 4 years ago

Cc: Joachim Jablon added
Component: Uncategorized → Core (Mail)
Resolution: → needsinfo
Status: new → closed
Summary: EmailValidator should not accept soft hyphen in email addresses → EmailValidator should not accept soft hyphen in email addresses.
Version: 2.2 → master

I'm not sure about this: email.headerregistry.parser.get_mailbox() doesn't raise any exception on soft hyphens. Can you share a link to the RFC that forbids such characters in domains?

Joachim, what do you think?

comment:2 by Mogoh Viol, 4 years ago

RFC 1035, section 2.3.1 (https://tools.ietf.org/html/rfc1035#section-2.3.1), says of domain labels:

They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen.

Of course, there are now internationalized domain names (https://en.wikipedia.org/wiki/Internationalized_domain_name) for non-ASCII characters.
I do not know those specifications in detail.
What I do know is that non-ASCII characters are encoded in ASCII using Punycode.

But if some special characters are accepted, then other special characters like "äöü" should also be accepted.
If "äöü" are not allowed, soft hyphens should also be forbidden.
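
The RFC 1035 rule quoted above can be sketched as a regular expression. This is only an illustration of that one rule (plus the 63-character label limit from the same section); the names LDH_LABEL and is_rfc1035_label are made up here, and this is not the pattern Django's EmailValidator uses, which also has to allow internationalized names.

```python
import re

# RFC 1035 section 2.3.1: a label starts with a letter, ends with a letter
# or digit, and has only letters, digits, and hyphen in between.
LDH_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1035_label(label):
    """Check a single domain label against the ASCII-only RFC 1035 rule."""
    return len(label) <= 63 and LDH_LABEL.fullmatch(label) is not None


print(is_rfc1035_label("example"))               # plain ASCII label
print(is_rfc1035_label("exam-ple"))              # ordinary hyphen-minus
print(is_rfc1035_label("example\xadexample"))    # soft hyphen, U+00AD
print(is_rfc1035_label("éxample"))               # non-ASCII letter
```

Under this strict rule, both the soft hyphen and "é" are rejected, which is the consistency point made above.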

comment:3 by Mariusz Felisiak, 4 years ago

Yes, non-ASCII domains are supported (see tests).

comment:4 by Mogoh Viol, 4 years ago

Ok, I made a mistake, but I am still not 100% convinced.

Indeed, the EmailValidator accepts non-ASCII domains.
It does not accept non-ASCII local parts, as in the example below.

In [2]: from django.core import validators
   ...: validators.validate_email('to@éxample.com')
   ...: validators.validate_email('tó@example.com')
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-2-1da70ef004db> in <module>
      1 from django.core import validators
      2 validators.validate_email('to@éxample.com')
----> 3 validators.validate_email('tó@example.com')

~/.local/share/virtualenvs/website-10TxyhRr/lib/python3.6/site-packages/django/core/validators.py in __call__(self, value)
    194 
    195         if not self.user_regex.match(user_part):
--> 196             raise ValidationError(self.message, code=self.code)
    197 
    198         if (domain_part not in self.domain_whitelist and

ValidationError: ['Bitte gültige E-Mail-Adresse eingeben.']

However, this is a different issue (if this is an issue at all).

The question remains: is a domain containing a soft hyphen a valid domain?
I guess not, but I honestly don't know.
I think it is really complicated to test for a valid domain that includes only allowed Unicode characters.
So I understand that we only do a simple "sanity test" and, in case of doubt, let some invalid email addresses through.

If no one else thinks that filtering out emails with soft hyphens is a good idea, we can leave the bug closed.

comment:5 by Joachim Jablon, 4 years ago

Ok, gonna do my best from a phone.

If I recall correctly, the idea is that as much as possible, emails that pass the validator should be properly processed.

Given that it’s fairly easy to split the local and domain parts (the last @ sign is the separator), it’s feasible to blindly apply punycode if the domain contains non-ASCII characters, which is done in the code. The same cannot be done for the local part.

For the local part, it’s a bit complicated and there are some things to take into account:

  • On the validator side, special chars are accepted if the local part is enclosed between double quotes: "kéké"@example.com
  • The algorithm is a bit different in the validation part and in the sending part, because validation only accepts emails whereas email sending accepts both emails and mailboxes (Your Name <youraddress@…>)
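
The "split on the last @, then punycode the domain" step described above can be sketched with Python's built-in idna codec. This is a rough illustration, not Django's actual code; the function name domain_to_ascii is made up, and it is an assumption that the validator's handling matches this codec's behaviour. The built-in codec implements the older IDNA 2003 rules, whose preparation step maps the soft hyphen to nothing.

```python
def domain_to_ascii(email):
    """Split on the last "@" and IDNA-encode the domain part.

    Hypothetical helper for illustration; uses Python's built-in
    "idna" codec (IDNA 2003).
    """
    local_part, domain = email.rsplit("@", 1)
    return local_part, domain.encode("idna").decode("ascii")


# A non-ASCII letter is converted to an "xn--" punycode label.
print(domain_to_ascii("to@éxample.com"))

# The soft hyphen (U+00AD) is dropped during preparation, leaving a
# plain ASCII domain.
print(domain_to_ascii("test@example\xadexample.com"))
```

If the domain goes through this kind of conversion before it is checked, an address with a soft hyphen in the domain ends up looking like an ordinary ASCII domain, which would be consistent with the validator accepting it.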

comment:6 by Mariusz Felisiak, 4 years ago

Resolution: needsinfo → invalid

Thanks Joachim.

comment:7 by Mogoh Viol, 4 years ago

Thanks for explaining.
