Opened 9 years ago

Last modified 4 months ago

#27029 assigned Cleanup/optimization

Make EmailValidator accept non-ASCII characters in local part

Reported by: Ramin Farajpour Cami
Owned by: j-bernard
Component: Core (Other)
Version: dev
Severity: Normal
Keywords:
Cc: Florian Apolloner, Carlos Palol, Collin Anderson
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: yes
Easy pickings: no
UI/UX: no

Description (last modified by Collin Anderson)

from django.core.validators import validate_email
validate_email('うえあいお@email.com')

If you check うえあいお@address.com with this URL email checker, it is reported as not a valid email address.

Thanks,
Ramin

Change History (40)

comment:1 by Claude Paroz, 9 years ago

Resolution: duplicate
Status: new → closed

Sure thing!
Duplicate of #26423

comment:2 by Claude Paroz, 9 years ago

Has patch: set
Resolution: duplicate
Status: closed → new
Triage Stage: Unreviewed → Accepted
Version: 1.10 → master

Reopening as I have a patch which targets this specific issue.

in reply to:  2 comment:3 by Ramin Farajpour Cami, 9 years ago

Replying to claudep:

Reopening as I have a patch which targets this specific issue.

Hi,

Thanks a lot,

comment:4 by Tim Graham, 9 years ago

Triage Stage: Accepted → Ready for checkin

comment:5 by Claude Paroz, 9 years ago

I'm just wondering, is there still a use case to keep ASCII-only validation (and hence provide validate_email_ascii)?

comment:6 by Tim Graham, 9 years ago

Not sure, maybe you want to ask on the DevelopersMailingList. I guess usage might be a bit difficult until #25594 is fixed.

comment:7 by Claude Paroz, 9 years ago

Triage Stage: Ready for checkin → Accepted

I just tested Firefox and Chrome email validation, and they don't accept non-ASCII in the local part.

In any case, I think we should provide both validators (ASCII-only and Unicode). It might be a bit too soon to unconditionally allow Unicode in emails.

comment:8 by Claude Paroz, 9 years ago

Patch needs improvement: set

comment:9 by Tim Graham, 9 years ago

Summary: invalid email addresses input django validate_email → Make EmailVadliator accept non-ASCII characters
Type: Bug → Cleanup/optimization

See #21859 for a documentation request to clarify the state of ASCII/Unicode email addresses in Django.

comment:10 by Tim Graham, 9 years ago

Summary: Make EmailVadliator accept non-ASCII characters → Make EmailValidator accept non-ASCII characters

comment:11 by Ramin Farajpour Cami, 9 years ago

Hi Tim,

Please merge the PR.
Thanks

comment:12 by Claude Paroz, 9 years ago

Hey RaminFP,
I plan to come with a new patch, with two versions of the email validator. I don't think that non-ASCII local parts of email addresses are widespread enough to set it by default. The idea is that you could easily opt-in for the Unicode validator of your choice when you define the field in your models.

in reply to:  12 comment:13 by Ramin Farajpour Cami, 9 years ago

Replying to claudep:

Hey RaminFP,
I plan to come with a new patch, with two versions of the email validator. I don't think that non-ASCII local parts of email addresses are widespread enough to set it by default. The idea is that you could easily opt-in for the Unicode validator of your choice when you define the field in your models.

Will this be fixed in the new version, 1.11? Can I make suggestions for the fix?

comment:14 by Claude Paroz, 9 years ago

Yes, the plan is clearly to include that in 1.11. We still have some months ahead :-)

in reply to:  14 comment:15 by Ramin Farajpour Cami, 9 years ago

Replying to claudep:

Yes, the plan is clearly to include that in 1.11. We still have some months ahead :-)

Awesome. I have one question: you have a CONTRIBUTORS list. Can I be added to it, or would I need to send many more reports before being added to the list of contributors?

comment:16 by Claude Paroz, 9 years ago

Yes, you are supposed to have done significant work for Django to be listed there. Of course, that's very subjective, but filing a couple of reports isn't sufficient for that.

comment:17 by Ramin Farajpour Cami, 9 years ago

Very good. I always like working with the Django community on security and looking for code issues to fix. I see you wrote the PR for this report, so I can't add my name to the CONTRIBUTORS list. :((

Thanks,

comment:18 by Ramin Farajpour Cami, 9 years ago

Hi,
I see the Contributing to Django guide here:
https://docs.djangoproject.com/en/dev/internals/contributing/
You wrote the PR that patches this issue. Does this mean my name will not be added?

comment:19 by Tim Graham, 9 years ago

Yes, we add to AUTHORS based on code contributions not bug reports.

comment:20 by Wout De Puysseleir, 9 years ago

Owner: changed from Ramin Farajpour Cami to Wout De Puysseleir
Patch needs improvement: unset
Status: new → assigned

PR

I've added a new patch for this.

comment:21 by Florian Apolloner, 9 years ago

Patch needs improvement: set

I am against this patch, adding more regular expressions is the wrong way to go. I'd like to propose to change the current email validator to just check if "@" is in the address and be done with it. See also https://davidcel.is/posts/stop-validating-email-addresses-with-regex/ -- I think this is something which should have a bit of discussion on the mailing list.

comment:22 by Florian Apolloner, 9 years ago

Cc: Florian Apolloner added

comment:23 by Tim Graham, 9 years ago

Ideas about simplification are discussed in #26423 and on the django-developers mailing list.

comment:24 by Collin Anderson, 7 years ago

if we do allow non-ascii, I wonder if we should be sure the email is "printable" (not allow hidden characters like '\u200b') https://docs.python.org/3/library/stdtypes.html#str.isprintable
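A minimal sketch of the printable check suggested in comment:24, using only `str.isprintable()` from the standard library. The function name is hypothetical and nothing here is part of Django's EmailValidator.

```python
def has_only_printable_characters(address: str) -> bool:
    """Return True if every character in the address is printable.

    str.isprintable() rejects characters in the Unicode "Other" and
    "Separator" categories (apart from the ASCII space), which includes
    format characters such as ZERO WIDTH SPACE (U+200B).
    """
    return address.isprintable()


print(has_only_printable_characters("うえあいお@email.com"))
print(has_only_printable_characters("user\u200b@example.com"))
```

Note that the ASCII space counts as printable, so this check alone says nothing about whether the address is otherwise well formed.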

comment:25 by Mariusz Felisiak, 4 years ago

Owner: Wout De Puysseleir removed
Status: assigned → new

comment:26 by j-bernard, 3 years ago

Commenting here since #33967 has been closed as a duplicate.

Unicode in the local-part is allowed by the latest standards, so EmailValidator is preventing valid email addresses from being used in Django. Making the current regex allow Unicode characters instead of only [0-9A-Z] would do the trick.

#26423 won't solve this as HTML5 validator does not allow Unicode in local-part either.

Last edited 3 years ago by j-bernard

comment:27 by j-bernard, 3 years ago

I submitted this PR. I made it change as little as possible to at least get Unicode local-part valid.

comment:28 by Jacob Walls, 3 years ago

Patch needs improvement: unset

Improvement flag was set on prior PR proposing additional regular expressions. Current PR simplifies the existing one per comment.

comment:29 by Mariusz Felisiak, 3 years ago

Owner: set to j-bernard
Status: new → assigned

comment:30 by Carlton Gibson, 3 years ago

The new PR seems OK™ — for strings `\w` is equivalent to [a-zA-Z0-9_] with ASCII, and the unicode examples then pass.

I worry slightly about bringing in a host of lookalike address vulnerabilities. 🤔
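To illustrate the two points above, here is a small standard-library sketch, not the PR's code. It shows that `\w` on a `str` pattern is Unicode-aware by default and ASCII-only under `re.ASCII`, and that a Unicode-aware pattern also accepts lookalike characters. The pattern names are hypothetical.

```python
import re

# On str patterns, \w matches Unicode word characters by default.
UNICODE_WORD = re.compile(r"\w+")
# With re.ASCII, \w is restricted to [a-zA-Z0-9_].
ASCII_WORD = re.compile(r"\w+", re.ASCII)

hiragana = "うえあいお"
# CYRILLIC SMALL LETTER A (U+0430) followed by Latin "dmin".
lookalike = "\u0430dmin"

print(UNICODE_WORD.fullmatch(hiragana) is not None)
print(ASCII_WORD.fullmatch(hiragana) is not None)
print(UNICODE_WORD.fullmatch(lookalike) is not None)
print(lookalike == "admin")
```

The last two lines are the lookalike concern in miniature: the string passes the Unicode pattern yet is not the ASCII string "admin".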

I think this needs a discussion to decide the way forward.

  1. I'm not convinced this is really a distinct issue to #26423.
  2. The mailing list discussion was essentially unanimous to radically simplify here (rather than continue to tweak).

Florian's comment:21 more or less sums it up:

...propose to change the current email validator to just check if "@" is in the address and be done with it.

We've said similar with URLValidator a number of times.

I'm not sure we shouldn't (again) mark this as a duplicate of #26423, re-purpose that to simplify the validation, make sure How to customise validation shows the way forward clearly, and then close everything else in this area as wontfix. 🤔

Last edited 3 years ago by Carlton Gibson

comment:31 by Claude Paroz, 3 years ago

Different projects have different requirements. What about providing different validators: a simple one where only <somechar>@<somechar> presence is checked, a more elaborate one like the current one, and an equivalent to the previous allowing unicode. The question then is to decide which would be the default.
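As a rough sketch of the simplest variant described in comment:31 (only the presence of `<somechar>@<somechar>` is checked), assuming a plain function rather than a Django validator class. The name `looks_like_email` is hypothetical.

```python
def looks_like_email(value: str) -> bool:
    """Return True if value has at least one character on each side of an "@".

    The split is on the last "@", so any earlier "@" stays in the local part.
    """
    local_part, separator, domain = value.rpartition("@")
    return bool(separator and local_part and domain)


print(looks_like_email("うえあいお@email.com"))
print(looks_like_email("no-at-sign"))
```

A check this loose accepts Unicode local parts automatically, which is why the question of which variant should be the default matters.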

comment:32 by Carlton Gibson, 3 years ago

I think that sounds quite reasonable Claude.

comment:33 by j-bernard, 3 years ago

Here is a little context for my use case. In general, I create my own validator whenever it's needed to override the default Django behavior but I have a particular case where django-allauth app is used and is using the EmailValidator. In that case, I cannot easily override it.
My suggestion would then be to make the default validator more permissive to get some flexibility in the kind of use case that I have. One can still include another validation layer on top of that.

If you don't mind implementing a more complex, specific validator for internationalized email addresses, it would be better to avoid using only a regex. I kept my PR as simple as I could because I'm aware that changing the validator is touchy.

comment:34 by Carlton Gibson, 3 years ago

Patch needs improvement: set

My suggestion would then be to make the default validator more permissive to get some flexibility in the kind of use case that I have. One can still include another validation layer on top of that.

I don't think we can just swap out the current validation for a looser one. Folks will be depending on the existing behaviour.

Maybe we can ship a couple of variants, but I'm not sure what switching method we might allow. I think we need a story there in order to proceed. 🤔

I looked at django-allauth — it's using the validate_email instance, in various, deeply-nested places — I think an issue over there, to look at making that pluggable, is needed really. (In the meantime, one could monkey patch validate_email with whatever validator you wanted to adjust that… — again, not something I think we can just swap out from beneath it.)

comment:35 by Carlos Palol, 2 years ago

Cc: Carlos Palol added

comment:36 by Collin Anderson, 19 months ago

Cc: Collin Anderson added
Description: modified (diff)

I'm just noticing today that unicode.org has recommendations on email security:

https://www.unicode.org/reports/tr39/#Email_Security_Profiles

  • It must be in NFKC format
  • It must have level = <restriction level> or less, from Restriction_Level_Detection
  • It must not have mixed number systems according to Mixed_Number_Detection
  • It must satisfy dot-atom-text from RFC 5322 §3.2.3, where atext is extended as follows:
    • Where C ≤ U+007F, C is defined as in §3.2.3. (That is, C ∈ [!#-'*+\-/-9=?A-Z\^-~]. This list copies what is already in §3.2.3, and follows HTML5 for ASCII.)
    • Where C > U+007F, both of the following conditions are true:
      • C has Identifier_Status=Allowed from General Security Profile
      • If C is the first character, it must be XID_Start from Default Identifier_Syntax in [UAX31]
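A partial sketch of how some of these conditions could be checked with the Python standard library alone (Python 3.8+ for `unicodedata.is_normalized`). It covers only the NFKC check, the dot-atom shape, the ASCII atext class, and XID_Start for a leading non-ASCII character (approximated with `str.isidentifier()`, and read here as the first character of the whole local part, which is an assumption). Restriction levels, mixed-number detection, and Identifier_Status are not covered because the standard library does not expose them. The function name is hypothetical.

```python
import re
import unicodedata

# ASCII atext from RFC 5322 §3.2.3, written as a single character class.
ASCII_ATEXT = re.compile(r"[!#-'*+\-/-9=?A-Z\^-~]")


def passes_basic_profile_checks(local_part: str) -> bool:
    """Partial check of a local part; see the caveats in the text above."""
    if not local_part or not unicodedata.is_normalized("NFKC", local_part):
        return False
    # dot-atom-text: non-empty atoms separated by single dots.
    if any(atom == "" for atom in local_part.split(".")):
        return False
    for index, char in enumerate(local_part):
        if char == ".":
            continue
        if ord(char) <= 0x7F:
            if not ASCII_ATEXT.fullmatch(char):
                return False
        elif index == 0 and not char.isidentifier():
            # For a single non-ASCII character, isidentifier() is true
            # only when the character has the XID_Start property.
            return False
    return True


print(passes_basic_profile_checks("うえあいお"))
print(passes_basic_profile_checks("\uff55ser"))  # fullwidth "u": not NFKC
```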

It doesn't recommend which "restriction level" to use, and maybe we should allow the user to decide what level to use (defaulting to 1: ASCII-Only).

(Also, it would be nice if Python implemented "Mixed-Script Detection", "Restriction-Level Detection" and "Mixed-Number Detection" as part of unicodedata.)

comment:37 by Claude Paroz, 8 months ago

Summary: Make EmailValidator accept non-ASCII characters → Make EmailValidator accept non-ASCII characters in local part

comment:38 by Mike Edmunds, 8 months ago

[Related ticket: #35714 covers Django's SMTP EmailBackend being able to send to addresses with non-ASCII local parts.]

comment:39 by Mike Edmunds, 5 months ago

I've opened a forum discussion about this ticket (and the maybe-duplicate-maybe-not #26423), to try to either clarify it or close it wontfix.

comment:40 by Mike Edmunds, 4 months ago

Per some of the earlier comments here, I've proposed a simplified EmailValidator (that would also allow EAI local parts) at https://forum.djangoproject.com/t/emailvalidator-simplification-and-international-email-addresses/39985/11. If that receives support, it would obsolete this ticket.

But if that simplified proposal—or something like it—is not accepted (and this ticket remains accepted), then this ticket will need to be addressed by following RFC 6532 to allow UTF8-non-ascii characters in the EmailValidator user_regex, in both the dot-atom and quoted-string parts.

Since this is a potentially breaking change, it would need to be opt-in, e.g. via a new accept_eai_user EmailValidator keyword param (like DomainNameValidator's accept_idna, but defaulting to False).

To address concerns about regular expression complexity, it would probably be helpful to break up EmailValidator's user_regex into simpler components, similar to what was done in DomainNameValidator. (I suggest using the RFC 5322 terms: atom, dot_atom, qtext, quoted_pair, quoted_string. Since RFC 6532 is specified using those same terms, it should help with reviewing a PR for this ticket.)
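One way the decomposition suggested in comment:40 might look, as a standalone sketch rather than Django's actual `user_regex`. The builder function and constant names are hypothetical, `accept_eai_user` is the keyword proposed above, UTF8-non-ascii is approximated as any code point above U+007F, and folding whitespace and obsolete syntax are deliberately left out.

```python
import re

# RFC 5322 atext and qtext for ASCII, as character-class contents.
ASCII_ATEXT = r"-!#$%&'*+/=?^_`{}|~0-9A-Za-z"
ASCII_QTEXT = r"\x21\x23-\x5b\x5d-\x7e"
# Assumption: RFC 6532 UTF8-non-ascii approximated as any non-ASCII code point.
NON_ASCII = r"\u0080-\U0010ffff"


def build_user_regex(accept_eai_user: bool = False) -> re.Pattern:
    """Compose a local-part pattern from named RFC 5322 components."""
    extra = NON_ASCII if accept_eai_user else ""
    atom = "[" + ASCII_ATEXT + extra + "]+"
    dot_atom = atom + r"(?:\." + atom + ")*"
    qtext = "[" + ASCII_QTEXT + extra + "]"
    quoted_pair = r"\\[\x20-\x7e\t]"  # backslash followed by VCHAR or WSP
    quoted_string = '"(?:' + qtext + "|" + quoted_pair + ')*"'
    return re.compile("(?:" + dot_atom + "|" + quoted_string + ")")


ascii_only = build_user_regex()
with_eai = build_user_regex(accept_eai_user=True)
print(ascii_only.fullmatch("うえあいお") is not None)
print(with_eai.fullmatch("うえあいお") is not None)
```

Because the non-ASCII range is added in exactly two places (atext and qtext), a reviewer can compare the change directly against the two extensions defined in RFC 6532.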

Incidentally, as of April 2025, no major browser allows non-ASCII local-parts in an <input type=email>. (There's a lengthy and ongoing WHATWG discussion about EAI.) But EmailValidator is not just for email addresses that come from a browser input field, and EAI addresses are valid email addresses in a context where you're trying to support EAI.
