Opened 9 years ago
Last modified 4 months ago
#27029 assigned Cleanup/optimization
Make EmailValidator accept non-ASCII characters in local part
Reported by: | Ramin Farajpour Cami | Owned by: | j-bernard |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Florian Apolloner, Carlos Palol, Collin Anderson | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
from django.core.validators import validate_email
validate_email('うえあいお@email.com')
If you check this URL (an email checker) with うえあいお@address.com, it reports that this is not a valid email address.
Thanks,
Ramin
Change History (40)
comment:1 by , 9 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
follow-up: 3 comment:2 by , 9 years ago
Has patch: | set |
---|---|
Resolution: | duplicate |
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
Version: | 1.10 → master |
Reopening as I have a patch which targets this specific issue.
comment:3 by , 9 years ago
Replying to claudep:
Reopening as I have a patch which targets this specific issue.
Hi,
Thanks a lot,
comment:4 by , 9 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:5 by , 9 years ago
I'm just wondering, is there still a use case to keep ASCII-only validation (and hence provide validate_email_ascii)?
comment:6 by , 9 years ago
Not sure, maybe you want to ask on the DevelopersMailingList. I guess usage might be a bit difficult until #25594 is fixed.
comment:7 by , 9 years ago
Triage Stage: | Ready for checkin → Accepted |
---|
I just tested Firefox and Chrome email validation, and they don't accept non-ASCII in the local part.
In any case, I think we should provide both validators (ASCII-only and Unicode). It might be a bit too soon to unconditionally allow Unicode in emails.
comment:8 by , 9 years ago
Patch needs improvement: | set |
---|
comment:9 by , 9 years ago
Summary: | invalid email addresses input django validate_email → Make EmailVadliator accept non-ASCII characters |
---|---|
Type: | Bug → Cleanup/optimization |
See #21859 for a documentation request to clarify the state of ASCII/Unicode email addresses in Django.
comment:10 by , 9 years ago
Summary: | Make EmailVadliator accept non-ASCII characters → Make EmailValidator accept non-ASCII characters |
---|
follow-up: 13 comment:12 by , 9 years ago
Hey RaminFP,
I plan to come with a new patch, with two versions of the email validator. I don't think that non-ASCII local parts of email addresses are widespread enough to set it by default. The idea is that you could easily opt-in for the Unicode validator of your choice when you define the field in your models.
comment:13 by , 9 years ago
Replying to claudep:
Hey RaminFP,
I plan to come with a new patch, with two versions of the email validator. I don't think that non-ASCII local parts of email addresses are widespread enough to set it by default. The idea is that you could easily opt-in for the Unicode validator of your choice when you define the field in your models.
Will this be fixed in the new version, 1.11? Can I make suggestions for the fix?
follow-up: 15 comment:14 by , 9 years ago
Yes, the plan is clearly to include that in 1.11. We still have some months ahead :-)
comment:15 by , 9 years ago
Replying to claudep:
Yes, the plan is clearly to include that in 1.11. We still have some months ahead :-)
Awesome. I have one question: you have a CONTRIBUTORS list. Can I be added to this list, or should I first send many more reports before being added to the list of contributors?
comment:16 by , 9 years ago
Yes, you are supposed to have done significant work for Django to be listed there. Of course, that's very subjective, but filing a couple of reports isn't sufficient for that.
comment:17 by , 9 years ago
Very good. I always like working with the Django community on security and looking at code issues to fix. I see you wrote the PR for this report, so I can't add my name to the CONTRIBUTORS list. :((
Thanks,
comment:18 by , 9 years ago
Hi,
I see the Contributing to Django guide here:
https://docs.djangoproject.com/en/dev/internals/contributing/
You wrote the PR that patches this issue. Does this mean my name will not be added?
comment:20 by , 9 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | unset |
Status: | new → assigned |
I've added a new patch for this.
comment:21 by , 9 years ago
Patch needs improvement: | set |
---|
I am against this patch, adding more regular expressions is the wrong way to go. I'd like to propose to change the current email validator to just check if "@" is in the address and be done with it. See also https://davidcel.is/posts/stop-validating-email-addresses-with-regex/ -- I think this is something which should have a bit of discussion on the mailing list.
comment:22 by , 9 years ago
Cc: | added |
---|
comment:23 by , 9 years ago
Ideas about simplification are discussed in #26423 and on the django-developers mailing list.
comment:24 by , 7 years ago
If we do allow non-ASCII, I wonder if we should make sure the email is "printable" (not allowing hidden characters like '\u200b'): https://docs.python.org/3/library/stdtypes.html#str.isprintable
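As a minimal sketch of the check suggested above, the snippet below uses only the built-in str.isprintable(). The helper name has_only_printable is hypothetical and not part of Django.

```python
def has_only_printable(value: str) -> bool:
    """Return True when every character is printable per str.isprintable().

    Hypothetical helper for illustration only; not a Django API.
    Note that an empty string also counts as printable.
    """
    return value.isprintable()


# A plain non-ASCII address passes the check.
print(has_only_printable("うえあいお@email.com"))  # True
# U+200B ZERO WIDTH SPACE is a format character, so the check fails.
print(has_only_printable("user\u200b@email.com"))  # False
```

This only rejects non-printable characters; it says nothing about whether the address is otherwise valid.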
comment:25 by , 4 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:26 by , 3 years ago
Commenting here since #33967 has been closed as a duplicate.
Unicode in local-part is allowed by the latest standards, therefore EmailValidator is preventing valid email addresses from being used in Django. Making the current regex allow Unicode characters instead of [0-9A-Z] would do the trick.
#26423 won't solve this as HTML5 validator does not allow Unicode in local-part either.
comment:27 by , 3 years ago
I submitted this PR. I made it change as little as possible to at least get Unicode local-part valid.
comment:28 by , 3 years ago
Patch needs improvement: | unset |
---|
comment:29 by , 3 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:30 by , 3 years ago
The new PR seems OK™: for str patterns, `\w` is equivalent to [a-zA-Z0-9_] only when the ASCII flag is set; without it, `\w` matches Unicode word characters, and the Unicode examples then pass.
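The `\w` behaviour described above can be checked with the standard re module alone. This is only an illustration of the flag's effect, not the regex used in the PR or in EmailValidator.

```python
import re

# For str patterns, \w matches Unicode word characters by default.
UNICODE_WORD = re.compile(r"\w+")
# With re.ASCII, \w is restricted to [a-zA-Z0-9_].
ASCII_WORD = re.compile(r"\w+", re.ASCII)

for candidate in ("user_1", "うえあいお"):
    print(
        candidate,
        bool(UNICODE_WORD.fullmatch(candidate)),
        bool(ASCII_WORD.fullmatch(candidate)),
    )
```

The ASCII-only pattern rejects the Hiragana string, while the default pattern accepts both inputs.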
I worry slightly about bringing in a host of lookalike address vulnerabilities. 🤔
I think this needs a discussion to decide the way forward.
- I'm not convinced this is really a distinct issue to #26423.
- The mailing list discussion was essentially unanimous to radically simplify here (rather than continue to tweak).
Florian's comment:21 more or less sums it up:
...propose to change the current email validator to just check if "@" is in the address and be done with it.
We've said similar with URLValidator a number of times.
I'm not sure we shouldn't (again) mark this as a duplicate of #26423, re-purpose that to simplify the validation, make sure How to customise validation shows the way forward clearly, and then close everything else in this area as wontfix. 🤔
comment:31 by , 3 years ago
Different projects have different requirements. What about providing different validators: a simple one where only the presence of <somechar>@<somechar> is checked, a more elaborate one like the current one, and an equivalent to the previous allowing unicode. The question then is to decide which would be the default.
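For reference, the simplest of these variants might look something like the sketch below. It is plain Python with no Django dependency; the function name looks_like_email is hypothetical, and splitting on the last "@" is an assumption of this sketch rather than anything decided in the ticket.

```python
def looks_like_email(value: str) -> bool:
    """Check only that something precedes and follows the last "@".

    Hypothetical helper for illustration; not a Django validator.
    """
    local_part, separator, domain = value.rpartition("@")
    return bool(separator and local_part and domain)


print(looks_like_email("うえあいお@email.com"))  # True
print(looks_like_email("no-at-sign"))  # False
print(looks_like_email("@email.com"))  # False
```

A check this loose accepts many strings that are not deliverable addresses, which is the trade-off under discussion.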
comment:33 by , 3 years ago
Here is a little context for my use case. In general, I create my own validator whenever it's needed to override the default Django behavior but I have a particular case where django-allauth app is used and is using the EmailValidator. In that case, I cannot easily override it.
My suggestion would then be to make the default validator more permissive to get some flexibility in the kind of use case that I have. One can still include another validation layer on top of that.
If you don't mind implementing a more complex, specific validator for internationalized email addresses, it would be better to avoid relying on a regex alone. I kept it as simple as I could in my PR because I'm aware that changing the validator is touchy.
comment:34 by , 3 years ago
Patch needs improvement: | set |
---|
My suggestion would then be to make the default validator more permissive to get some flexibility in the kind of use case that I have. One can still include another validation layer on top of that.
I don't think we can just swap out the current validation for a looser one. Folks will be depending on the existing behaviour.
Maybe we can ship a couple of variants, but I'm not sure what switching method we might allow. I think we need a story there in order to proceed. 🤔
I looked at django-allauth: it's using the validate_email instance in various, deeply nested places. I think an issue over there, to look at making that pluggable, is really needed. (In the meantime, one could monkey patch validate_email with whatever validator you wanted, to adjust that. Again, it's not something I think we can just swap out from beneath it.)
comment:35 by , 2 years ago
Cc: | added |
---|
comment:36 by , 19 months ago
Cc: | added |
---|---|
Description: | modified (diff) |
I'm just noticing today that unicode.org has recommendations on email security:
https://www.unicode.org/reports/tr39/#Email_Security_Profiles
- It must be in NFKC format
- It must have level = <restriction level> or less, from Restriction_Level_Detection
- It must not have mixed number systems according to Mixed_Number_Detection
- It must satisfy dot-atom-text from RFC 5322 §3.2.3, where atext is extended as follows:
  - Where C ≤ U+007F, C is defined as in §3.2.3. (That is, C ∈ [!#-'*+\-/-9=?A-Z\^-~]. This list copies what is already in §3.2.3, and follows HTML5 for ASCII.)
  - Where C > U+007F, both of the following conditions are true:
    - C has Identifier_Status=Allowed from General Security Profile
    - If C is the first character, it must be XID_Start from Default Identifier_Syntax in [UAX31]
It doesn't recommend which "restriction level" to use, and maybe we should allow the user to decide what level to use (defaulting to 1: ASCII-Only).
(Also, it would be nice if Python implemented "Mixed-Script Detection", "Restriction-Level Detection" and "Mixed-Number Detection" as part of unicodedata.)
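Two of the rules above can be roughly approximated with the standard library, as in the sketch below. It assumes Python 3.8+ for unicodedata.is_normalized, and it uses str.isidentifier() on a single character as a stand-in for the XID_Start test, which is an assumption of this sketch. Identifier_Status, restriction levels and mixed-number detection are not covered, since unicodedata does not expose them. The function name passes_partial_profile is hypothetical.

```python
import unicodedata


def passes_partial_profile(local_part: str) -> bool:
    """Partial, illustrative check of two of the TR39 email profile rules.

    Hypothetical helper; it does NOT implement the full profile.
    """
    # Rule: the local part must already be in NFKC form.
    if not unicodedata.is_normalized("NFKC", local_part):
        return False
    # Rule: a non-ASCII first character must be XID_Start.
    # Assumption: a one-character str.isidentifier() call approximates that.
    first = local_part[:1]
    if first and ord(first) > 0x7F and not first.isidentifier():
        return False
    return True


print(passes_partial_profile("うえあいお"))
print(passes_partial_profile("ｕｓｅｒ"))  # fullwidth letters are not NFKC
print(passes_partial_profile("\u0663abc"))  # starts with an Arabic-Indic digit
```

A real implementation would need the Unicode security data files for the remaining rules.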
comment:37 by , 8 months ago
Summary: | Make EmailValidator accept non-ASCII characters → Make EmailValidator accept non-ASCII characters in local part |
---|
comment:38 by , 8 months ago
[Related ticket: #35714 covers Django's SMTP EmailBackend being able to send to addresses with non-ASCII local parts.]
comment:39 by , 5 months ago
I've opened a forum discussion about this ticket (and the maybe-duplicate-maybe-not #26423), to try to either clarify it or close it wontfix.
comment:40 by , 4 months ago
Per some of the earlier comments here, I've proposed a simplified EmailValidator (that would also allow EAI local parts) at https://forum.djangoproject.com/t/emailvalidator-simplification-and-international-email-addresses/39985/11. If that receives support, it would obsolete this ticket.
But if that simplified proposal—or something like it—is not accepted (and this ticket remains accepted), then this ticket will need to be addressed by following RFC 6532 to allow UTF8-non-ascii characters in the EmailValidator user_regex, in both the dot-atom and quoted-string parts.
Since this is a potentially breaking change, it would need to be opt-in, e.g. via a new accept_eai_user EmailValidator keyword param (like DomainNameValidator's accept_idna, but defaulting to False).
To address concerns about regular expression complexity, it would probably be helpful to break up EmailValidator's user_regex into simpler components, similar to what was done in DomainNameValidator. (I suggest using the RFC 5322 terms: atom, dot_atom, qtext, quoted_pair, quoted_string. Since RFC 6532 is specified using those same terms, it should help with reviewing a PR for this ticket.)
Incidentally, as of April 2025, no major browser allows non-ASCII local-parts in an <input type=email>. (There's a lengthy and ongoing WHATWG discussion about EAI.) But EmailValidator is not just for email addresses that come from a browser input field, and EAI addresses are valid email addresses in a context where you're trying to support EAI.
Duplicate of #26423