Opened 11 years ago
Closed 4 years ago
#21859 closed Bug (invalid)
clarify Django docs re: email addresses and ascii
Reported by: | Chris Jerdonek | Owned by: | nobody |
---|---|---|---|
Component: | Documentation | Version: | 1.6 |
Severity: | Normal | Keywords: | email, ascii, unicode |
Cc: | chris.jerdonek@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
While looking into Django's handling of non-ASCII characters in email addresses, I found the following in the Email section of the Unicode data docs:
"However, you’re still obligated to respect the requirements of the email specifications, so, for example, email addresses should use only ASCII characters."
This may still be technically correct under some interpretation, but I think the wording should be clarified to avoid confusion, and because I think the full picture may be a bit more nuanced. For example, the Wikipedia article on email addresses says, "The local-part of the email address may use any of these ASCII characters RFC 5322 Section 3.2.3, RFC 6531 permits Unicode beyond the ASCII range" (to include one part). Also see the comments to ticket #14301, which reveal that there is probably more worth saying in the portion of the docs that I quoted.
I don't know if other parts of the Django docs say more on this aspect.
Change History (9)
comment:1 by , 11 years ago
Cc: | added |
---|
comment:2 by , 11 years ago
comment:3 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Email addresses in the EmailField are validated using the following user_regex in django/core/validators.py:
user_regex = re.compile(
    r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*$"  # dot-atom
    r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-\011\013\014\016-\177])*"$)',  # quoted-string
    re.IGNORECASE)
According to the RFCs referenced in the Wikipedia article, however, Unicode characters can be allowed in an email address, and several people have confirmed that such addresses are actually in use.
comment:4 by , 11 years ago
Type: | Cleanup/optimization → Bug |
---|
comment:5 by , 11 years ago
RFC 6531 (http://tools.ietf.org/html/rfc6531) does define a new SMTPUTF8 extension that allows, notably, non-ASCII characters in email addresses. However, usage seems to be very scarce at this time. Allowing non-ASCII characters when 95% of mail servers do not currently seem to support them is debatable.
What could be done is to create a new validate_email_utf8 validator, and maybe add a new support_utf8 argument to EmailField, or add an addr_validator argument to EmailField (defaulting to the legacy validate_email validator), so that users can choose for themselves whether to allow that usage without having to subclass EmailField.
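A rough sketch of how the addr_validator idea might look, written as plain Python rather than against Django's classes. Everything here is hypothetical: `clean_email` is a stand-in for the field's cleaning step, neither validator is Django's implementation, and only the local part is checked (domain validation is left out).

```python
import re

# ASCII "atext" characters permitted in a dot-atom local part.
ATEXT = r"-!#$%&'*+/=?^_`{}|~0-9A-Za-z"
# Assumption for this sketch: treat every code point above ASCII as allowed.
NON_ASCII = r"\u0080-\U0010FFFF"


def _dot_atom(chars):
    """Build a dot-atom pattern over the given character class body."""
    atom = "[" + chars + "]+"
    return re.compile(atom + r"(\." + atom + ")*")


ASCII_LOCAL = _dot_atom(ATEXT)
UTF8_LOCAL = _dot_atom(ATEXT + NON_ASCII)


def _check(value, pattern):
    local, sep, domain = value.rpartition("@")
    if not sep or not domain or not pattern.fullmatch(local):
        raise ValueError("invalid email address: %r" % value)


def validate_email(value):
    """Legacy-style check: ASCII-only local part."""
    _check(value, ASCII_LOCAL)


def validate_email_utf8(value):
    """Hypothetical opt-in check: also accepts non-ASCII local parts."""
    _check(value, UTF8_LOCAL)


def clean_email(value, addr_validator=validate_email):
    """Stand-in for a field's clean step with a pluggable validator."""
    addr_validator(value)
    return value
```

A caller who wants the stricter legacy behaviour does nothing; a caller who wants to accept SMTPUTF8-style addresses passes `addr_validator=validate_email_utf8`.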
comment:6 by , 11 years ago
My intent with this ticket was only to clarify the current state of things in the documentation. For the purposes of adding new functionality, we can open a separate ticket.
For the documentation, I see at least two things that can be clarified:
1) My impression is that #14301 has already given Django the ability to handle non-ascii characters under some interpretation of "email address." Namely, it seems some parts of the Django API will accept non-ascii email addresses (but will convert them internally to ASCII via an IDN, though I'm not clear on the details). This is probably worth stating in some form, so readers don't get the impression there is no support at all.
2) We should clarify the part saying that email specifications allow only ASCII characters. For example, we can acknowledge that there are new specifications that support non-ASCII (perhaps referencing the RFCs by name), and state up front that Django does not yet support them.
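As an illustration of point 1, the sketch below shows one way a non-ASCII domain can be converted to ASCII using Python's built-in `idna` codec. This is my reading of "via an IDN", not a copy of what #14301 does; the function name `to_ascii_domain` and the sample address are made up, and the local part is deliberately left untouched.

```python
def to_ascii_domain(address):
    """Return the address with its domain converted to IDNA (Punycode) form.

    Hypothetical helper for illustration; the local part is not modified.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no '@': %r" % address)
    # Python's "idna" codec encodes each label of an internationalized domain.
    ascii_domain = domain.encode("idna").decode("ascii")
    return local + "@" + ascii_domain


converted = to_ascii_domain("user@bücher.example")
# converted == "user@xn--bcher-kva.example"
```

An address whose domain is already ASCII passes through unchanged, so a conversion like this only matters for internationalized domain names.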
comment:7 by , 8 years ago
See #27029 for a request to allow EmailValidator to accept non-ASCII characters.
comment:8 by , 4 years ago
Easy pickings: | set |
---|
comment:9 by , 4 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
The "Email" section was removed from the Unicode docs in d4d812cb567d1f84ef7a569672fdf3c0b83e6fdd. I don't think there is anything left to clarify.
It looks like #14301 added the ability to send emails to addresses with non-ASCII characters, which on its face seems to contradict the section of the documentation referenced above.