Opened 11 years ago

Closed 4 years ago

#21859 closed Bug (invalid)

clarify Django docs re: email addresses and ascii

Reported by: Chris Jerdonek
Owned by: nobody
Component: Documentation
Version: 1.6
Severity: Normal
Keywords: email, ascii, unicode
Cc: chris.jerdonek@…
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: yes
UI/UX: no

Description

While looking into Django's handling of non-ASCII characters in email addresses, I found the following in the Email section of the Unicode data docs:

"However, you’re still obligated to respect the requirements of the email specifications, so, for example, email addresses should use only ASCII characters."

This may still be technically correct under some interpretation, but I think the wording should be clarified to avoid confusion, because the full picture is more nuanced. For example, the Wikipedia article on email addresses says (quoting one part), "The local-part of the email address may use any of these ASCII characters RFC 5322 Section 3.2.3, RFC 6531 permits Unicode beyond the ASCII range". Also see the comments on ticket #14301, which suggest there is more worth saying in the portion of the docs that I quoted.

I don't know if other parts of the Django docs say more on this aspect.

Change History (9)

comment:1 by Chris Jerdonek, 11 years ago

Cc: chris.jerdonek@… added

comment:2 by Chris Jerdonek, 11 years ago

It looks like #14301 added the ability to send emails to addresses with non-ASCII characters, which on its face seems to contradict the section of the documentation referenced above.

comment:3 by wim@…, 11 years ago

Triage Stage: Unreviewed → Accepted

Email addresses in the EmailField are validated using the following user_regex in django/core/validators.py:

    user_regex = re.compile(
        r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*$"  # dot-atom
        r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-\011\013\014\016-\177])*"$)',  # quoted-string
        re.IGNORECASE)

According to the RFC references in Wikipedia, however, Unicode characters can be allowed in email addresses, and several people have confirmed that such addresses are actually in use.

comment:4 by anonymous, 11 years ago

Type: Cleanup/optimization → Bug

comment:5 by Claude Paroz, 11 years ago

RFC 6531 does define a new SMTPUTF8 extension (http://tools.ietf.org/html/rfc6531) that allows, notably, non-ASCII characters in email addresses. However, usage seems to be very scarce at this time. Allowing non-ASCII characters when 95% of mail servers do not currently seem to support them is debatable.

What could be done is to create a new validate_email_utf8 validator, and maybe add a new support_utf8 argument to EmailField, or add an addr_validator argument to EmailField (defaulting to the legacy validate_email validator), so that users can choose for themselves whether to allow that usage without having to subclass EmailField altogether.
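
To make the opt-in idea concrete, here is a minimal, framework-free sketch. Everything in it is hypothetical: the function name validate_email_sketch, the relaxed pattern built on \w, and the way support_utf8 switches between patterns are assumptions for illustration, not an existing Django API. The domain is only checked for being non-empty.

```python
import re

# ASCII characters allowed in a dot-atom local part, as in the existing
# user_regex.
ASCII_ATOM = r"[-!#$%&'*+/=?^_`{}|~0-9A-Za-z]+"
# Hypothetical relaxed variant: with str patterns, \w also matches
# non-ASCII letters and digits.
UTF8_ATOM = r"[-!#$%&'*+/=?^_`{}|~\w]+"


def _dot_atom(atom):
    """Build a regex for one or more atoms separated by single dots."""
    return re.compile(atom + r"(?:\." + atom + ")*")


ascii_user_regex = _dot_atom(ASCII_ATOM)
utf8_user_regex = _dot_atom(UTF8_ATOM)


def validate_email_sketch(value, support_utf8=False):
    """Raise ValueError unless value looks like local@domain.

    Simplification: only the local part is pattern-checked.
    """
    local, sep, domain = value.rpartition("@")
    if not sep or not domain:
        raise ValueError("Enter a valid email address.")
    regex = utf8_user_regex if support_utf8 else ascii_user_regex
    if regex.fullmatch(local) is None:
        raise ValueError("Enter a valid email address.")


# Default behaviour stays ASCII-only; callers opt in explicitly.
validate_email_sketch("chris@example.com")
validate_email_sketch("j\u00f6rg@example.com", support_utf8=True)
```

Keeping the default at ASCII-only mirrors the suggestion that the legacy validator remain the default while projects that need SMTPUTF8-style addresses can opt in.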

comment:6 by Chris Jerdonek, 11 years ago

My intent with this ticket was only to clarify the current state of things in the documentation. For the purposes of adding new functionality, we can open a separate ticket.

For the documentation, I see at least two things that can be clarified:

1) My impression is that #14301 has already given Django the ability to handle non-ASCII characters under some interpretation of "email address." Namely, it seems some parts of the Django API will accept non-ASCII email addresses (but will convert them internally to ASCII via an IDN, though I'm not clear on the details). This is probably worth stating in some form, so readers don't get the impression there is no support at all.

2) We should clarify the part saying that email specifications allow only ASCII characters. For example, we can acknowledge that there are new specifications that support non-ASCII (perhaps referencing the RFCs by name), and state up front that Django does not yet support them.
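
For point 1, one common way to turn a non-ASCII domain into ASCII is IDNA encoding. The following is a minimal sketch using Python's built-in "idna" codec. The helper name and the sample address are assumptions, and this is not necessarily how Django implements the conversion internally. The local part is left untouched, which is where the newer specifications from point 2 would come in.

```python
def domain_to_ascii(address):
    """Return address with its domain IDNA-encoded (hypothetical helper)."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no @")
    # The built-in "idna" codec implements the older IDNA 2003 rules.
    ascii_domain = domain.encode("idna").decode("ascii")
    return local + "@" + ascii_domain


# Assumed sample address with a non-ASCII domain ("bücher.example").
converted = domain_to_ascii("info@b\u00fccher.example")
print(converted)  # info@xn--bcher-kva.example
```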

comment:7 by Tim Graham, 8 years ago

See #27029 for a request to allow EmailValidator to accept non-ASCII characters.

comment:8 by David Smith, 4 years ago

Easy pickings: set

comment:9 by Mariusz Felisiak, 4 years ago

Resolution: invalid
Status: new → closed

The "Email" section was removed from the Unicode docs in d4d812cb567d1f84ef7a569672fdf3c0b83e6fdd. I don't think there is anything left to clarify.
