
Opened 3 months ago

Last modified 3 months ago

#21859 new Bug

clarify Django docs re: email addresses and ascii

Reported by: cjerdonek
Owned by: nobody
Component: Documentation
Version: 1.6
Severity: Normal
Keywords: email,ascii,unicode
Cc: chris.jerdonek@…
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

While looking into Django's handling of non-ASCII characters in email addresses, I found the following in the Email section of the Unicode data docs:

"However, you’re still obligated to respect the requirements of the email specifications, so, for example, email addresses should use only ASCII characters."

This may still be technically correct under some interpretation, but I think the wording should be clarified to avoid confusion, because the full picture may be a bit more nuanced. For example, the Wikipedia article on email addresses says (to quote one part), "The local-part of the email address may use any of these ASCII characters RFC 5322 Section 3.2.3, RFC 6531 permits Unicode beyond the ASCII range". Also see the comments on ticket #14301, which suggest that there is probably more worth saying in the portion of the docs that I quoted.

I don't know if other parts of the Django docs say more on this aspect.

Attachments (0)

Change History (6)

comment:1 Changed 3 months ago by cjerdonek

  • Cc chris.jerdonek@… added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 3 months ago by cjerdonek

It looks like #14301 added the ability to send emails to addresses with non-ASCII characters, which on its face seems to contradict the section of the documentation referenced above.

comment:3 Changed 3 months ago by wim@…

  • Triage Stage changed from Unreviewed to Accepted

Email addresses in EmailField are validated using the following user_regex in django/core/validators.py:

    import re

    user_regex = re.compile(
        r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*$"  # dot-atom
        r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-\011\013\014\016-\177])*"$)',  # quoted-string
        re.IGNORECASE)

According to the RFCs referenced in the Wikipedia article, however, Unicode characters can be allowed in email addresses, and several people have confirmed that such addresses are actually in use.
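
To make the current behavior concrete, here is a minimal sketch that applies the user_regex quoted above to two local parts. It runs the pattern on its own with Python's re module rather than through Django's validator, and the sample addresses and the local_part_matches helper are made up for illustration.

```python
import re

# Pattern copied from the snippet quoted above (django/core/validators.py).
user_regex = re.compile(
    r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*$"  # dot-atom
    r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-\011\013\014\016-\177])*"$)',  # quoted-string
    re.IGNORECASE)


def local_part_matches(address):
    """Return True if the part before the last "@" matches user_regex."""
    local_part = address.rsplit("@", 1)[0]
    return user_regex.match(local_part) is not None


# Hypothetical sample addresses; only the local part is checked here.
ascii_ok = local_part_matches("jose@example.com")
non_ascii_ok = local_part_matches("jos\u00e9@example.com")  # "josé"
```

With this pattern the ASCII local part matches and the accented one does not, which is the behavior the quoted documentation sentence describes.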

comment:4 Changed 3 months ago by anonymous

  • Type changed from Cleanup/optimization to Bug

comment:5 Changed 3 months ago by claudep

RFC 6531 does define a new SMTPUTF8 extension (http://tools.ietf.org/html/rfc6531) to allow (notably) non-ASCII characters in email addresses. However, usage seems to be very scarce at this time. Allowing non-ASCII characters when 95% of mail servers currently do not seem to support them is debatable.

What could be done is to create a new validate_email_utf8 validator, and maybe add a new support_utf8 argument to EmailField, or add an addr_validator argument to EmailField (defaulting to the legacy validate_email validator), so that users can choose for themselves whether to allow that usage without having to subclass EmailField.
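
A rough, standalone sketch of what the addr_validator idea could look like follows. None of this is Django API: EmailFieldSketch is a toy class, validate_email here is only a stand-in for the legacy validator, validate_email_utf8 is the hypothetical new validator, and the character check is simplified to the dot-atom form (no quoted strings, no domain validation).

```python
import string

# ASCII characters allowed in a dot-atom local part (same set as the
# dot-atom branch of user_regex; the quoted-string form is left out).
ASCII_ATEXT = set("-!#$%&'*+/=?^_`{}|~" + string.digits + string.ascii_letters)


def _check(value, allow_non_ascii):
    """Raise ValueError unless value looks like local@domain."""
    local_part, at, domain = value.rpartition("@")
    if not at or not local_part or not domain:
        raise ValueError("Enter a valid email address.")
    for atom in local_part.split("."):
        ok = atom and all(
            ch in ASCII_ATEXT or (allow_non_ascii and ord(ch) > 127)
            for ch in atom
        )
        if not ok:
            raise ValueError("Enter a valid email address.")


def validate_email(value):
    """Stand-in for the legacy ASCII-only validator."""
    _check(value, allow_non_ascii=False)


def validate_email_utf8(value):
    """Hypothetical validator that also accepts non-ASCII local parts."""
    _check(value, allow_non_ascii=True)


class EmailFieldSketch:
    """Toy field, not Django's EmailField; shows the addr_validator idea."""

    def __init__(self, addr_validator=validate_email):
        self.addr_validator = addr_validator

    def clean(self, value):
        self.addr_validator(value)  # raises ValueError on invalid input
        return value


legacy_field = EmailFieldSketch()
utf8_field = EmailFieldSketch(addr_validator=validate_email_utf8)
```

The default keeps the legacy behavior, so only projects that opt in by passing the other validator would accept non-ASCII local parts.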

comment:6 Changed 3 months ago by cjerdonek

My intent with this ticket was only to clarify the current state of things in the documentation. For the purposes of adding new functionality, we can open a separate ticket.

For the documentation, I see at least two things that can be clarified:

1) My impression is that #14301 has already given Django the ability to handle non-ASCII characters under some interpretation of "email address." Namely, it seems some parts of the Django API will accept non-ASCII email addresses (but will convert them internally to ASCII via IDN encoding, though I'm not clear on the details). This is probably worth stating in some form, so readers don't get the impression there is no support at all.

2) We should clarify the part saying that email specifications allow only ASCII characters. For example, we can acknowledge that there are new specifications that support non-ASCII (perhaps referencing the RFCs by name), and state up front that Django does not yet support them.
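
As a rough illustration of the IDN conversion mentioned in point 1, the sketch below encodes only the domain of an address with Python's built-in "idna" codec. The domain_to_ascii helper and the sample address are made up for illustration, and whether Django performs exactly this step is an assumption here, not something this ticket confirms.

```python
def domain_to_ascii(address):
    """Encode the domain of an address with Python's built-in "idna" codec.

    The local part (before the last "@") is returned unchanged.
    """
    local_part, _, domain = address.rpartition("@")
    ascii_domain = domain.encode("idna").decode("ascii")
    return local_part + "@" + ascii_domain


# Hypothetical address with a non-ASCII domain: "user@bücher.example"
converted = domain_to_ascii("user@b\u00fccher.example")
```

Only the domain changes, to its ASCII "xn--" form. A non-ASCII local part would pass through untouched, which is the case RFC 6531 addresses.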
