I'm just noticing today that unicode.org has recommendations on email security:
https://www.unicode.org/reports/tr39/#Email_Security_Profiles
- It must be in NFKC format
- It must have level = <restriction level> or less, from Restriction_Level_Detection
- It must not have mixed number systems according to Mixed_Number_Detection
- It must satisfy dot-atom-text from RFC 5322 §3.2.3, where atext is extended as follows:
  - Where C ≤ U+007F, C is defined as in §3.2.3. (That is, C ∈ [!#-'*+\-/-9=?A-Z\^-~]. This list copies what is already in §3.2.3, and follows HTML5 for ASCII.)
  - Where C > U+007F, both of the following conditions are true:
    - C has Identifier_Status=Allowed from General Security Profile
    - If C is the first character, it must be XID_Start from Default Identifier_Syntax in [UAX31]
The report doesn't recommend which "restriction level" to use, so maybe we should let the user decide (defaulting to 1: ASCII-Only).
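For what it's worth, here is a rough sketch of the parts of that profile the standard library can already cover: the NFKC check and the extended dot-atom-text rule. The names `check_local_part`, `is_atext` and `is_allowed` are made up for illustration. `is_allowed` is a caller-supplied stand-in for Identifier_Status=Allowed, since that data is not in the stdlib, and its default rejects every non-ASCII character, which amounts to ASCII-only. I'm also assuming "first character" means the first character of the whole local-part. Restriction-level and mixed-number checks are not attempted here.

```python
import re
import unicodedata
from typing import Callable

# ASCII atext from RFC 5322 §3.2.3: ALPHA / DIGIT / !#$%&'*+-/=?^_`{|}~
_ASCII_ATEXT = re.compile(r"[!#-'*+\-/-9=?A-Z^-~]")


def is_atext(ch: str, first: bool, is_allowed: Callable[[str], bool]) -> bool:
    """Extended atext test for a single character."""
    if ord(ch) <= 0x7F:
        return _ASCII_ATEXT.fullmatch(ch) is not None
    # Identifier_Status=Allowed is not in the stdlib, so the caller supplies it.
    if not is_allowed(ch):
        return False
    # A lone non-ASCII character is a valid Python identifier exactly when
    # it is XID_Start in the interpreter's Unicode database.
    return ch.isidentifier() if first else True


def check_local_part(
    local: str,
    is_allowed: Callable[[str], bool] = lambda ch: False,
) -> bool:
    """Check NFKC form plus the extended dot-atom-text rule, nothing else."""
    if not unicodedata.is_normalized("NFKC", local):  # needs Python 3.8+
        return False
    # dot-atom-text: one or more non-empty atoms separated by single dots.
    if any(not atom for atom in local.split(".")):
        return False
    # Assumption: "first character" is the first character of the local-part.
    return all(
        ch == "." or is_atext(ch, i == 0, is_allowed)
        for i, ch in enumerate(local)
    )
```

With the default callback, `check_local_part("john.smith")` passes while `"john..smith"` and `"jöhn"` are rejected. Passing a real Identifier_Status lookup (for example one built from TR39's IdentifierStatus.txt) would be needed before accepting any non-ASCII input.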
(Also, it would be nice if Python implemented "Mixed-Script Detection", "Restriction-Level Detection" and "Mixed-Number Detection" as part of unicodedata.)
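Of the three, mixed-number detection looks doable today with what `unicodedata` already exposes. As I read TR39, the idea is to map every decimal digit to the zero of its number system and flag the string if more than one zero turns up. A minimal sketch under that reading (the function names are mine, not an existing API):

```python
import unicodedata


def decimal_zeros(s: str) -> set:
    """Return the zero characters of every decimal number system used in s."""
    zeros = set()
    for ch in s:
        # Only General_Category=Nd characters carry a decimal digit value.
        if unicodedata.category(ch) == "Nd":
            # Subtracting the digit's value gives that system's zero.
            zeros.add(chr(ord(ch) - unicodedata.decimal(ch)))
    return zeros


def has_mixed_numbers(s: str) -> bool:
    """True if s mixes digits from more than one decimal number system."""
    return len(decimal_zeros(s)) > 1
```

For example, `has_mixed_numbers("123")` is false, while a string combining ASCII `1` with ARABIC-INDIC DIGIT TWO (U+0662) is flagged. Script and restriction-level detection would still need the Script_Extensions data, which `unicodedata` does not provide.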