Opened 6 years ago

Closed 6 years ago

Last modified 16 months ago

#17561 closed Bug (invalid)

EmailField does not automatically lower the case in email addresses

Reported by: zechs.marquie@…
Owned by: nobody
Component: Database layer (models, ORM)
Version: 1.3
Severity: Normal
Keywords: EmailField, duplicates
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no


Neither in a form nor in a model is the EmailField value converted to lower case upon saving or validating.

If this field were lower-cased, the validation regex would be simpler. In addition, if you have a unique constraint on this field in your model, you are still free to add test@… and tesT@… and any other variant that differs only in upper/lower case.

Seems a bit of a shortfall, seeing as the EmailField already does some validation on email addresses.
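
The unique-constraint behaviour described above can be reproduced outside Django. The following is only a sketch using SQLite from the Python standard library; the table name, column name and example.com addresses are made up for illustration, and other databases or collations may behave differently.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (email TEXT UNIQUE)")

# Both rows are accepted, because the default comparison is case-sensitive.
conn.execute("INSERT INTO account (email) VALUES (?)", ("test@example.com",))
conn.execute("INSERT INTO account (email) VALUES (?)", ("tesT@example.com",))

row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(row_count)
```

Neither insert raises an error, so the duplicate goes unnoticed unless the application normalizes or checks the value itself.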

Change History (10)

comment:1 Changed 6 years ago by anonymous

comment:2 Changed 6 years ago by Nate Bragg

Resolution: invalid
Status: new → closed

The problem is that test@... and tesT@... are different email addresses. Take a look at the answers to the question "How do I upper case an email address?" and the relevant links to the RFC.

In practice uppercase letters are discouraged, but breaking that would probably not be a good idea.

comment:3 Changed 6 years ago by zechs.marquie@…

I suppose it's better to adhere to the RFC. Perhaps an option could be passed to EmailField to state whether you want it to lower-case the value or not. It would save having to do something like this in the form validation:

def clean_email(self):
    return self.cleaned_data['email'].lower()
Last edited 6 years ago by Łukasz Rekucki

comment:4 Changed 6 years ago by Łukasz Rekucki

I'm not sure adding another option to save you 2 lines in a not-so-common case is a good idea.

One thing to note is that only the *local part* of the email is case sensitive. Host names are by definition case insensitive (at least in the ASCII range, not sure about IDN), i.e. two addresses that differ only in the case of the host name are the same address. IMO, doing such normalization is useful, but it's a somewhat different thing than what is proposed on this ticket.
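
As a rough illustration of the host-only normalization mentioned here (a sketch, not part of any patch on this ticket), the helper below lower-cases everything after the last "@" and leaves the local part alone. The function name `normalize_host` and the example.com address are hypothetical, and IDN host names are not handled.

```python
def normalize_host(email: str) -> str:
    """Lower-case only the host part, leaving the local part untouched."""
    # Split on the last "@" so the local part is preserved exactly as typed.
    local, sep, host = email.rpartition("@")
    if not sep:
        # No "@" found: return the input unchanged and leave it to validation.
        return email
    return local + "@" + host.lower()


normalized = normalize_host("Test@Example.COM")
print(normalized)
```

Django's `BaseUserManager.normalize_email()` takes a similar approach of lower-casing only the domain portion.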

comment:5 Changed 6 years ago by Aymeric Augustin

Django shouldn't alter user input. IMO this ticket is "wontfix" anyway.

comment:6 Changed 6 years ago by zechs.marquie@…

Well, think about it this way. The ORM sits between us and the database, and the database is supposed to be something we don't fudge around with behind the ORM's back. The database WON'T notify you if you insert me@… and me@…. This crops up for every production site that records an email address. Twitter, Facebook, GitHub, Imgur, Stack Overflow and Google all treat email as case insensitive. If most email vendors enforce case insensitivity (to simplify administration) then you will most likely want to follow suit. Email is often stored as a unique column in a database, so those two lines are probably already written very commonly out there.

I cannot force something into Django, but when people say email validation of this nature is NOT common I have to DISAGREE. It's an unwritten rule that we treat the entire email as case insensitive for security and ease of administration, so I think the flag would be useful here. Is it a case of us saying EmailFields operate to strict standards, or saying EmailFields accommodate a useful feature out of the box?

PS Nothing anyone has said so far is incorrect. But this feature supports both positions by making this check optional.

comment:7 Changed 6 years ago by anonymous

Interesting, my company ran into this problem as well. We ended up having to run a South schema migration on 2 million users, after making the following modification:

def email_prep(self, value):
    """Lower-cases the value returned by super, while still allowing nullable email fields."""
    prep_value = super(EmailField, self).get_prep_value()
    if prep_value is not None:
        return prep_value.lower()
    return prep_value

EmailField.get_prep_value = email_prep

comment:8 Changed 6 years ago by anonymous

Just noticed a typo in my snippet above. The line should read: prep_value = super(EmailField, self).get_prep_value(value)

comment:9 Changed 21 months ago by Jean-Luc Herren

This is an old ticket, but since it rates very high on Google I'd like to describe another solution that I have been using and that works great.

The relevant RFCs clearly state that the local-part of the email address is case-sensitive, yet it is still true that many (but not all) email providers operate in a case-insensitive manner. The resulting problem is that users can register multiple accounts on my site using various capitalizations of one and the same email address; for example, they can register ME@example.com as well as me@example.com. This is something that I do not want to allow, and I suppose I might not be alone in this.

The obvious and simple solution of lower-casing all user input is not satisfactory to me. I believe that the user's choice of how to capitalize his/her email address should be respected and I wish to not alter it.

My solution involves a field 'normalized_email' which lower-cases the email (or applies other transformations) as well as an 'original_email' field, which stores the original user input. For identification purposes (login, registration, duplicate check) I use the normalized email, but for display and email sending purposes I use the original email.

To make all of this work quite some extra code is necessary. Since my code is very project specific and it's too long anyway, I can't post it here. But here's a quick summary: A method UserManager.normalize_email() will take care of the normalizing transformation. Overriding will make sure to always set normalized_email to UserManager.normalize_email(original_email) prior to saving to the database. All forms (login form, registration form, user editing forms, admin forms) will have to verify user input and check for duplicate normalized emails manually.
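
To give a rough idea of the normalized/original split without any Django code, here is a minimal, framework-free sketch. It is not my project code: `normalize_email` below is a hypothetical stand-in for the normalizing transformation, and `UserRegistry` is an illustrative in-memory substitute for the model and database.

```python
from dataclasses import dataclass, field


def normalize_email(email: str) -> str:
    # Hypothetical normalization: trim whitespace and lower-case everything.
    return email.strip().lower()


@dataclass
class UserRegistry:
    # Maps normalized_email -> original_email exactly as the user typed it.
    _by_normalized: dict = field(default_factory=dict)

    def register(self, original_email: str) -> str:
        # Duplicate check is done on the normalized form only.
        key = normalize_email(original_email)
        if key in self._by_normalized:
            raise ValueError("an account with this email already exists")
        self._by_normalized[key] = original_email
        return key

    def display_email(self, any_casing: str) -> str:
        # Identify by the normalized key, return the original capitalization.
        return self._by_normalized[normalize_email(any_casing)]


registry = UserRegistry()
registry.register("ME@example.com")
print(registry.display_email("me@example.com"))
```

Registering "me@example.com" afterwards raises `ValueError`, while the stored address keeps the capitalization the user originally entered.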

On a side note: Some email providers are insensitive to more than just the capitalization. For example, Gmail will ignore all dots in the local part, making john@gmail.com the same as j.o.h.n@gmail.com. To prevent the same email address from being used for multiple accounts, further normalization is possible.
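
A sketch of what such further normalization could look like follows. The function name `dedupe_key` is hypothetical, and the only provider rule it encodes is the Gmail dot behaviour described above; any other provider-specific rules would need to be verified separately.

```python
def dedupe_key(email: str) -> str:
    """Build a key for duplicate detection only; never use it for sending mail."""
    local, sep, host = email.strip().lower().rpartition("@")
    if not sep:
        # No "@" present: rpartition puts the whole string in `host`.
        return host
    if host == "gmail.com":
        # Assumption taken from the comment above: Gmail ignores dots.
        local = local.replace(".", "")
    return f"{local}@{host}"


key = dedupe_key("J.O.H.N@Gmail.com")
print(key)
```

The key is meant to sit alongside the original address, which stays untouched for display and delivery.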

comment:10 in reply to:  9 Changed 16 months ago by Craig de Stigter

For those using PostgreSQL, the CITEXT datatype does this very well. It preserves the original case on retrieval but compares case-insensitively.

Django now has a field for it (1.11 dev)
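
The preserve-case, compare-insensitively behaviour can be tried without a PostgreSQL server. The sketch below uses SQLite's NOCASE collation purely as a stand-in: it is not CITEXT, it only folds ASCII letters, and the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE makes both the UNIQUE check and "=" comparisons case-insensitive.
conn.execute("CREATE TABLE account (email TEXT UNIQUE COLLATE NOCASE)")
conn.execute("INSERT INTO account (email) VALUES (?)", ("Test@example.com",))

try:
    conn.execute("INSERT INTO account (email) VALUES (?)", ("test@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lookup with different casing still finds the row, stored as originally typed.
stored = conn.execute(
    "SELECT email FROM account WHERE email = ?", ("TEST@EXAMPLE.COM",)
).fetchone()[0]
print(duplicate_rejected, stored)
```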
