Opened 12 years ago

Closed 12 years ago

Last modified 5 years ago

#17561 closed Bug (invalid)

EmailField does not automatically lower the case in email addresses

Reported by: zechs.marquie@…
Owned by: nobody
Component: Database layer (models, ORM)
Version: 1.3
Severity: Normal
Keywords: EmailField, duplicates
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Neither in a form nor in a model does the EmailField value become lower case upon saving or validating.

If the value were lowercased, the validation regex would be simpler. In addition, if you have a unique constraint on this field in your model, you are still free to add test@… and tesT@… and any other upper/lower-case variant.

This seems like a bit of a shortfall, seeing as EmailField already does some validation on email addresses.
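The behaviour described above can be reproduced outside Django. The following is a minimal sketch using the standard-library sqlite3 module, not Django's ORM; the table name, the helper name count_case_variants and the example.com addresses are assumptions made for illustration. It shows that a plain UNIQUE column compares values byte for byte, so case variants are stored as separate rows.

```python
import sqlite3


def count_case_variants(addresses):
    """Insert addresses into a UNIQUE column and return how many rows were stored.

    Hypothetical helper: the table and column names are illustrative only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (email TEXT UNIQUE)")
    stored = 0
    for address in addresses:
        try:
            conn.execute("INSERT INTO account (email) VALUES (?)", (address,))
            stored += 1
        except sqlite3.IntegrityError:
            # Only an exact (case-sensitive) duplicate is rejected here.
            pass
    conn.close()
    return stored
```

Under these assumptions, two spellings that differ only in case both pass the uniqueness check, while an exact repeat does not.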

Change History (11)

comment:1 by anonymous, 12 years ago

comment:2 by Nate Bragg, 12 years ago

Resolution: invalid
Status: new → closed

The problem is that test@... and tesT@... are different email addresses. Take a look at the answers to the question "How do I upper case an email address?" and the relevant links to the RFC.

In practice uppercase letters are discouraged, but breaking that would probably not be a good idea.

comment:3 by zechs.marquie@…, 12 years ago

I suppose it's better to adhere to the RFC. Perhaps an option could be passed to EmailField to state whether you want it to lowercase the value or not. It would save having to do something like this in the form validation:

def clean_email(self):
    return self.cleaned_data['email'].lower()

Last edited 12 years ago by Łukasz Rekucki

comment:4 by Łukasz Rekucki, 12 years ago

I'm not sure adding another option to save you 2 lines in a not-so-common case is a good idea.

One thing to note is that only the *local part* of the email is case sensitive. Host names are by definition case insensitive (at least in the ASCII range, not sure about IDN), i.e. me...@example.com and me...@Example.com are the same address. IMO, doing such normalization is useful, but it's a somewhat different thing than what is proposed in this ticket.
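As a rough illustration of the domain-only normalization mentioned in this comment, here is a minimal sketch in plain Python. The function name normalize_domain is hypothetical, and the sketch only handles ASCII case folding; IDN host names are out of scope, as the comment itself notes.

```python
def normalize_domain(email):
    """Lower-case only the host part of an address, leaving the local part untouched.

    Hypothetical helper for illustration; it does not validate the address.
    """
    # Split on the last "@" so an unusual local part cannot confuse the split.
    local, sep, domain = email.rpartition("@")
    if not sep:
        # No "@" at all: not an address, so return the input unchanged.
        return email
    return local + sep + domain.lower()
```

With this approach the local part keeps whatever capitalization the user typed, which stays within the RFC's rules, while two spellings of the same host compare equal.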

comment:5 by Aymeric Augustin, 12 years ago

Django shouldn't alter user input. IMO this ticket is "wontfix" anyway.

comment:6 by zechs.marquie@…, 12 years ago

Well, think about it this way. The ORM is something that sits between us and the database, and the database is supposed to be something we don't want to fudge around with behind the ORM's back. The database WON'T notify you if you insert me@… and me@…. For every production site out there recording an email address, this crops up. Twitter, Facebook, GitHub, Imgur, Stack Overflow and Google all treat email as case insensitive. If most email vendors enforce case insensitivity (to simplify administration), then you will most likely want to follow suit. Email is often stored as a unique column in a database, so those two lines are probably already very commonly written out there.

I cannot force something into Django, but when people say email validation of this nature is NOT common I have to DISAGREE. It's an unwritten rule that we treat the entire email as being case insensitive for security and ease of administration, so I think the flag would be useful here. Is it a case of us saying EmailFields operate to strict standards, or saying EmailFields accommodate a useful feature out of the box?

PS Nothing anyone has said so far is incorrect. But this feature supports both positions by making this check optional.

comment:7 by anonymous, 12 years ago

Interesting, my company ran into this problem as well. We ended up having to run a South schema migration on 2 million users, after making the following modification:

def email_prep(self, value):
    """
    Lower-cases the value returned by super, while still allowing nullable email fields.
    """
    prep_value = super(EmailField, self).get_prep_value()
    if prep_value is not None:
        return prep_value.lower()
    return prep_value

EmailField.get_prep_value = email_prep

comment:8 by anonymous, 12 years ago

Just noticed a typo - prep_value = super(EmailField, self).get_prep_value(value)

comment:9 by Jean-Luc Herren, 8 years ago

This is an old ticket, but since it rates very high on Google I'd like to describe another solution that I have been using and that works great.

The relevant RFCs clearly state that the local-part of the email address is case-sensitive, yet it is still true that many (but not all) email providers operate in a case-insensitive manner. The resulting problem is that users can register multiple accounts on my site using various capitalizations of one and the same email address; for example, they can register ME@example.com as well as me@example.com. This is something that I do not want to allow, and I suppose I might not be alone in this.

The obvious and simple solution of lower-casing all user input is not satisfactory to me. I believe that the user's choice of how to capitalize his/her email address should be respected and I wish to not alter it.

My solution involves a field 'normalized_email' which lower-cases the email (or applies other transformations) as well as an 'original_email' field, which stores the original user input. For identification purposes (login, registration, duplicate check) I use the normalized email, but for display and email sending purposes I use the original email.

To make all of this work quite some extra code is necessary. Since my code is very project specific and it's too long anyway, I can't post it here. But here's a quick summary: A method UserManager.normalize_email() will take care of the normalizing transformation. Overriding User.save() will make sure to always set normalized_email to UserManager.normalize_email(original_email) prior to saving to the database. All forms (login form, registration form, user editing forms, admin forms) will have to verify user input and check for duplicate normalized emails manually.
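The two-field idea summarized above can be sketched without Django. The following is only an illustration under stated assumptions, not the commenter's project code (which was not posted): the class AccountRegistry, the exception DuplicateEmail and the method display_email are hypothetical names, and an in-memory dict stands in for the database.

```python
class DuplicateEmail(Exception):
    """Raised when a normalized email is already registered (hypothetical)."""


class AccountRegistry:
    """In-memory stand-in for a user table keyed by normalized email."""

    def __init__(self):
        # Maps normalized_email -> original_email as typed by the user.
        self._by_normalized = {}

    @staticmethod
    def normalize_email(email):
        # Assumption: normalization here is just trimming and lower-casing.
        return email.strip().lower()

    def register(self, original_email):
        key = self.normalize_email(original_email)
        if key in self._by_normalized:
            raise DuplicateEmail(original_email)
        self._by_normalized[key] = original_email
        return key

    def display_email(self, any_spelling):
        # Identify by the normalized form, but return the user's own spelling.
        return self._by_normalized[self.normalize_email(any_spelling)]
```

The design choice is that identification (login, duplicate checks) always goes through the normalized key, while display and sending use the stored original, so the user's capitalization is never altered.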

On a side note: Some email providers are insensitive to more than just the capitalization. For example, Gmail will ignore all dots in the local part, making john@gmail.com the same as j.o.h.n@gmail.com. To prevent the same email address from being used for multiple accounts, further normalization is possible.
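A provider-specific rule such as the dot handling mentioned here could be layered on top of lower-casing. The sketch below is an assumption-laden illustration: the name normalize_for_identity and the set DOT_INSENSITIVE_DOMAINS are hypothetical, and the set contains only the one provider named in this comment rather than any authoritative list.

```python
# Assumption: illustrative list only, containing the provider named in the comment.
DOT_INSENSITIVE_DOMAINS = {"gmail.com"}


def normalize_for_identity(email):
    """Return a key for duplicate checks; never use it as the delivery address."""
    cleaned = email.strip().lower()
    local, sep, domain = cleaned.rpartition("@")
    if not sep:
        # No "@" found: return the cleaned input unchanged.
        return cleaned
    if domain in DOT_INSENSITIVE_DOMAINS:
        # Dots in the local part are ignored by this provider.
        local = local.replace(".", "")
    return local + sep + domain
```

As with the lower-cased form, this key is only meant for comparison; the address the user entered would still be the one stored for display and sending.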

in reply to:  9 comment:10 by Craig de Stigter, 7 years ago

For those using PostgreSQL, the CITEXT datatype does this very well. It preserves case on retrieval but is case-insensitive for comparison.

Django now has a field for it (1.11 dev).
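The "preserve on retrieval, ignore case on comparison" semantics can be tried without a PostgreSQL server. The sketch below uses SQLite's COLLATE NOCASE from the standard library as an analogue; it is not CITEXT and not the Django field mentioned above, NOCASE folds ASCII letters only, and the table name and addresses are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE on the column applies to both the UNIQUE check and "=" comparisons.
conn.execute("CREATE TABLE account (email TEXT COLLATE NOCASE UNIQUE)")
conn.execute("INSERT INTO account (email) VALUES (?)", ("ME@example.com",))

try:
    conn.execute("INSERT INTO account (email) VALUES (?)", ("me@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The case variant collides with the existing row.
    duplicate_rejected = True

# Lookup with yet another spelling; the stored spelling is what comes back.
stored = conn.execute(
    "SELECT email FROM account WHERE email = ?", ("me@EXAMPLE.com",)
).fetchone()[0]
conn.close()
```

In this sketch the second insert is rejected and the lookup returns the originally stored capitalization, which mirrors the behaviour described for CITEXT.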

comment:11 by אורי, 5 years ago

I also want EmailField in our project (Speedy Net) to be saved lowercase in the model and not only in the forms. So for example, if we create (in a test) an object such as user_email_address = UserEmailAddress(user=user, email='EMAIL@EXAMPLE.COM'), I want user_email_address.email to be equal to 'email@example.com'. This setting may be optional but I think it's very important. And by the way, I think users who are not able to receive email at a lowercase email address will not be able to use Speedy Net.
