Opened 10 years ago

Closed 8 years ago

Last modified 5 years ago

#21379 closed Bug (fixed)

Add support for Unicode usernames

Reported by: anonymous Owned by: nobody
Component: contrib.auth Version: dev
Severity: Normal Keywords: 1.10
Cc: jorgecarleitao Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

class AbstractUser(AbstractBaseUser, PermissionsMixin):
    """
    An abstract base class implementing a fully featured User model with
    admin-compliant permissions.

    Username, password and email are required. Other fields are optional.
    """
    username = models.CharField(_('username'), max_length=30, unique=True,
        help_text=_('Required. 30 characters or fewer. Letters, numbers and '
                    '@/./+/-/_ characters'),
        validators=[
            validators.RegexValidator(re.compile('^[\w.@+-]+$'), _('Enter a valid username.'), 'invalid')
        ])

re.compile should use re.U

Attachments (1)

test_issue21379.diff (1.6 KB ) - added by jorgecarleitao 10 years ago.

Download all attachments as: .zip

Change History (23)

comment:1 by chrismedrela, 10 years ago

Resolution: needsinfo
Status: newclosed

Could you elaborate on the issue? Non-ascii characters are not allowed by design and if you want to allow unicode in usernames, you need to use a custom user model (see #20694).

Last edited 8 years ago by Tim Graham (previous) (diff)

comment:2 by Raphaël Barrois, 10 years ago

Resolution: needsinfo
Status: closednew

The above regexp doesn't do what it seems to be doing: on Python 2, [\w.@+-] is equivalent to [a-zA-Z0-9.@+-].
In Python 3, this will also match all "accented" characters.

The regexp should be updated for consistency between versions:

  • If the goal is to allow any letter-like char, add re.UNICODE to the re.compile call
  • If it should instead only allow ascii letters, the simplest way would be to use the explicit regexp, since the re.ASCII flag exists only in Py3.

Two examples, showing that the "jean-rené" username is accepted in Python 3 and not in Python 2:

On Python 3:

>>> import re
>>> from django.core import validators
>>> v = validators.RegexValidator(re.compile('^[\w.@+-]+$'), "Enter a valid username.", 'invalid')
>>> v('foo.bar')
>>> v('foo bar')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools-py3/lib/python3.3/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
django.core.exceptions.ValidationError: ['Enter a valid username.']
>>> v('jean-rené')

And on Python 2:

>>> import re
>>> from django.core import validators
>>> v = validators.RegexValidator(re.compile('^[\w.@+-]+$'), "Enter a valid username.", 'invalid')
>>> v('foo.bar')
>>> v('foo bar')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools/lib/python2.7/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
ValidationError: [u'Enter a valid username.']
>>> v('jean-rené')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools/lib/python2.7/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
ValidationError: [u'Enter a valid username.']
Last edited 10 years ago by Raphaël Barrois (previous) (diff)

comment:3 by Alex Gaynor, 10 years ago

Triage Stage: UnreviewedAccepted

The behavior difference between py2 and py3 is clearly a bug, not sure which fix is correct.

by jorgecarleitao, 10 years ago

Attachment: test_issue21379.diff added

comment:4 by jorgecarleitao, 10 years ago

I digged a bit and the validation is defined in two places:

(1)- UserCreationForm and UserChangeForm uses '^[\w.@+-]+$', which is compiled to re.compile('^[\w.@+-]+$', re.UNICODE) (https://github.com/django/django/blob/master/django/forms/fields.py#L542)
(2)- AbstractUser ORM field validation uses '^[\w.@+-]+$' but it does not pass flags to the RegexValidator, so it compiles as re.compile('^[\w.@+-]+$').

In #20694 was pointing out that AbstractUser is rejecting non-ascii in python2, which is consistent with (2). @aaugustin pointed out that we cannot change AbstractUser because it is backward incompatible and proposed to use UserCreationForm. However, I'm not sure this can be fixed using a custom UserCreationForm. My concern is that the validation is performed by both the validator constructed from UserCreationForm.username, and by the validator of the AbstractUser.username. Because both are tested and both must validate, this gives different results whether we use python2 or python3 because of (2).

To support this, I attach a diff to be run using both versions:

PYTHONPATH=..:$PYTHONPATH python2.7 runtests.py --settings=test_sqlite django.contrib.auth.tests.test_forms.UserCreationFormTest.test_invalid_non_ascii_username

PYTHONPATH=..:$PYTHONPATH python3.3 runtests.py --settings=test_sqlite django.contrib.auth.tests.test_forms.UserCreationFormTest.test_invalid_non_ascii_username

This diff has a test for this ticket and also prints which regex was used on validador.RegexValidator.__call__ (for the sake of this discussion)

print(self.regex.pattern, self.regex.flags)

In Python 2, this prints

^[\w.@+-]+$ 32  # 32 = re.UNICODE
^[\w.@+-]+$ 0    # 0 = default flag of RegexValidator

and username=u'jsmithé' is correctly invalidated

In Python 3, this prints

^[\w.@+-]+$ 32
^[\w.@+-]+$ 32

and username=u'jsmithé' is not invalidated, failing the test.

Last edited 10 years ago by jorgecarleitao (previous) (diff)

comment:5 by jorgecarleitao, 10 years ago

Cc: jorgecarleitao added

comment:6 by Avraham, 8 years ago

I think we should allow non latin characters, users should be able to have usernames in their native language (hebrew, chinese etc)

I propose we use unicode normalization

unicodedata.normalize(input, 'NFKD')

this could be used for usernames, passwords and other kind of input and should allow non latin.

comment:7 by Claude Paroz, 8 years ago

Has patch: set
Patch needs improvement: set

Here's a tentative PR.

comment:8 by Claude Paroz, 8 years ago

Patch needs improvement: unset
Version: 1.5master

Should be ready for review.

comment:9 by Claude Paroz, 8 years ago

The django-developers discussion.

comment:10 by Claude Paroz, 8 years ago

Severity: NormalRelease blocker

Marking as blocker, just as we don't forget to make a decision before releasing 1.10.

comment:11 by Claude Paroz, 8 years ago

So beside the original PR which also added Unicode username on Python 2, we have now the alternative PR to keep ASCII-only usernames on Python 2 and Unicode usernames on Python 3. This is more or less the same situation as Django 1.9, but with the ability to change the default behavior, and with improvements regarding Unicode normalization.

comment:12 by Tim Graham, 8 years ago

Keywords: 1.10 added
Severity: Release blockerNormal

comment:13 by Tim Graham, 8 years ago

Summary: class AbstractUser: validators should compile re with UnicodeAdd support for Unicode usernames
Triage Stage: AcceptedReady for checkin

comment:14 by Claude Paroz <claude@…>, 8 years ago

Resolution: fixed
Status: newclosed

In 526575c:

Fixed #21379 -- Created auth-specific username validators

Thanks Tim Graham for the review.

comment:15 by Claude Paroz <claude@…>, 8 years ago

In 9935f97:

Refs #21379 -- Normalized unicode username inputs

comment:16 by Tim Graham <timograham@…>, 8 years ago

In 39805686:

Refs #21379, #26719 -- Moved username normalization to AbstractBaseUser.

Thanks Huynh Thanh Tam for the initial patch and Claude Paroz for review.

comment:17 by Tim Graham <timograham@…>, 8 years ago

In 1b0b6f03:

[1.10.x] Refs #21379, #26719 -- Moved username normalization to AbstractBaseUser.

Thanks Huynh Thanh Tam for the initial patch and Claude Paroz for review.

Backport of 39805686b364358af725b695924a5a6dfa7f5302 from master

comment:18 by Tim Graham <timograham@…>, 7 years ago

In 714fdba:

[1.10.x] Refs #27807 -- Removed docs for User.username_validator.

The new override functionality claimed in refs #21379 doesn't work.

comment:19 by Tim Graham <timograham@…>, 5 years ago

In c84b91b7:

Refs #27807 -- Removed docs for User.username_validator.

The new override functionality claimed in refs #21379 doesn't work.
Forwardport of 714fdbaa7048c2321f6238d9421137c33d9af7cc from stable/1.10.x.

comment:20 by Tim Graham <timograham@…>, 5 years ago

In 53c83387:

[2.2.x] Refs #27807 -- Removed docs for User.username_validator.

The new override functionality claimed in refs #21379 doesn't work.
Forwardport of 714fdbaa7048c2321f6238d9421137c33d9af7cc from stable/1.10.x.

comment:21 by Tim Graham <timograham@…>, 5 years ago

In fb2b425:

[2.1.x] Refs #27807 -- Removed docs for User.username_validator.

The new override functionality claimed in refs #21379 doesn't work.
Forwardport of 714fdbaa7048c2321f6238d9421137c33d9af7cc from stable/1.10.x.

comment:22 by Tim Graham <timograham@…>, 5 years ago

In 331d7652:

[1.11.x] Refs #27807 -- Removed docs for User.username_validator.

The new override functionality claimed in refs #21379 doesn't work.
Forwardport of 714fdbaa7048c2321f6238d9421137c33d9af7cc from stable/1.10.x.

Note: See TracTickets for help on using tickets.
Back to Top