Opened 3 years ago

Closed 2 months ago

Last modified 6 weeks ago

#21379 closed Bug (fixed)

Add support for Unicode usernames

Reported by: anonymous Owned by: nobody
Component: contrib.auth Version: master
Severity: Normal Keywords: 1.10
Cc: jorgecarleitao Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

class AbstractUser(AbstractBaseUser, PermissionsMixin):
    """
    An abstract base class implementing a fully featured User model with
    admin-compliant permissions.

    Username, password and email are required. Other fields are optional.
    """
    username = models.CharField(_('username'), max_length=30, unique=True,
        help_text=_('Required. 30 characters or fewer. Letters, numbers and '
                    '@/./+/-/_ characters'),
        validators=[
            validators.RegexValidator(re.compile('^[\w.@+-]+$'), _('Enter a valid username.'), 'invalid')
        ])

re.compile should use re.U

Attachments (1)

test_issue21379.diff (1.6 KB) - added by jorgecarleitao 2 years ago.

Download all attachments as: .zip

Change History (18)

comment:1 Changed 3 years ago by chrismedrela

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to needsinfo
  • Status changed from new to closed

Could you elaborate on the issue? Non-ascii characters are not allowed by design and if you want to allow unicode in usernames, you need to use a custom user model (see #20694).

Last edited 3 months ago by timgraham (previous) (diff)

comment:2 Changed 2 years ago by xelnor

  • Resolution needsinfo deleted
  • Status changed from closed to new

The above regexp doesn't do what it seems to be doing: on Python 2, [\w.@+-] is equivalent to [a-zA-Z0-9.@+-].
In Python 3, this will also match all "accented" characters.

The regexp should be updated for consistency between versions:

  • If the goal is to allow any letter-like char, add re.UNICODE to the re.compile call
  • If it should instead only allow ascii letters, the simplest way would be to use the explicit regexp, since the re.ASCII flag exists only in Py3.

Exemple (Python 3):

>>> import re
>>> from django.core import validators
>>> v = validators.RegexValidator(re.compile('^[\w.@+-]+$'), "Enter a valid username.", 'invalid')
>>> v('foo.bar')
>>> v('foo bar')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools-py3/lib/python3.3/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
django.core.exceptions.ValidationError: ['Enter a valid username.']
>>> v('jean-rené')

And on Python 2:

>>> import re
>>> from django.core import validators
>>> v = validators.RegexValidator(re.compile('^[\w.@+-]+$'), "Enter a valid username.", 'invalid')
>>> v('foo.bar')
>>> v('foo bar')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools/lib/python2.7/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
ValidationError: [u'Enter a valid username.']
>>> v('jean-rené')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools/lib/python2.7/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
ValidationError: [u'Enter a valid username.']
Version 0, edited 2 years ago by xelnor (next)

comment:3 Changed 2 years ago by Alex

  • Triage Stage changed from Unreviewed to Accepted

The behavior difference between py2 and py3 is clearly a bug, not sure which fix is correct.

Changed 2 years ago by jorgecarleitao

comment:4 Changed 2 years ago by jorgecarleitao

I digged a bit and the validation is defined in two places:

(1)- UserCreationForm and UserChangeForm uses '^[\w.@+-]+$', which is compiled to re.compile('^[\w.@+-]+$', re.UNICODE) (https://github.com/django/django/blob/master/django/forms/fields.py#L542)
(2)- AbstractUser ORM field validation uses '^[\w.@+-]+$' but it does not pass flags to the RegexValidator, so it compiles as re.compile('^[\w.@+-]+$').

In #20694 was pointing out that AbstractUser is rejecting non-ascii in python2, which is consistent with (2). @aaugustin pointed out that we cannot change AbstractUser because it is backward incompatible and proposed to use UserCreationForm. However, I'm not sure this can be fixed using a custom UserCreationForm. My concern is that the validation is performed by both the validator constructed from UserCreationForm.username, and by the validator of the AbstractUser.username. Because both are tested and both must validate, this gives different results whether we use python2 or python3 because of (2).

To support this, I attach a diff to be run using both versions:

PYTHONPATH=..:$PYTHONPATH python2.7 runtests.py --settings=test_sqlite django.contrib.auth.tests.test_forms.UserCreationFormTest.test_invalid_non_ascii_username

PYTHONPATH=..:$PYTHONPATH python3.3 runtests.py --settings=test_sqlite django.contrib.auth.tests.test_forms.UserCreationFormTest.test_invalid_non_ascii_username

This diff has a test for this ticket and also prints which regex was used on validador.RegexValidator.__call__ (for the sake of this discussion)

print(self.regex.pattern, self.regex.flags)

In Python 2, this prints

^[\w.@+-]+$ 32  # 32 = re.UNICODE
^[\w.@+-]+$ 0    # 0 = default flag of RegexValidator

and username=u'jsmithé' is correctly invalidated

In Python 3, this prints

^[\w.@+-]+$ 32
^[\w.@+-]+$ 32

and username=u'jsmithé' is not invalidated, failing the test.

Last edited 2 years ago by jorgecarleitao (previous) (diff)

comment:5 Changed 2 years ago by jorgecarleitao

  • Cc jorgecarleitao added

comment:6 Changed 3 months ago by tovmeod

I think we should allow non latin characters, users should be able to have usernames in their native language (hebrew, chinese etc)

I propose we use unicode normalization

unicodedata.normalize(input, 'NFKD')

this could be used for usernames, passwords and other kind of input and should allow non latin.

comment:7 Changed 3 months ago by claudep

  • Has patch set
  • Patch needs improvement set

Here's a tentative PR.

comment:8 Changed 3 months ago by claudep

  • Patch needs improvement unset
  • Version changed from 1.5 to master

Should be ready for review.

comment:9 Changed 3 months ago by claudep

The django-developers discussion.

comment:10 Changed 3 months ago by claudep

  • Severity changed from Normal to Release blocker

Marking as blocker, just as we don't forget to make a decision before releasing 1.10.

comment:11 Changed 3 months ago by claudep

So beside the original PR which also added Unicode username on Python 2, we have now the alternative PR to keep ASCII-only usernames on Python 2 and Unicode usernames on Python 3. This is more or less the same situation as Django 1.9, but with the ability to change the default behavior, and with improvements regarding Unicode normalization.

comment:12 Changed 3 months ago by timgraham

  • Keywords 1.10 added
  • Severity changed from Release blocker to Normal

comment:13 Changed 2 months ago by timgraham

  • Summary changed from class AbstractUser: validators should compile re with Unicode to Add support for Unicode usernames
  • Triage Stage changed from Accepted to Ready for checkin

comment:14 Changed 2 months ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 526575c:

Fixed #21379 -- Created auth-specific username validators

Thanks Tim Graham for the review.

comment:15 Changed 2 months ago by Claude Paroz <claude@…>

In 9935f97:

Refs #21379 -- Normalized unicode username inputs

comment:16 Changed 6 weeks ago by Tim Graham <timograham@…>

In 39805686:

Refs #21379, #26719 -- Moved username normalization to AbstractBaseUser.

Thanks Huynh Thanh Tam for the initial patch and Claude Paroz for review.

comment:17 Changed 6 weeks ago by Tim Graham <timograham@…>

In 1b0b6f03:

[1.10.x] Refs #21379, #26719 -- Moved username normalization to AbstractBaseUser.

Thanks Huynh Thanh Tam for the initial patch and Claude Paroz for review.

Backport of 39805686b364358af725b695924a5a6dfa7f5302 from master

Note: See TracTickets for help on using tickets.
Back to Top