Opened 3 years ago

Closed 5 months ago

Last modified 3 months ago

#21379 closed Bug (fixed)

Add support for Unicode usernames

Reported by: anonymous Owned by: nobody
Component: contrib.auth Version: master
Severity: Normal Keywords: 1.10
Cc: jorgecarleitao Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

class AbstractUser(AbstractBaseUser, PermissionsMixin):
    """
    An abstract base class implementing a fully featured User model with
    admin-compliant permissions.

    Username, password and email are required. Other fields are optional.
    """
    username = models.CharField(_('username'), max_length=30, unique=True,
        help_text=_('Required. 30 characters or fewer. Letters, numbers and '
                    '@/./+/-/_ characters'),
        validators=[
            validators.RegexValidator(re.compile('^[\w.@+-]+$'), _('Enter a valid username.'), 'invalid')
        ])

re.compile should use re.U

Attachments (1)

test_issue21379.diff (1.6 KB) - added by jorgecarleitao 2 years ago.

Download all attachments as: .zip

Change History (18)

comment:1 Changed 3 years ago by chrismedrela

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset
Resolution: needsinfo
Status: newclosed

Could you elaborate on the issue? Non-ascii characters are not allowed by design and if you want to allow unicode in usernames, you need to use a custom user model (see #20694).

Last edited 5 months ago by Tim Graham (previous) (diff)

comment:2 Changed 3 years ago by Raphaël Barrois

Resolution: needsinfo
Status: closednew

The above regexp doesn't do what it seems to be doing: on Python 2, [\w.@+-] is equivalent to [a-zA-Z0-9.@+-].
In Python 3, this will also match all "accented" characters.

The regexp should be updated for consistency between versions:

  • If the goal is to allow any letter-like char, add re.UNICODE to the re.compile call
  • If it should instead only allow ascii letters, the simplest way would be to use the explicit regexp, since the re.ASCII flag exists only in Py3.

Exemple (Python 3):

>>> import re
>>> from django.core import validators
>>> v = validators.RegexValidator(re.compile('^[\w.@+-]+$'), "Enter a valid username.", 'invalid')
>>> v('foo.bar')
>>> v('foo bar')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools-py3/lib/python3.3/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
django.core.exceptions.ValidationError: ['Enter a valid username.']
>>> v('jean-rené')

And on Python 2:

>>> import re
>>> from django.core import validators
>>> v = validators.RegexValidator(re.compile('^[\w.@+-]+$'), "Enter a valid username.", 'invalid')
>>> v('foo.bar')
>>> v('foo bar')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools/lib/python2.7/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
ValidationError: [u'Enter a valid username.']
>>> v('jean-rené')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools/lib/python2.7/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
ValidationError: [u'Enter a valid username.']
Version 0, edited 3 years ago by Raphaël Barrois (next)

comment:3 Changed 3 years ago by Alex Gaynor

Triage Stage: UnreviewedAccepted

The behavior difference between py2 and py3 is clearly a bug, not sure which fix is correct.

Changed 2 years ago by jorgecarleitao

Attachment: test_issue21379.diff added

comment:4 Changed 2 years ago by jorgecarleitao

I digged a bit and the validation is defined in two places:

(1)- UserCreationForm and UserChangeForm uses '^[\w.@+-]+$', which is compiled to re.compile('^[\w.@+-]+$', re.UNICODE) (https://github.com/django/django/blob/master/django/forms/fields.py#L542)
(2)- AbstractUser ORM field validation uses '^[\w.@+-]+$' but it does not pass flags to the RegexValidator, so it compiles as re.compile('^[\w.@+-]+$').

In #20694 was pointing out that AbstractUser is rejecting non-ascii in python2, which is consistent with (2). @aaugustin pointed out that we cannot change AbstractUser because it is backward incompatible and proposed to use UserCreationForm. However, I'm not sure this can be fixed using a custom UserCreationForm. My concern is that the validation is performed by both the validator constructed from UserCreationForm.username, and by the validator of the AbstractUser.username. Because both are tested and both must validate, this gives different results whether we use python2 or python3 because of (2).

To support this, I attach a diff to be run using both versions:

PYTHONPATH=..:$PYTHONPATH python2.7 runtests.py --settings=test_sqlite django.contrib.auth.tests.test_forms.UserCreationFormTest.test_invalid_non_ascii_username

PYTHONPATH=..:$PYTHONPATH python3.3 runtests.py --settings=test_sqlite django.contrib.auth.tests.test_forms.UserCreationFormTest.test_invalid_non_ascii_username

This diff has a test for this ticket and also prints which regex was used on validador.RegexValidator.__call__ (for the sake of this discussion)

print(self.regex.pattern, self.regex.flags)

In Python 2, this prints

^[\w.@+-]+$ 32  # 32 = re.UNICODE
^[\w.@+-]+$ 0    # 0 = default flag of RegexValidator

and username=u'jsmithé' is correctly invalidated

In Python 3, this prints

^[\w.@+-]+$ 32
^[\w.@+-]+$ 32

and username=u'jsmithé' is not invalidated, failing the test.

Last edited 2 years ago by jorgecarleitao (previous) (diff)

comment:5 Changed 2 years ago by jorgecarleitao

Cc: jorgecarleitao added

comment:6 Changed 5 months ago by Avraham

I think we should allow non latin characters, users should be able to have usernames in their native language (hebrew, chinese etc)

I propose we use unicode normalization

unicodedata.normalize(input, 'NFKD')

this could be used for usernames, passwords and other kind of input and should allow non latin.

comment:7 Changed 5 months ago by Claude Paroz

Has patch: set
Patch needs improvement: set

Here's a tentative PR.

comment:8 Changed 5 months ago by Claude Paroz

Patch needs improvement: unset
Version: 1.5master

Should be ready for review.

comment:9 Changed 5 months ago by Claude Paroz

The django-developers discussion.

comment:10 Changed 5 months ago by Claude Paroz

Severity: NormalRelease blocker

Marking as blocker, just as we don't forget to make a decision before releasing 1.10.

comment:11 Changed 5 months ago by Claude Paroz

So beside the original PR which also added Unicode username on Python 2, we have now the alternative PR to keep ASCII-only usernames on Python 2 and Unicode usernames on Python 3. This is more or less the same situation as Django 1.9, but with the ability to change the default behavior, and with improvements regarding Unicode normalization.

comment:12 Changed 5 months ago by Tim Graham

Keywords: 1.10 added
Severity: Release blockerNormal

comment:13 Changed 5 months ago by Tim Graham

Summary: class AbstractUser: validators should compile re with UnicodeAdd support for Unicode usernames
Triage Stage: AcceptedReady for checkin

comment:14 Changed 5 months ago by Claude Paroz <claude@…>

Resolution: fixed
Status: newclosed

In 526575c:

Fixed #21379 -- Created auth-specific username validators

Thanks Tim Graham for the review.

comment:15 Changed 5 months ago by Claude Paroz <claude@…>

In 9935f97:

Refs #21379 -- Normalized unicode username inputs

comment:16 Changed 3 months ago by Tim Graham <timograham@…>

In 39805686:

Refs #21379, #26719 -- Moved username normalization to AbstractBaseUser.

Thanks Huynh Thanh Tam for the initial patch and Claude Paroz for review.

comment:17 Changed 3 months ago by Tim Graham <timograham@…>

In 1b0b6f03:

[1.10.x] Refs #21379, #26719 -- Moved username normalization to AbstractBaseUser.

Thanks Huynh Thanh Tam for the initial patch and Claude Paroz for review.

Backport of 39805686b364358af725b695924a5a6dfa7f5302 from master

Note: See TracTickets for help on using tickets.
Back to Top