Opened 16 months ago

Last modified 8 months ago

#21379 new Bug

class AbstractUser: validators should compile re with Unicode

Reported by: anonymous Owned by: nobody
Component: contrib.auth Version: 1.5
Severity: Normal Keywords:
Cc: jorgecarleitao Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

class AbstractUser(AbstractBaseUser, PermissionsMixin):
    """
    An abstract base class implementing a fully featured User model with
    admin-compliant permissions.

    Username, password and email are required. Other fields are optional.
    """
    username = models.CharField(_('username'), max_length=30, unique=True,
        help_text=_('Required. 30 characters or fewer. Letters, numbers and '
                    '@/./+/-/_ characters'),
        validators=[
            validators.RegexValidator(re.compile('^[\w.@+-]+$'), _('Enter a valid username.'), 'invalid')
        ])

re.compile should use re.U

Attachments (1)

test_issue21379.diff (1.6 KB) - added by jorgecarleitao 10 months ago.


Change History (6)

comment:1 Changed 16 months ago by chrismedrela

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to needsinfo
  • Status changed from new to closed

Could you elaborate on the issue? Non-ASCII characters are disallowed by design; if you want to allow Unicode in usernames, you need to use a custom user model (see https://code.djangoproject.com/ticket/20694).

comment:2 Changed 12 months ago by xelnor

  • Resolution needsinfo deleted
  • Status changed from closed to new

The above regexp doesn't do what it appears to do: on Python 2, [\w.@+-] is equivalent to [a-zA-Z0-9_.@+-].
On Python 3, it also matches accented characters and any other Unicode word character.

The regexp should be updated for consistency between versions:

  • If the goal is to allow any letter-like character, add re.UNICODE to the re.compile call
  • If it should instead allow only ASCII letters, the simplest way is to write out the explicit character class, since the re.ASCII flag exists only in Python 3.

Example (Python 3):

>>> import re
>>> from django.core import validators
>>> v = validators.RegexValidator(re.compile('^[\w.@+-]+$'), "Enter a valid username.", 'invalid')
>>> v('foo.bar')
>>> v('foo bar')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools-py3/lib/python3.3/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
django.core.exceptions.ValidationError: ['Enter a valid username.']
>>> v('jean-rené')

And on Python 2:

>>> import re
>>> from django.core import validators
>>> v = validators.RegexValidator(re.compile('^[\w.@+-]+$'), "Enter a valid username.", 'invalid')
>>> v('foo.bar')
>>> v('foo bar')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools/lib/python2.7/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
ValidationError: [u'Enter a valid username.']
>>> v('jean-rené')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/xelnor/dev/venvs/bluesys-tools/lib/python2.7/site-packages/django/core/validators.py", line 39, in __call__
    raise ValidationError(self.message, code=self.code)
ValidationError: [u'Enter a valid username.']
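
The two options listed above can be sketched roughly as follows. This is only an illustration run under Python 3; the names UNICODE_USERNAME and ASCII_USERNAME are hypothetical and not part of Django, and the explicit character class assumes underscore should stay allowed, as it is with \w.

```python
import re

# Option 1: accept any Unicode "word" character.
# On Python 3 this is already the default for str patterns, so the flag is
# redundant there; it is spelled out to make the intent explicit.
UNICODE_USERNAME = re.compile(r'^[\w.@+-]+$', re.UNICODE)

# Option 2: write the ASCII character class explicitly, so the pattern
# behaves the same regardless of the interpreter's default flags.
ASCII_USERNAME = re.compile(r'^[a-zA-Z0-9_.@+-]+$')

# 'jean-ren\u00e9' is 'jean-rené' with a precomposed e-acute.
for name in ('foo.bar', 'foo bar', 'jean-ren\u00e9'):
    print(name,
          bool(UNICODE_USERNAME.match(name)),
          bool(ASCII_USERNAME.match(name)))
```

Only the last name is treated differently by the two patterns, which is the discrepancy the sessions above show between Python 3 and Python 2.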

comment:3 Changed 12 months ago by Alex

  • Triage Stage changed from Unreviewed to Accepted

The behavior difference between py2 and py3 is clearly a bug, but I'm not sure which fix is correct.

Changed 10 months ago by jorgecarleitao

comment:4 Changed 10 months ago by jorgecarleitao

I dug a bit, and the validation is defined in two places:

(1)- UserCreationForm and UserChangeForm use '^[\w.@+-]+$', which is compiled as re.compile('^[\w.@+-]+$', re.UNICODE) (https://github.com/django/django/blob/master/django/forms/fields.py#L542)
(2)- AbstractUser ORM field validation uses '^[\w.@+-]+$' but does not pass flags to the RegexValidator, so it is compiled as re.compile('^[\w.@+-]+$').

#20694 pointed out that AbstractUser rejects non-ASCII usernames on Python 2, which is consistent with (2). @aaugustin noted that we cannot change AbstractUser because that would be backward incompatible, and proposed using UserCreationForm instead. However, I'm not sure this can be fixed with a custom UserCreationForm. My concern is that validation is performed both by the validator constructed from UserCreationForm.username and by the validator on AbstractUser.username. Because both run and both must pass, the result differs between Python 2 and Python 3 because of (2).

To support this, I attach a diff to be run using both versions:

PYTHONPATH=..:$PYTHONPATH python2.7 runtests.py --settings=test_sqlite django.contrib.auth.tests.test_forms.UserCreationFormTest.test_invalid_non_ascii_username

PYTHONPATH=..:$PYTHONPATH python3.3 runtests.py --settings=test_sqlite django.contrib.auth.tests.test_forms.UserCreationFormTest.test_invalid_non_ascii_username

This diff has a test for this ticket and also prints which regex was used in validators.RegexValidator.__call__ (for the sake of this discussion):

print(self.regex.pattern, self.regex.flags)

In Python 2, this prints

^[\w.@+-]+$ 32  # 32 = re.UNICODE
^[\w.@+-]+$ 0    # 0 = default flag of RegexValidator

and username=u'jsmithé' is correctly rejected.

In Python 3, this prints

^[\w.@+-]+$ 32
^[\w.@+-]+$ 32

and username=u'jsmithé' is not rejected, so the test fails.
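
The interaction of the two validation layers can be approximated outside Django with a small sketch. It runs on Python 3 only, and uses re.ASCII as a stand-in for Python 2's default (flag-less) behaviour, which is an assumption made for illustration; passes_both is a hypothetical helper, not a Django function.

```python
import re

PATTERN = r'^[\w.@+-]+$'


def passes_both(username, form_regex, model_regex):
    """Hypothetical helper: accept a username only if both layers match."""
    return bool(form_regex.match(username)) and bool(model_regex.match(username))


# Layer (1): the form field compiles the pattern with re.UNICODE.
form_regex = re.compile(PATTERN, re.UNICODE)

# Layer (2): the model validator compiles the pattern without flags.
# On Python 3 a str pattern gets the UNICODE bit implicitly.
model_py3 = re.compile(PATTERN)

# Stand-in for what layer (2) matches on Python 2 (ASCII-only \w).
model_py2_like = re.compile(PATTERN, re.ASCII)

username = 'jsmith\u00e9'  # 'jsmithé'

print(bool(model_py3.flags & re.UNICODE))
print(passes_both(username, form_regex, model_py3))
print(passes_both(username, form_regex, model_py2_like))
```

With the ASCII-only stand-in the second layer rejects the username even though the form layer accepts it, which mirrors the Python 2 result reported above.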


comment:5 Changed 10 months ago by jorgecarleitao

  • Cc jorgecarleitao added