Context Navigation

← Previous Ticket
Next Ticket →

#34654 assigned Bug

Post-normalization performed on the Username field leading to the bypass of the whitespace stripping

Reported by:	Sim4n6	Owned by:	George Kussumoto
Component:	contrib.auth	Version:	dev
Severity:	Normal	Keywords:
Cc:		Triage Stage:	Accepted
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description ¶

Summary

In the Django codebase [L74](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/contrib/auth/forms.py#L74), when "strip" is enabled [L265](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/forms/fields.py#L265), the username should not contain any trailing or leading whitespace. However, after stripping those whitespace [L278..L279](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/forms/fields.py#L278..L279), normalization is performed. That means stripping whitespaces could be bypassed in the username using similar Unicode characters as "\u2800"

Take the following case:

result = unicodedata.normalize("NFKC", "\u2800\u2800sim4n6\u2800\u2800".strip())

In case, the stripped string "sim4n6" contains a leading and trailing whitespace, that would be deleted and work perfectly.
In case, the stripped string "\u2800\u2800sim4n6\u2800\u2800" contains similar Unicode characters to the whitespace which could be copied for instance from https://emptycharacter.com.

Impact

There is a security concern regarding including spaces in the username, plus that, the username is attacker-controlled input.
This means the whitespace stripping on the username could be bypassed and a remote user could use unintended white space.

@Sim4n6

According to the ticket's flags, the next step(s) to move this issue forward are:

To provide a patch by sending a pull request. Claim the ticket when you start working so that someone else doesn't duplicate effort. Before sending a pull request, review your work against the patch review checklist. Check the "Has patch" flag on the ticket after sending a pull request and include a link to the pull request in the ticket comment when making that update. The usual format is: [https://github.com/django/django/pull/#### PR].

Change History (13)

comment:1 by Natalia Bidart, 22 months ago

Easy pickings:	unset
Triage Stage:	Unreviewed → Accepted
Version:	4.1 → dev

Accepting following the conversation with the security team. Though, do note that the issue is not considered a security concern but could lead to potential impersonation incidents.

comment:2 by Natalia Bidart, 22 months ago

There wasn't a concrete fix discussed, though some voices said:

normalize_username() already applies NFKC normalization and Python updates the Unicode Character Database in each release.
It'd make sense to update the normalization to include these newly identified empty characters.
But, likely Django shouldn't duplicate the UCD and it should rely on Python's.

Personally, I was wondering if we could signal, visually, the being/end of the username plus, perhaps, showing the username length? This could be a small but straightforward improvement (of course this only makes sense in the templates/forms that Django ships), and perhaps it could serve as a guide to users to apply to their own forms.

comment:3 by Sim4n6, 22 months ago

Btw I still believe that's maybe medium to low severity vulnerability.

comment:4 by Sim4n6, 22 months ago

One way to fix this is normalizing first and then perform the stripping.

comment:5 by Ashutosh singh, 22 months ago

Normalizing first and then performing striping doesn't seem to work.

comment:6 by Sim4n6, 22 months ago

Indeed. Next is my understanding.

In the strip() function doc https://docs.python.org/3/library/stdtypes.html?highlight=removing%20whitespace#str.strip
If the intended character is omitted, the function removes the whitespace.

The problem is that the stripping of the leading and the trailing whitespace is performed before the Unicode normalization.
This case fits the attack "using Unicode encoding to bypass Validation logic" : https://capec.mitre.org/data/definitions/71.html

I have done some initial fuzzing (nothing fancy) the Unicode character BRAILLE PATTERN BLANK fits partially the criteria.
Visually looks like a space. But this one remains the same when normalized.

The idea is there may be a Unicode character that when stripped using the strip() function, it does not get removed. But after normalization with the NFKC form, it may come back as whitespace.

If a user is able to set a username with whitespace, it kind of looks a bit "fishy". Two usernames "sim4n6" and "sim4n6 " (with a trailing space).

And also some fuzzing leads to this code points too:

Code Point: 0xa8 Fuzzed: [ ̈sim4n6 ̈].
Code Point: 0xaf Fuzzed: [ ̄sim4n6 ̄].
Code Point: 0xb4 Fuzzed: [ ́sim4n6 ́].
Code Point: 0xb8 Fuzzed: [ ̧sim4n6 ̧].
Code Point: 0x2d8 Fuzzed: [ ̆sim4n6 ̆].
Code Point: 0x2d9 Fuzzed: [ ̇sim4n6 ̇].
Code Point: 0x2da Fuzzed: [ ̊sim4n6 ̊].
Code Point: 0x2db Fuzzed: [ ̨sim4n6 ̨].
Code Point: 0x2dc Fuzzed: [ ̃sim4n6 ̃].
Code Point: 0x2dd Fuzzed: [ ̋sim4n6 ̋].
Code Point: 0x37a Fuzzed: [ ͅsim4n6 ͅ].
Code Point: 0x384 Fuzzed: [ ́sim4n6 ́].
Code Point: 0x385 Fuzzed: [ ̈́sim4n6 ̈́].
Code Point: 0x1fbd Fuzzed: [ ̓sim4n6 ̓].
Code Point: 0x1fbf Fuzzed: [ ̓sim4n6 ̓].
Code Point: 0x1fc0 Fuzzed: [ ͂sim4n6 ͂].
Code Point: 0x1fc1 Fuzzed: [ ̈͂sim4n6 ̈͂].
Code Point: 0x1fcd Fuzzed: [ ̓̀sim4n6 ̓̀].
Code Point: 0x1fce Fuzzed: [ ̓́sim4n6 ̓́].
Code Point: 0x1fcf Fuzzed: [ ̓͂sim4n6 ̓͂].
Code Point: 0x1fdd Fuzzed: [ ̔̀sim4n6 ̔̀].
Code Point: 0x1fde Fuzzed: [ ̔́sim4n6 ̔́].
Code Point: 0x1fdf Fuzzed: [ ̔͂sim4n6 ̔͂].
Code Point: 0x1fed Fuzzed: [ ̈̀sim4n6 ̈̀].
Code Point: 0x1fee Fuzzed: [ ̈́sim4n6 ̈́].
Code Point: 0x1ffd Fuzzed: [ ́sim4n6 ́].
Code Point: 0x1ffe Fuzzed: [ ̔sim4n6 ̔].
Code Point: 0x2017 Fuzzed: [ ̳sim4n6 ̳].
Code Point: 0x203e Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0x309b Fuzzed: [ ゙sim4n6 ゙].
Code Point: 0x309c Fuzzed: [ ゚sim4n6 ゚].
Code Point: 0xfc5e Fuzzed: [ ٌّsim4n6 ٌّ].
Code Point: 0xfc5f Fuzzed: [ ٍّsim4n6 ٍّ].
Code Point: 0xfc60 Fuzzed: [ َّsim4n6 َّ].
Code Point: 0xfc61 Fuzzed: [ ُّsim4n6 ُّ].
Code Point: 0xfc62 Fuzzed: [ ِّsim4n6 ِّ].
Code Point: 0xfc63 Fuzzed: [ ّٰsim4n6 ّٰ].
Code Point: 0xfe49 Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4a Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4b Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4c Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe70 Fuzzed: [ ًsim4n6 ً].
Code Point: 0xfe72 Fuzzed: [ ٌsim4n6 ٌ].
Code Point: 0xfe74 Fuzzed: [ ٍsim4n6 ٍ].
Code Point: 0xfe76 Fuzzed: [ َsim4n6 َ].
Code Point: 0xfe78 Fuzzed: [ ُsim4n6 ُ].
Code Point: 0xfe7a Fuzzed: [ ِsim4n6 ِ].
Code Point: 0xfe7c Fuzzed: [ ّsim4n6 ّ].
Code Point: 0xfe7e Fuzzed: [ ْsim4n6 ْ].
Code Point: 0xffe3 Fuzzed: [ ̄sim4n6 ̄].

Those code points can be used as trailing and leading in usernames and result in whitespace, after Unicode normalization the way Django does it.

char = chr(code_point)
# Username input with potential trialing and leading whitespace
username = char + "sim4n6" + char
# this is the way it is done by Django
s = unicodedata.normalize("NFKC", username.strip())
if contains_whitespace(s):

# Print the Unicode character and its code point for demonstration purposes
print(f"Code Point: {hex(code_point)}\tResult: [{s}].")

With those code points, it is not only visually a space but a regular one.

Voila

PS: I wrote about this once: https://sim4n6.beehiiv.com/p/unicode-characters-bypass-security-checks

comment:7 by Ashutosh singh, 22 months ago

So what's the best solution here.

comment:8 by Sim4n6, 22 months ago

I suggest normalizing first and then performing the stripping as previously mentioned, that would reduce the odds and then handle the unusual cases.

PS: I still insist on the fact there is a security risk here.

Last edited 22 months ago by Sim4n6 (previous) (diff)

comment:9 by SecondPort, 22 months ago

I think the correct solution to this security error was to add a validation step that rejects user names with certain characters, I hope this is the correct solution and if something else needs to be changed, please let me know, thank you very much.
PR: https://github.com/django/django/pull/17014

comment:10 by Sim4n6, 22 months ago

The contains_unwanted_characters() would fail more often than you expect.

import unicodedata

def contains_unwanted_characters(username):
    """Check if the username contains characters that normalize to whitespace."""
    normalized_username = unicodedata.normalize("NFKC", username)
    return username != normalized_username.strip()

usernames = ["sim4n6", " ̈sim4n6 ̈", "℻", "Ｓim4n6","Siｍ4n6"]
for username in usernames:
    print(contains_unwanted_characters(username))

Between I still believe there is a security concern here. Anyway, I have a security background and this issue is classified as non-related. So, that would be my last comment regarding this issue.

Regards,

comment:11 by George Kussumoto, 9 months ago

Owner:	changed from nobody to George Kussumoto
Status:	new → assigned

comment:12 by Sim4n6, 9 months ago

In case you change your mind regarding this issue , please keep me posted on its outcome via https://hackerone.com/sim4n6?type=user

Thanks

comment:13 by George Kussumoto, 9 months ago

Here are some ideas to discuss:

To the main point of bypassing white space removal: while this is true for the UsernameField, most forms using this field also have another validation in the model level, such as UnicodeUsernameValidator and EmailValidator. These validators run on post-normalized values, so symbols like BRAILLE PATTERN BLANK or white space (regardless of position) will raise errors.

The AuthenticationForm might not invalidate such inputs, but to successfully exploit this, one should have bypassed the creation form first (I think). I'm not a security expert, but maybe there's an additional condition of using non-unique usernames.

The case for the password reset is already addressed in https://www.djangoproject.com/weblog/2019/dec/18/security-releases/

When using custom user models things are different, the application code should include the appropriate validators. Depending on the implementation, the normalization steps too.

At first, I thought about adding UnicodeUsernameValidator to the UsernameField. It was promising, all tests passed. Then I realized it not only breaks compatibility, but we shouldn't make this assumption on the username since the validator is particular about what it accepts, compromising the flexibility to set USERNAME_FIELD in custom models. For example, UnicodeUsernameValidator is stricter than EmailValidator (at least at first glance).

Another possibility was to add the validator in the form definition instead of the form field (since it's expected some customization when using custom models). Possible candidates were UserCreationForm and UserChangeForm (custom-users-and-built-in-auth-forms doc) since they are more tied to the default user model. But this is already accomplished by having the validator in the model.

As mentioned before in the comments, re-running .strip() after normalization or comparing string length doesn't seem to address the problem reported.

So far, I don't have a conclusion, but I lean lightly on the side that no code changes are required. I wanted to share my thoughts and hear others.
I don't have a security background so I might have overlooked/simplified things a bit. Please keep that in mind.

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues