Opened 11 months ago

Last modified 10 months ago

#34654 new Bug

Post-normalization performed on the Username field leading to the bypass of the whitespace stripping

Reported by: Sim4n6 Owned by: nobody
Component: contrib.auth Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Summary

In the Django codebase [L74](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/contrib/auth/forms.py#L74), when "strip" is enabled [L265](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/forms/fields.py#L265), the username should not contain any trailing or leading whitespace. However, after stripping those whitespace [L278..L279](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/forms/fields.py#L278..L279), normalization is performed. That means stripping whitespaces could be bypassed in the username using similar Unicode characters as "\u2800"

Take the following case:

result = unicodedata.normalize("NFKC", "\u2800\u2800sim4n6\u2800\u2800".strip())

In case, the stripped string "sim4n6" contains a leading and trailing whitespace, that would be deleted and work perfectly.
In case, the stripped string "\u2800\u2800sim4n6\u2800\u2800" contains similar Unicode characters to the whitespace which could be copied for instance from https://emptycharacter.com.

Impact

There is a security concern regarding including spaces in the username, plus that, the username is attacker-controlled input.
This means the whitespace stripping on the username could be bypassed and a remote user could use unintended white space.

@Sim4n6

Change History (10)

comment:1 by Natalia Bidart, 11 months ago

Easy pickings: unset
Triage Stage: UnreviewedAccepted
Version: 4.1dev

Accepting following the conversation with the security team. Though, do note that the issue is not considered a security concern but could lead to potential impersonation incidents.

comment:2 by Natalia Bidart, 11 months ago

There wasn't a concrete fix discussed, though some voices said:

  • normalize_username() already applies NFKC normalization and Python updates the Unicode Character Database in each release.
  • It'd make sense to update the normalization to include these newly identified empty characters.
  • But, likely Django shouldn't duplicate the UCD and it should rely on Python's.

Personally, I was wondering if we could signal, visually, the being/end of the username plus, perhaps, showing the username length? This could be a small but straightforward improvement (of course this only makes sense in the templates/forms that Django ships), and perhaps it could serve as a guide to users to apply to their own forms.

comment:3 by Sim4n6, 11 months ago

Btw I still believe that's maybe medium to low severity vulnerability.

comment:4 by Sim4n6, 11 months ago

One way to fix this is normalizing first and then perform the stripping.

comment:5 by Ashutosh singh, 11 months ago

Normalizing first and then performing striping doesn't seem to work.

comment:6 by Sim4n6, 11 months ago

Indeed. Next is my understanding.

In the strip() function doc https://docs.python.org/3/library/stdtypes.html?highlight=removing%20whitespace#str.strip
If the intended character is omitted, the function removes the whitespace.

The problem is that the stripping of the leading and the trailing whitespace is performed before the Unicode normalization.
This case fits the attack "using Unicode encoding to bypass Validation logic" : https://capec.mitre.org/data/definitions/71.html

I have done some initial fuzzing (nothing fancy) the Unicode character BRAILLE PATTERN BLANK fits partially the criteria.
Visually looks like a space. But this one remains the same when normalized.

The idea is there may be a Unicode character that when stripped using the strip() function, it does not get removed. But after normalization with the NFKC form, it may come back as whitespace.

If a user is able to set a username with whitespace, it kind of looks a bit "fishy". Two usernames "sim4n6" and "sim4n6 " (with a trailing space).

And also some fuzzing leads to this code points too:

Code Point: 0xa8 Fuzzed: [ ̈sim4n6 ̈].
Code Point: 0xaf Fuzzed: [ ̄sim4n6 ̄].
Code Point: 0xb4 Fuzzed: [ ́sim4n6 ́].
Code Point: 0xb8 Fuzzed: [ ̧sim4n6 ̧].
Code Point: 0x2d8 Fuzzed: [ ̆sim4n6 ̆].
Code Point: 0x2d9 Fuzzed: [ ̇sim4n6 ̇].
Code Point: 0x2da Fuzzed: [ ̊sim4n6 ̊].
Code Point: 0x2db Fuzzed: [ ̨sim4n6 ̨].
Code Point: 0x2dc Fuzzed: [ ̃sim4n6 ̃].
Code Point: 0x2dd Fuzzed: [ ̋sim4n6 ̋].
Code Point: 0x37a Fuzzed: [ ͅsim4n6 ͅ].
Code Point: 0x384 Fuzzed: [ ́sim4n6 ́].
Code Point: 0x385 Fuzzed: [ ̈́sim4n6 ̈́].
Code Point: 0x1fbd Fuzzed: [ ̓sim4n6 ̓].
Code Point: 0x1fbf Fuzzed: [ ̓sim4n6 ̓].
Code Point: 0x1fc0 Fuzzed: [ ͂sim4n6 ͂].
Code Point: 0x1fc1 Fuzzed: [ ̈͂sim4n6 ̈͂].
Code Point: 0x1fcd Fuzzed: [ ̓̀sim4n6 ̓̀].
Code Point: 0x1fce Fuzzed: [ ̓́sim4n6 ̓́].
Code Point: 0x1fcf Fuzzed: [ ̓͂sim4n6 ̓͂].
Code Point: 0x1fdd Fuzzed: [ ̔̀sim4n6 ̔̀].
Code Point: 0x1fde Fuzzed: [ ̔́sim4n6 ̔́].
Code Point: 0x1fdf Fuzzed: [ ̔͂sim4n6 ̔͂].
Code Point: 0x1fed Fuzzed: [ ̈̀sim4n6 ̈̀].
Code Point: 0x1fee Fuzzed: [ ̈́sim4n6 ̈́].
Code Point: 0x1ffd Fuzzed: [ ́sim4n6 ́].
Code Point: 0x1ffe Fuzzed: [ ̔sim4n6 ̔].
Code Point: 0x2017 Fuzzed: [ ̳sim4n6 ̳].
Code Point: 0x203e Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0x309b Fuzzed: [ ゙sim4n6 ゙].
Code Point: 0x309c Fuzzed: [ ゚sim4n6 ゚].
Code Point: 0xfc5e Fuzzed: [ ٌّsim4n6 ٌّ].
Code Point: 0xfc5f Fuzzed: [ ٍّsim4n6 ٍّ].
Code Point: 0xfc60 Fuzzed: [ َّsim4n6 َّ].
Code Point: 0xfc61 Fuzzed: [ ُّsim4n6 ُّ].
Code Point: 0xfc62 Fuzzed: [ ِّsim4n6 ِّ].
Code Point: 0xfc63 Fuzzed: [ ّٰsim4n6 ّٰ].
Code Point: 0xfe49 Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4a Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4b Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4c Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe70 Fuzzed: [ ًsim4n6 ً].
Code Point: 0xfe72 Fuzzed: [ ٌsim4n6 ٌ].
Code Point: 0xfe74 Fuzzed: [ ٍsim4n6 ٍ].
Code Point: 0xfe76 Fuzzed: [ َsim4n6 َ].
Code Point: 0xfe78 Fuzzed: [ ُsim4n6 ُ].
Code Point: 0xfe7a Fuzzed: [ ِsim4n6 ِ].
Code Point: 0xfe7c Fuzzed: [ ّsim4n6 ّ].
Code Point: 0xfe7e Fuzzed: [ ْsim4n6 ْ].
Code Point: 0xffe3 Fuzzed: [ ̄sim4n6 ̄].

Those code points can be used as trailing and leading in usernames and result in whitespace, after Unicode normalization the way Django does it.

char = chr(code_point)
# Username input with potential trialing and leading whitespace
username = char + "sim4n6" + char
# this is the way it is done by Django
s = unicodedata.normalize("NFKC", username.strip())
if contains_whitespace(s):

# Print the Unicode character and its code point for demonstration purposes
print(f"Code Point: {hex(code_point)}\tResult: [{s}].")

With those code points, it is not only visually a space but a regular one.

Voila

PS: I wrote about this once: https://sim4n6.beehiiv.com/p/unicode-characters-bypass-security-checks

comment:7 by Ashutosh singh, 11 months ago

So what's the best solution here.

comment:8 by Sim4n6, 10 months ago

I suggest normalizing first and then performing the stripping as previously mentioned, that would reduce the odds and then handle the unusual cases.

PS: I still insist on the fact there is a security risk here.

Last edited 10 months ago by Sim4n6 (previous) (diff)

comment:9 by SecondPort, 10 months ago

I think the correct solution to this security error was to add a validation step that rejects user names with certain characters, I hope this is the correct solution and if something else needs to be changed, please let me know, thank you very much.
PR: https://github.com/django/django/pull/17014

comment:10 by Sim4n6, 10 months ago

The contains_unwanted_characters() would fail more often than you expect.

import unicodedata

def contains_unwanted_characters(username):
    """Check if the username contains characters that normalize to whitespace."""
    normalized_username = unicodedata.normalize("NFKC", username)
    return username != normalized_username.strip()

usernames = ["sim4n6", " ̈sim4n6 ̈", "℻", "Sim4n6","Sim4n6"]
for username in usernames:
    print(contains_unwanted_characters(username))

Between I still believe there is a security concern here. Anyway, I have a security background and this issue is classified as non-related. So, that would be my last comment regarding this issue.

Regards,

Note: See TracTickets for help on using tickets.
Back to Top