Opened 11 years ago

Closed 10 years ago

#21105 closed New feature (worksforme)

Implement `PBKDF2PrehashPasswordHasher` to prevent hashing time from depending on password length

Reported by: Ram Rachum Owned by: nobody
Component: contrib.auth Version: 1.6-beta-1
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

See attached patch.

PBKDF2PrehashPasswordHasher is implemented. Passwords are now prehashed, so hashing time is no longer influenced by password length and the DoS attack vector is avoided.
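The patch itself isn't reproduced in the ticket; the general idea can be sketched as follows (hypothetical helper, not the attached patch's actual code): reduce the password to a fixed-length digest first, so PBKDF2 always stretches an input of the same size regardless of how long the original password was.

```python
import base64
import hashlib


def prehash_pbkdf2_encode(password: str, salt: str, iterations: int) -> str:
    # Sketch of the prehash idea (hypothetical; the attached patch's
    # actual implementation may differ). The password is hashed down to a
    # fixed-length hex digest before key stretching, so PBKDF2 always
    # sees a 64-character input no matter the original password length.
    prehashed = hashlib.sha256(password.encode("utf-8")).hexdigest()
    dk = hashlib.pbkdf2_hmac("sha256", prehashed.encode("ascii"),
                             salt.encode("ascii"), iterations)
    hash_b64 = base64.b64encode(dk).decode("ascii")
    return "pbkdf2_prehash_sha256$%d$%s$%s" % (iterations, salt, hash_b64)
```

Because the stretched input has constant length, the per-call cost of PBKDF2 no longer varies with password length, which is the property the ticket is after.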

(The patch needs some polish, I wasn't able to run the tests, and docs need to be written.)

Attachments (1)

0001-PBKDF2PrehashPasswordHasher.patch (6.7 KB ) - added by Ram Rachum 11 years ago.


Change History (9)

comment:2 by Simon Charette, 11 years ago

Patch needs improvement: set

Something is wrong with the patch. The tested encoded part should start with pbkdf2_prehash_256.

comment:3 by Ram Rachum, 11 years ago

Patch replaced with better patch, please try again.

by Ram Rachum, 11 years ago

comment:4 by Paul Oswald, 11 years ago

The point of PBKDF2 is key stretching, that is, making each computation cost at least some configurable level of difficulty. It does this by running multiple rounds of a derivation function, currently SHA-256 (https://github.com/django/django/blob/master/django/utils/crypto.py#L136).

There are many places in this function that would be sensitive to input length (such as the force_bytes call) but I have some comments regarding that:

  • I'm not convinced we should be trying to make the hasher work in constant time given any input. Ideally, its goal is to ensure a minimum time/memory/computational cost. I think it might be better to make the caller responsible for preventing overly long input from going in.
  • Does this implementation actually cause a long input to run at a consistent speed? Doesn't a call to binascii.hexlify(self.digest(force_bytes(password)).digest()) also take a long time for long input and effectively just do the same thing as the first round of PBKDF-2?
  • The point of a hashing function is to come up with a small string where each part of the whole input affects the output. This means at a minimum you have to read in the whole input and mix every byte into the hash. For this reason, I think it's not practical to expect an unbounded password length to be supported.

It makes me nervous to be working on implementations of security features here before we've really nailed down what the goal is. If the goal is to avoid DOS attacks, I believe the solution that was patched in on the 1.5.4 security build (limiting input) was a pretty good implementation.
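The input-limiting approach from the 1.5.4 security release can be sketched roughly like this (hypothetical names, not Django's actual code): reject oversized passwords up front, before any expensive hashing work starts.

```python
MAX_PASSWORD_LENGTH = 4096  # cap in the spirit of the 1.5.4 fix


class PasswordTooLong(Exception):
    pass


def check_password_length(password: str) -> None:
    # Hypothetical guard: refuse to do any expensive hashing work on
    # oversized input, bounding the cost an attacker can impose.
    if len(password) > MAX_PASSWORD_LENGTH:
        raise PasswordTooLong(
            "password longer than %d characters" % MAX_PASSWORD_LENGTH)
```

This trades a hard limit on password length for a hard bound on the work a single authentication attempt can cost.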

Last edited 11 years ago by Paul Oswald (previous) (diff)

comment:5 by Ram Rachum, 11 years ago

Let me address your concerns.

"I'm not convinced we should be trying to make the hasher work in constant time given any input. Ideally, its goal is to ensure a minimum time/memory/computational cost."

Making the hasher work in constant time given any input does *not* compromise the goal of ensuring minimum time/memory/computational cost.

"Does this implementation actually cause a long input to run at a consistent speed? Doesn't a call to binascii.hexlify(self.digest(force_bytes(password)).digest()) also take a long time for long input and effectively just do the same thing as the first round of PBKDF-2?"

That call is a single hash pass rather than thousands of iterations, so of course it doesn't take a long time to run; and yes, the implementation does cause long inputs to run at a consistent speed. Here are my test runs (time in seconds):

    >>> t(lambda: h.encode('whateverzz'*1, 'saltysalt', 10000))
    0.1399998664855957
    >>> t(lambda: h.encode('whateverzz'*10, 'saltysalt', 10000))
    0.1359999179840088
    >>> t(lambda: h.encode('whateverzz'*100, 'saltysalt', 10000))
    0.1380000114440918
    >>> t(lambda: h.encode('whateverzz'*400, 'saltysalt', 10000))
    0.13499999046325684
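The `t` helper isn't defined in the transcript; presumably it is a simple wall-clock timer. A sketch of such a helper, together with a comparison illustrating the point above: a single SHA-256 pass over a 4000-byte password costs a small fraction of 10000 PBKDF2 iterations.

```python
import hashlib
import time


def t(fn):
    # Hypothetical stand-in for the `t` helper used above: wall-clock
    # seconds for a single call of fn.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


long_pw = b"whateverzz" * 400
one_digest = t(lambda: hashlib.sha256(long_pw).digest())
stretch = t(lambda: hashlib.pbkdf2_hmac("sha256", b"x" * 64,
                                        b"saltysalt", 10000))
# One digest over a long input is a small fraction of the stretching cost,
# which is why prehashing does not visibly change total runtime.
print(one_digest < stretch)
```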

"The point of a hashing function is to come up with a small string where each part of the whole input affects the output."

I don't see why you would even raise this issue in the context of this patch... But if it's not obvious: under my patch the new hasher still depends on each and every bit of the password, no matter how long. Here I have created a 4000-byte password, changed only the last character, and the hash changed completely:

    >>> h.encode('whateverzz'*400, 'saltysalt', 10000)
    u'pbkdf2_prehash_sha256$10000$saltysalt$prehash$TdQFtlBgmsLxRbLemxnRNkstRhFyBJzCLJGZC2eFnXg='
    >>> h.encode(('whateverzz'*400)[:-1]+'y', 'saltysalt', 10000)
    u'pbkdf2_prehash_sha256$10000$saltysalt$prehash$8CpDdMU5Y77Zdt+Bx15eBuZ959ZL/Q+bkHdUrEy4eU0='

Regarding the patch in 1.5.4: it's not as elegant as this solution. It compromises on two things (it still allows attacks with inputs of up to 4096 bytes, and it limits password length) rather than solving the root problem, which is the dependence of hashing time on password length.

comment:6 by Ram Rachum, 11 years ago

Does anyone have any other objections before this is merged in? (Apart from the documentation still needing to be written.)

comment:7 by Ram Rachum, 10 years ago

I'd really like to get this merged if the core developers agree; if not, I'd like to discuss why not.

comment:8 by Florian Apolloner, 10 years ago

Resolution: worksforme
Status: new → closed

I don't see any improvement over https://github.com/django/django/commit/68540fe4df44492571bc610a0a043d3d02b3d320 here; hence closing this.
