Opened 11 years ago
Closed 11 years ago
#21105 closed New feature (worksforme)
Implement `PBKDF2PrehashPasswordHasher` to prevent hashing time from depending on password length
Reported by: | Ram Rachum | Owned by: | nobody |
---|---|---|---|
Component: | contrib.auth | Version: | 1.6-beta-1 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
See attached patch.
PBKDF2PrehashPasswordHasher
is implemented. Now passwords are prehashed, and hashing time isn't influenced by password length, and the DoS attack vector is avoided.
(Patch needs some polish, I wasn't able to run tests, and docs need to be written.))
Attachments (1)
Change History (9)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Patch needs improvement: | set |
---|
Something is wrong with the patch. The tested encoded part should start with pbkdf2_prehash_256.
by , 11 years ago
Attachment: | 0001-PBKDF2PrehashPasswordHasher.patch added |
---|
comment:4 by , 11 years ago
The point of PBKDF-2 is key-streteching. That is, to make it at least some configurable difficulty level to calculate. It does this by passing multiple rounds of of a derivation function which is currently sha256. ( https://github.com/django/django/blob/master/django/utils/crypto.py#L136)
There are many places in this function that would be sensitive to input length (such as the force_bytes call) but I have some comments regarding that:
- I'm not convinced we should be trying to make the hasher work in constant time given any input. Ideally, it's goal is to ensure a minimum time/memory/computational cost. I think maybe it would be better to have the caller responsible to prevent overly long input from going in.
- Does this implementation actually cause a long input to run at a consistent speed? Doesn't a call to
binascii.hexlify(self.digest(force_bytes(password)).digest())
also take a long time for long input and effectively just do the same thing as the first round of PBKDF-2? - The point of a hashing function is to come up with a small string where each part of the whole input affects the output. This means at a minimum you have to read in the whole input and mix every byte into the hash. For this reason, I think it's not practical to expect an unbounded password length to supported.
It makes me nervous to be working on implementations of security features here before we've really nailed down what the goal is. If the goal is to avoid DOS attacks, I believe the solution that was patched in on the 1.5.4 security build (limiting input) was a pretty good implementation.
comment:5 by , 11 years ago
Let me address your concerns.
"I'm not convinced we should be trying to make the hasher work in constant time given any input. Ideally, it's goal is to ensure a minimum time/memory/computational cost."
Making the hasher work in constant time given any input does *not* compromise the goal of ensuring minimum time/memory/computational cost.
"Does this implementation actually cause a long input to run at a consistent speed? Doesn't a call to binascii.hexlify(self.digest(force_bytes(password)).digest())
also take a long time for long input and effectively just do the same thing as the first round of PBKDF-2?"
Of course that call, which is simply one single hash iteration rather than thousands, doesn't take a long time to run, and the implementation indeed causes a long input to run at a consistent speed. Here are my test runs, time is in seconds:
>>> t(lambda: h.encode('whateverzz'*1, 'saltysalt', 10000)) 0.1399998664855957 >>> t(lambda: h.encode('whateverzz'*10, 'saltysalt', 10000)) 0.1359999179840088 >>> t(lambda: h.encode('whateverzz'*100, 'saltysalt', 10000)) 0.1380000114440918 >>> t(lambda: h.encode('whateverzz'*400, 'saltysalt', 10000)) 0.13499999046325684
"The point of a hashing function is to come up with a small string where each part of the whole input affects the output."
I don't see why you would even raise this issue in the context of this patch... But if it's not obvious: Under my patch the new hasher still depends on each and every bit of the password, no matter how long. Here I have created a 4000-byte long password, changed only one bit and the hash changed completely:
>>> h.encode('whateverzz'*400, 'saltysalt', 10000) u'pbkdf2_prehash_sha256$10000$saltysalt$prehash$TdQFtlBgmsLxRbLemxnRNkstRhFyBJzCLJGZC2eFnXg=' >>> h.encode(('whateverzz'*400)[:-1]+'y', 'saltysalt', 10000) u'pbkdf2_prehash_sha256$10000$saltysalt$prehash$8CpDdMU5Y77Zdt+Bx15eBuZ959ZL/Q+bkHdUrEy4eU0='
Regarding the second patch on 1.5.4: It's not as elegant as this solution. It compromises two things (Still exposing to attacks up to 4096 bytes, and limiting password length) rather than solving the root problem which is the variance of hashing time on password length.
comment:6 by , 11 years ago
Anyone has any other objections before merging this in? (Except writing documentation.)
comment:7 by , 11 years ago
I'd really like to merge this in, if the core developers agree, and if not then discuss why not.
comment:8 by , 11 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
I don't see any improvement over https://github.com/django/django/commit/68540fe4df44492571bc610a0a043d3d02b3d320 here; hence closing this.
See discussion:
https://groups.google.com/forum/#!topic/django-developers/iuSE5Y4R3hg