#21105 closed New feature (worksforme)

Implement `PBKDF2PrehashPasswordHasher` to prevent hashing time from depending on password length

Reported by: coolRR Owned by: nobody
Component: contrib.auth Version: 1.6-beta-1
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

See attached patch.

PBKDF2PrehashPasswordHasher is implemented. Now passwords are prehashed, and hashing time isn't influenced by password length, and the DoS attack vector is avoided.

(Patch needs some polish, I wasn't able to run tests, and docs need to be written.))

Attachments (1)

0001-PBKDF2PrehashPasswordHasher.patch (6.7 KB) - added by coolRR 22 months ago.

Download all attachments as: .zip

Change History (9)

comment:1 Changed 22 months ago by coolRR

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 22 months ago by charettes

  • Patch needs improvement set

Something is wrong with the patch. The tested encoded part should start with pbkdf2_prehash_256.

comment:3 Changed 22 months ago by coolRR

Patch replaced with better patch, please try again.

Changed 22 months ago by coolRR

comment:4 Changed 22 months ago by poswald

The point of PBKDF-2 is key-streteching. That is, to make it at least some configurable difficulty level to calculate. It does this by passing multiple rounds of of a derivation function which is currently sha256. ( https://github.com/django/django/blob/master/django/utils/crypto.py#L136)

There are many places in this function that would be sensitive to input length (such as the force_bytes call) but I have some comments regarding that:

  • I'm not convinced we should be trying to make the hasher work in constant time given any input. Ideally, it's goal is to ensure a minimum time/memory/computational cost. I think maybe it would be better to have the caller responsible to prevent overly long input from going in.
  • Does this implementation actually cause a long input to run at a consistent speed? Doesn't a call to binascii.hexlify(self.digest(force_bytes(password)).digest()) also take a long time for long input and effectively just do the same thing as the first round of PBKDF-2?
  • The point of a hashing function is to come up with a small string where each part of the whole input affects the output. This means at a minimum you have to read in the whole input and mix every byte into the hash. For this reason, I think it's not practical to expect an unbounded password length to supported.

It makes me nervous to be working on implementations of security features here before we've really nailed down what the goal is. If the goal is to avoid DOS attacks, I believe the solution that was patched in on the 1.5.4 security build (limiting input) was a pretty good implementation.

Last edited 22 months ago by poswald (previous) (diff)

comment:5 Changed 22 months ago by coolRR

Let me address your concerns.

"I'm not convinced we should be trying to make the hasher work in constant time given any input. Ideally, it's goal is to ensure a minimum time/memory/computational cost."

Making the hasher work in constant time given any input does *not* compromise the goal of ensuring minimum time/memory/computational cost.

"Does this implementation actually cause a long input to run at a consistent speed? Doesn't a call to binascii.hexlify(self.digest(force_bytes(password)).digest()) also take a long time for long input and effectively just do the same thing as the first round of PBKDF-2?"

Of course that call, which is simply one single hash iteration rather than thousands, doesn't take a long time to run, and the implementation indeed causes a long input to run at a consistent speed. Here are my test runs, time is in seconds:

    >>> t(lambda: h.encode('whateverzz'*1, 'saltysalt', 10000))
    0.1399998664855957
    >>> t(lambda: h.encode('whateverzz'*10, 'saltysalt', 10000))
    0.1359999179840088
    >>> t(lambda: h.encode('whateverzz'*100, 'saltysalt', 10000))
    0.1380000114440918
    >>> t(lambda: h.encode('whateverzz'*400, 'saltysalt', 10000))
    0.13499999046325684

"The point of a hashing function is to come up with a small string where each part of the whole input affects the output."

I don't see why you would even raise this issue in the context of this patch... But if it's not obvious: Under my patch the new hasher still depends on each and every bit of the password, no matter how long. Here I have created a 4000-byte long password, changed only one bit and the hash changed completely:

    >>> h.encode('whateverzz'*400, 'saltysalt', 10000)
    u'pbkdf2_prehash_sha256$10000$saltysalt$prehash$TdQFtlBgmsLxRbLemxnRNkstRhFyBJzCLJGZC2eFnXg='
    >>> h.encode(('whateverzz'*400)[:-1]+'y', 'saltysalt', 10000)
    u'pbkdf2_prehash_sha256$10000$saltysalt$prehash$8CpDdMU5Y77Zdt+Bx15eBuZ959ZL/Q+bkHdUrEy4eU0='

Regarding the second patch on 1.5.4: It's not as elegant as this solution. It compromises two things (Still exposing to attacks up to 4096 bytes, and limiting password length) rather than solving the root problem which is the variance of hashing time on password length.

comment:6 Changed 22 months ago by coolRR

Anyone has any other objections before merging this in? (Except writing documentation.)

comment:7 Changed 22 months ago by coolRR

I'd really like to merge this in, if the core developers agree, and if not then discuss why not.

comment:8 Changed 21 months ago by apollo13

  • Resolution set to worksforme
  • Status changed from new to closed

I don't see any improvement over https://github.com/django/django/commit/68540fe4df44492571bc610a0a043d3d02b3d320 here; hence closing this.

Note: See TracTickets for help on using tickets.
Back to Top