Opened 5 months ago

Last modified 3 weeks ago

#31358 new Cleanup/optimization

Increase default password salt size in BasePasswordHasher.

Reported by: Jon Moroney Owned by: nobody
Component: Utilities Version: master
Severity: Normal Keywords:
Cc: Florian Apolloner Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Jon Moroney)

I've made a patch for this here
https://github.com/django/django/pull/12553
Which changes the default salt size from ~71 bits to ~131 bits

The rational is that modern guidance suggests a 128 bit minimum on salt sizes
OWASP: https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Password_Storage_Cheat_Sheet.md#salting
Python: https://docs.python.org/3/library/hashlib.html#hashlib.pbkdf2_hmac
NIST: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-132.pdf

In the case of NIST this is technically a hard requirement.

Change History (46)

comment:1 Changed 5 months ago by Jon Moroney

Description: modified (diff)

comment:2 Changed 5 months ago by felixxm

Cc: Florian Apolloner added
Component: UncategorizedUtilities
Summary: Increase default password salt sizeIncrease default password salt size.
Type: UncategorizedCleanup/optimization
Version: 3.0master

I'm not sure.This method is not strictly to generate a salt. I would rather change a BasePasswordHasher.salt() to return get_random_string(22).

comment:3 Changed 5 months ago by Florian Apolloner

In general I like the idea of just increasing get_random_string and not doing it in X locations as needed. But I fear that a subtle change like this might break quite a few systems, so I am with Mariusz to do it just for the hasher (not every usage of get_random_string has the same security requirements).

I didn't check, but do we have tests for changing salt lengths and how the update works?

comment:4 Changed 5 months ago by Claude Paroz

BTW, I think we should deprecate calling get_random_string without a length argument, but this may be the subject of a separate ticket?

comment:5 Changed 5 months ago by felixxm

Has patch: set
Patch needs improvement: set
Summary: Increase default password salt size.Increase default password salt size in BasePasswordHasher.
Triage Stage: UnreviewedAccepted

Florian, it seems that it's tested only in auth_tests.test_views.ChangelistTests.

Claude, yes a separate ticket.

comment:6 Changed 5 months ago by Jon Moroney

Thanks for the comments all. I've rebased on the current master and changed the return of BasePasswordHasher.salt as suggested.

comment:7 Changed 5 months ago by Jon Moroney

As an aside, would it be possible to get this back ported to the django 2.2.x branch once it's merged?

Last edited 5 months ago by Jon Moroney (previous) (diff)

comment:8 in reply to:  5 ; Changed 5 months ago by Florian Apolloner

Replying to felixxm:

Florian, it seems that it's tested only in auth_tests.test_views.ChangelistTests.

Mhm, what does this mean for existing password hashes, will they get updated to the new salt length? I get the feeling that the module level constant CRYPTO_SALT_LENGTH should be an attribute salt_length on BasePasswordHasher and must_update should take this into account.

comment:9 in reply to:  8 Changed 5 months ago by Jon Moroney

Replying to Florian Apolloner:

Replying to felixxm:

Florian, it seems that it's tested only in auth_tests.test_views.ChangelistTests.

Mhm, what does this mean for existing password hashes, will they get updated to the new salt length? I get the feeling that the module level constant CRYPTO_SALT_LENGTH should be an attribute salt_length on BasePasswordHasher and must_update should take this into account.

Would that change must_update at the BasePasswordHasher level to something like

    def must_update(self, encoded):
        return self.salt_length != len(encoded.salt)

?
If so, would that first require an update to go out with the attribute set to 12?

Or would this be on a case by case basis for each hasher? If the later case would it not make sense to simply add a length check on each of the relevant hashers?
eg. for pbkdf2

    def must_update(self, encoded):
        algorithm, iterations, salt, hash = encoded.split('$', 3)
        return int(iterations) != self.iterations || len(salt) != self.salt_length

Another edit (sorry):
I've added a commit to my PR which adds what I think the former logic should look like. Please let me know if that's the route you'd like to take and/or if the specific logic needs an update.

Last edited 5 months ago by Jon Moroney (previous) (diff)

comment:10 Changed 5 months ago by felixxm

Florian, I checked builtin hashers:

  • BCryptSHA256PasswordHasher, BCryptPasswordHasher, UnsaltedSHA1PasswordHasher, UnsaltedMD5PasswordHasher, CryptPasswordHasher are not affected because they override salt(),
  • PBKDF2PasswordHasher, PBKDF2SHA1PasswordHasher, Argon2PasswordHasher, SHA1PasswordHasher, and MD5PasswordHasher use BasePasswordHasher.salt().

We should introduce salt_length attribute in a separate PR/commit and take it into account in must_update() for affected hashers. I'm not sure how to set salt_length for hashers that override salt().

comment:11 in reply to:  10 Changed 5 months ago by Florian Apolloner

Replying to felixxm:

We should introduce salt_length attribute in a separate PR/commit and take it into account in must_update() for affected hashers.

Ok, I am fine with that approach too.

I'm not sure how to set salt_length for hashers that override salt().

That is a good question indeed. For the unsalted variants we can set it to zero just fine and afaik bcrypt also defines it with a fixed length: https://github.com/pyca/bcrypt/blob/master/src/bcrypt/__init__.py#L50 and is unlikely to change. So we could set salt_length everywhere and update the hashers to use the builtin must_update in addition to their own.

comment:12 Changed 5 months ago by Florian Apolloner

Actually handling in must_update of the base hasher might be hard, because the salt format is specific to the hasher, so we might need to add a decode function? *yikes*

comment:13 Changed 5 months ago by felixxm

I think we can assume that safe_summary() returns salt and if not then salt's length is equal to 0.

comment:14 in reply to:  13 Changed 5 months ago by Jon Moroney

Replying to felixxm:

I think we can assume that safe_summary() returns salt and if not then salt's length is equal to 0.

I agree. Though I think it's better to assume that salt's length is undefined if the safe_summary dict has no entry. Assuming zero will result in 0 != self.salt_length and that will always trigger.
On my branch I've had to account for two cases to pass the tests

    def must_update(self, encoded):
        try:
            return len(self.safe_summary(encoded)['salt']) != self.salt_length
        except (KeyError, NotImplementedError):
            return False

One is just the lack of a salt in the dict, but the second is that not all derived classes implement safe_summary. I think it would be useful to enforce an implementation if possible, but I'm not certain that's possible in python and that's probably a separate PR anyway.

Last edited 5 months ago by Jon Moroney (previous) (diff)

comment:15 in reply to:  13 ; Changed 5 months ago by Florian Apolloner

Replying to felixxm:

I think we can assume that safe_summary() returns salt and if not then salt's length is equal to 0.

Yes but if it does return a salt it doesn't neccessarily tell you anything about the length. (It wouldn't be unheard off if a hasher were to return a truncated salt there…)

comment:16 in reply to:  15 Changed 5 months ago by Jon Moroney

Replying to Florian Apolloner:

Yes but if it does return a salt it doesn't neccessarily tell you anything about the length. (It wouldn't be unheard off if a hasher were to return a truncated salt there…)

So then is it better to force each hasher to implement it's own must_update and move the salt_length? Variable to the hashers?

comment:17 Changed 5 months ago by Florian Apolloner

That is the million dollar question :) Salt is common enough through all hashers (compared to something like memory requirements) that it is something that could be in the base hasher.

comment:18 Changed 5 months ago by Jon Moroney

Let me ask the question a different way then; which route should I implement in my pr? :p

comment:19 in reply to:  18 ; Changed 5 months ago by Florian Apolloner

Replying to Jon Moroney:

Let me ask the question a different way then; which route should I implement in my pr? :p

Ha, sorry for not being clearer. What I wanted to say is that I don't have a good answer for you. In a perfect world (ie if you are up to the challenge :D) I would suggest adding a decode function to the hashers that basically does the reverse of encode. safe_summary could then use the decoded values and mask them as needed.

Adding a decode function seems to make sense since Argon2PasswordHasher already has a _decode and others manually repeat (the simpler logic) ala algorithm, empty, algostr, rounds, data = encoded.split('$', 4) over multiple functions.

This new decode functionality could be in a new PR and your current PR would be blocked by it and the use that. Interested to hear your and Mariusz' thoughts

comment:20 in reply to:  19 Changed 5 months ago by Jon Moroney

Replying to Florian Apolloner:

Ha, sorry for not being clearer. What I wanted to say is that I don't have a good answer for you. In a perfect world (ie if you are up to the challenge :D) I would suggest adding a decode function to the hashers that basically does the reverse of encode. safe_summary could then use the decoded values and mask them as needed.

Adding a decode function seems to make sense since Argon2PasswordHasher already has a _decode and others manually repeat (the simpler logic) ala algorithm, empty, algostr, rounds, data = encoded.split('$', 4) over multiple functions.

I'm not opposed to implementing a decode function, but I'm not sure I understand how it would differ from the safe_summary function. Further if decode functionality is require/desired then is the scope of concern just for the hashers shipped with django or do we need to consider third party hashers? I have a preference for not considering them and creating a clear breaking change (but I'm also lazy :p).

On a potential decode function; This comment on the encode function worries me a bit

The result is normally formatted as "algorithm$salt$hash" and must be fewer than 128 characters.

It makes me think that the encoded result could be truncated and if we consider third party hashers then we must consider truncated DB entries. I'm not sure if this is a real issue or not, but the 128 character limit does raise an eye brow to me.

This new decode functionality could be in a new PR and your current PR would be blocked by it and the use that. Interested to hear your and Mariusz' thoughts

Sounds reasonable, but what does the decode function look like? It it a stub in the BasePasswordHasher which requires that derived classes implement it with an implementation for each of the included hashers? Let me know if that sounds good and I can make a second PR to implement that. Else lets keep this conversation going :)

Edit:
For clarity my mind's eye see's the decode function as
def decode(self) -> Dict[str, str]
Where the key is in the set {"algo", "salt", "hash"} and the values are the string encoded versions (base64 for hash?).

Last edited 5 months ago by Jon Moroney (previous) (diff)

comment:21 Changed 4 months ago by Jon Moroney

Hey, just wanted to drop by and check up on this.

comment:22 Changed 4 months ago by Florian Apolloner

Hi, sorry for the late reply but I am swamped with work recently and lost track of this. I'll try to answer the questions as good as possible.

I'm not opposed to implementing a decode function, but I'm not sure I understand how it would differ from the safe_summary function.

The intention is simple, safe_summary is ment for display, ie it calls mask_hash and similar functions which make it look like a5bcd********** (literally). A custom password hasher might even go as far as actually truncating the hash that is shown, so the data you get back from safe_summary as of now would not contain the actual decoded data. But due to backwards concerns we cannot change that function.

clear breaking change

Yes, they will have to add this new function unless we can come up with a better idea (ie if decode is not implemented the default salt size will not increase but start raising warnings)

It makes me think that the encoded result could be truncated and if we consider third party hashers then we must consider truncated DB entries. I'm not sure if this is a real issue or not, but the 128 character limit does raise an eye brow to me.

The 128 character limit comes from the size of the database column in the db, if needed we could increase that. That said the database should not have truncated entries because any non-broken database will throw an error if you insert >128 characters into a 128 char field.

Sounds reasonable, but what does the decode function look like?

A stub (or probably not implemented at all in the base hasher for the transition period) which returns dict[str, any]. Ie the iteration count would be an integer. Good question regarding bytes vs base64, but I think it should be okay if we return the base64 values here instead of going back to the raw bytes.

comment:23 Changed 4 months ago by Jon Moroney

No worries on the late reply. I know the world is a bit hectic right now :)

I've gone ahead and made a PR which adds an empty decode function to the base password hasher as well as a simple implementation to the pbkdf2 hasher.

https://github.com/django/django/pull/12675

I think it's probably better to require the decode function rather than having to deal with if it exists or not and update salt lengths only if the function does exist. I feel that having this be optional will only lead to more headaches down the line.

Let me know how you feel about this and I can update the PR to include similar decode()s for the other hashers included.

comment:24 in reply to:  23 Changed 4 months ago by Florian Apolloner

Replying to Jon Moroney:

I think it's probably better to require the decode function rather than having to deal with if it exists or not and update salt lengths only if the function does exist. I feel that having this be optional will only lead to more headaches down the line.

I am not sure it is that hard, it would also help with backwards compat. Ie have a default decode method in BaseHasher which return an empty dict and then:

  • When it is time to check the salt length (ie in must_update), call decode and if there is no salt in it raise a PendingDeprecationWarning (and then DeprecationWarning followed by an error in subsequent Django versions [ie change the method to NotImplemented]).
  • We can immediately update builtin hashers with a new decode method that gets used as needed (safe_summary and whereever decoding is needed). This should also allow me to finally easily upgrade Argon hashing to the "new" variant.
  • This way 3rd party authors get the old salt for a while being able to update as needed. This is probably necessary since we do not can argue the salt change important enough to throw all backwards concerns over board.

Let me know how you feel about this and I can update the PR to include similar decode()s for the other hashers included.

Generally good, but I do not think that a decode as used here should have translations for dictionary keys, that is solely for use in safe_summary imo.

Edit:// Afaik we do not use typing in Django yet, so the function shouldn't have type annotations.

Last edited 4 months ago by Florian Apolloner (previous) (diff)

comment:25 Changed 4 months ago by Jon Moroney

Just pushed an update for the points you've mentioned. One bit that jumps out at me is that I'm changing the behavior of must_update by raising an exception

    def must_update(self, encoded):
        decoded = self.decode()
        if 'salt' not in decoded:
            raise PendingDeprecationWarning('Decode not fully implemented. Decode will be required in future releases.')
        return False

Also out of curiosity. Why no typing?

comment:26 in reply to:  25 ; Changed 4 months ago by Florian Apolloner

Replying to Jon Moroney:

Just pushed an update for the points you've mentioned.

Great, will look at it this week.

One bit that jumps out at me is that I'm changing the behavior of must_update by raising an exception

Don't raise it but use warnings.warn in conjunction with RemovedInDjango40Warning (see the existing deprecation warnings in Django).

Also out of curiosity. Why no typing?

Because we don't have any yet. As for why that is the case please look into the existing mailing list threads :)

comment:27 Changed 4 months ago by Florian Apolloner

I've looked at the updates in your PR, and yes that is pretty much what I had in mind.

comment:28 in reply to:  26 Changed 4 months ago by Jon Moroney

Updated again to change the raise to a usage of the warning function and to add decode to the remaining hashers.

Because we don't have any yet. As for why that is the case please look into the existing mailing list threads :)

That's beyond the scope of what I'm trying to get done :p
It would be nice to see you guys adopt typing at some point though :)

Edit:
No idea what's up with the doc test failing.

Last edited 4 months ago by Jon Moroney (previous) (diff)

comment:29 Changed 4 months ago by Claude Paroz

The doc test failing was temporary, unlinked to your patch and should now be solved.

comment:30 Changed 4 months ago by Florian Apolloner

Updated again to change the raise to a usage of the warning function and to add decode to the remaining hashers.

Ok, next step would be to use those new decode methods in the existing cases where manual decoding happens (to reduce the duplication added by this method).

Then the next step would be to look into where and how we can implement the update of the salt. Preferably without adding it to the must_update individually. We could add a salt_length attribute and then adjust based on that (which could be None for the unsalted variants).

comment:31 in reply to:  30 Changed 4 months ago by Jon Moroney

Ok, next step would be to use those new decode methods in the existing cases where manual decoding happens (to reduce the duplication added by this method).

I've updated the PR with what I think you mean. Let me know if that's what you're thinking and I'll do the rest, else let me know what I misunderstood :)

Then the next step would be to look into where and how we can implement the update of the salt. Preferably without adding it to the must_update individually. We could add a salt_length attribute and then adjust based on that (which could be None for the unsalted variants).

I think this can be handled in the BasePasswordHasher unless a hasher implements its own must_update logic and in that case it must be on a case by case basis. As for the salt length do you propose adding a salt_length field to the return from decode? I think it may just be easier to use len(salt) and handle the case where salt is None.

Last edited 4 months ago by Jon Moroney (previous) (diff)

comment:32 Changed 4 months ago by Florian Apolloner

I've updated the PR with what I think you mean. Let me know if that's what you're thinking and I'll do the rest, else let me know what I misunderstood :)

Yes that is what I ment.

I think this can be handled in the BasePasswordHasher unless a hasher implements its own must_update logic and in that case it must be on a case by case basis.

Well the must_update methods in the hashers could call the base method were appropriate. Ie instead of returning False when the hash thinks it doesn't need an update it can call super().must_update

I think it may just be easier to use len(salt) and handle the case where salt is None.

Yes

comment:33 in reply to:  32 Changed 4 months ago by Jon Moroney

Well the must_update methods in the hashers could call the base method were appropriate. Ie instead of returning False when the hash thinks it doesn't need an update it can call super().must_update

In essence that makes for a two stage check I think. Should it just be convention for a hasher to return super().must_update rather than ever returning false?

After a bit of a struggle with the bcrypt hasher I've squashed the PR down. It looks like bcrypt is using more than just the salt stored data as a salt in hash verification. I've added a bit more logic to handle matching what was used, but if you wouldn't mind giving it a second look that would be helpful :) I've left the relevant changes in the most recent commit on the PR.

Last edited 4 months ago by Jon Moroney (previous) (diff)

comment:34 Changed 3 months ago by Jon Moroney

Checking in again just to make sure this doesn't get lost.

comment:35 Changed 3 months ago by Jon Moroney

Any updates?

comment:36 Changed 3 months ago by Carlton Gibson

Hi Jon. There are more than 200 open PRs. We'll get to this, but I'm afraid you need to be patient. Constant pings just add noise.

A quick look at the PR suggests going over the Patch review checklist would save time later. Please uncheck Patch needs improvement when that's done.

Thanks for your input!

comment:37 Changed 3 months ago by Jon Moroney

Patch needs improvement: unset

Sorry about the noise. I took a look through the checklist and it seems good.

comment:38 Changed 2 months ago by Jon Moroney

Sorry to bug again, but it's been nearly 2 months since I last had any feedback on this ticket. I'd like to get this through and I believe that the first of the two PRs
https://github.com/django/django/pull/12675
is ready to go. If you guys disagree I'm happy to change anything, but I need feedback for that.

comment:39 Changed 8 weeks ago by felixxm

Needs documentation: set
Patch needs improvement: set

comment:40 Changed 7 weeks ago by Florian Apolloner

Given that https://github.com/django/django/pull/12675 is nearing completion (I expect it to get merged soon), it is time to think about this again. I had time to refresh my memory on how the salts work for the algos in question. I think it would be a good idea to define the salt_length in actual terms of entropy and not the length of the resulting string.

To that extend I think the default salt function should change to something along the lines of:

    salt_len = 71 # information entropy in bits

    def salt(self):
        """Generate a cryptographically secure nonce salt in ASCII."""
        char_count = math.ceil(math.log10(math.pow(2, self.salt_len)) / math.log10(62))
        return get_random_string(char_count)

At this point I'd probably change the encode() function to accept an empty salt and let the hashers generate the salt themselves if needed (argon2 for instance could do well like this and passing a salt to bcrypt makes no sense either). This way we would also get rid of the weirdness of bytes vs ASCII in the salt string and could pass the output of os.urandom(some_length) to the algorithms directly -- although that will probably be not as easy since we do have to be able to translate the existing salt strings *scratches head*.

comment:41 Changed 7 weeks ago by Jon Moroney

I agree with the idea of having the salt function view the salt_len in terms of bits of entropy, but I'm unclear on the encoding idea. Aren't the password entries stored as long strings and wouldn't that force any bytes to be coerced back into ascii least we break decoding of entries on retrieval?

comment:42 Changed 7 weeks ago by Florian Apolloner

The problem is that bytes don't map to ASCII (if we are talking about the output of os.urandom(some_length) at least), so we need to define some "encoding" -- base64 is probably something that would make sense but not really compatible with what we have now :/

comment:43 in reply to:  42 Changed 7 weeks ago by Jon Moroney

Ahh sorry. I read this

At this point I'd probably change the encode() function to...

as this

At this point I'd probably change the salt() function to...

That's what I get for not having my coffee before getting to work :(

Replying to Florian Apolloner:

base64 is probably something that would make sense but not really compatible with what we have now :/

Certainly something for the future :)

So, the easy next step is to change salt to work in terms of bits rather than in terms of number of characters which returns a string of some length.
On the encoding function. How much flexibility is there in terms of hash storage? Can a given hasher define a binary encoding for instance?

comment:44 Changed 7 weeks ago by Mariusz Felisiak <felisiak.mariusz@…>

In 136ec9b6:

Refs #31358 -- Added decode() to password hashers.

By convention a hasher which does not use a salt should populate the
decode dict with None rather than omit the dict key.

Co-Authored-By: Florian Apolloner <apollo13@…>

comment:45 Changed 6 weeks ago by Jon Moroney

I've made a new PR to convert the salt function to work in bits rather than in character length. I've also set the entropy value to 128 bits.

https://github.com/django/django/pull/13107

Edit: As per PR comments the above PR has been scrapped with the work from it moved to https://github.com/django/django/pull/12553

Last edited 6 weeks ago by Jon Moroney (previous) (diff)

comment:46 Changed 3 weeks ago by Jon Moroney

To circle back on this and to document the state of things for future readers. The current PR here
https://github.com/django/django/pull/12553
Changes the measure of salt from characters to bits and from ~71 bits to 128 bits.
The PR is ready but is hinging on the question of updating prior database entries which have a smaller salt than the 128bit value.

Note: See TracTickets for help on using tickets.
Back to Top