Opened 5 years ago
Closed 4 years ago
#31358 closed Cleanup/optimization (fixed)
Increase default password salt size in BasePasswordHasher.
Reported by: | Jon Moroney | Owned by: | nobody |
---|---|---|---|
Component: | contrib.auth | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Florian Apolloner | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I've made a patch for this here
https://github.com/django/django/pull/12553
Which changes the default salt size from ~71 bits to ~131 bits
The rational is that modern guidance suggests a 128 bit minimum on salt sizes
OWASP: https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Password_Storage_Cheat_Sheet.md#salting
Python: https://docs.python.org/3/library/hashlib.html#hashlib.pbkdf2_hmac
NIST: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-132.pdf
In the case of NIST this is technically a hard requirement.
Change History (55)
comment:1 by , 5 years ago
Description: | modified (diff) |
---|
comment:2 by , 5 years ago
Cc: | added |
---|---|
Component: | Uncategorized → Utilities |
Summary: | Increase default password salt size → Increase default password salt size. |
Type: | Uncategorized → Cleanup/optimization |
Version: | 3.0 → master |
comment:3 by , 5 years ago
In general I like the idea of just increasing get_random_string
and not doing it in X locations as needed. But I fear that a subtle change like this might break quite a few systems, so I am with Mariusz to do it just for the hasher (not every usage of get_random_string has the same security requirements).
I didn't check, but do we have tests for changing salt lengths and how the update works?
comment:4 by , 5 years ago
BTW, I think we should deprecate calling get_random_string
without a length argument, but this may be the subject of a separate ticket?
follow-up: 8 comment:5 by , 5 years ago
Has patch: | set |
---|---|
Patch needs improvement: | set |
Summary: | Increase default password salt size. → Increase default password salt size in BasePasswordHasher. |
Triage Stage: | Unreviewed → Accepted |
Florian, it seems that it's tested only in auth_tests.test_views.ChangelistTests.
Claude, yes a separate ticket.
comment:6 by , 5 years ago
Thanks for the comments all. I've rebased on the current master and changed the return of BasePasswordHasher.salt
as suggested.
comment:7 by , 5 years ago
As an aside, would it be possible to get this back ported to the django 2.2.x branch once it's merged?
follow-up: 9 comment:8 by , 5 years ago
Replying to felixxm:
Florian, it seems that it's tested only in auth_tests.test_views.ChangelistTests.
Mhm, what does this mean for existing password hashes, will they get updated to the new salt length? I get the feeling that the module level constant CRYPTO_SALT_LENGTH
should be an attribute salt_length
on BasePasswordHasher
and must_update
should take this into account.
comment:9 by , 5 years ago
Replying to Florian Apolloner:
Replying to felixxm:
Florian, it seems that it's tested only in auth_tests.test_views.ChangelistTests.
Mhm, what does this mean for existing password hashes, will they get updated to the new salt length? I get the feeling that the module level constant
CRYPTO_SALT_LENGTH
should be an attributesalt_length
onBasePasswordHasher
andmust_update
should take this into account.
Would that change must_update
at the BasePasswordHasher
level to something like
def must_update(self, encoded): return self.salt_length != len(encoded.salt)
?
If so, would that first require an update to go out with the attribute set to 12?
Or would this be on a case by case basis for each hasher? If the later case would it not make sense to simply add a length check on each of the relevant hashers?
eg. for pbkdf2
def must_update(self, encoded): algorithm, iterations, salt, hash = encoded.split('$', 3) return int(iterations) != self.iterations || len(salt) != self.salt_length
Another edit (sorry):
I've added a commit to my PR which adds what I think the former logic should look like. Please let me know if that's the route you'd like to take and/or if the specific logic needs an update.
follow-up: 11 comment:10 by , 5 years ago
Florian, I checked builtin hashers:
BCryptSHA256PasswordHasher
,BCryptPasswordHasher
,UnsaltedSHA1PasswordHasher
,UnsaltedMD5PasswordHasher
,CryptPasswordHasher
are not affected because they overridesalt()
,
PBKDF2PasswordHasher
,PBKDF2SHA1PasswordHasher
,Argon2PasswordHasher
,SHA1PasswordHasher
, andMD5PasswordHasher
useBasePasswordHasher.salt()
.
We should introduce salt_length
attribute in a separate PR/commit and take it into account in must_update()
for affected hashers. I'm not sure how to set salt_length
for hashers that override salt()
.
comment:11 by , 5 years ago
Replying to felixxm:
We should introduce
salt_length
attribute in a separate PR/commit and take it into account inmust_update()
for affected hashers.
Ok, I am fine with that approach too.
I'm not sure how to set
salt_length
for hashers that overridesalt()
.
That is a good question indeed. For the unsalted variants we can set it to zero just fine and afaik bcrypt also defines it with a fixed length: https://github.com/pyca/bcrypt/blob/master/src/bcrypt/__init__.py#L50 and is unlikely to change. So we could set salt_length
everywhere and update the hashers to use the builtin must_update
in addition to their own.
comment:12 by , 5 years ago
Actually handling in must_update
of the base hasher might be hard, because the salt format is specific to the hasher, so we might need to add a decode function? *yikes*
follow-ups: 14 15 comment:13 by , 5 years ago
I think we can assume that safe_summary()
returns salt
and if not then salt's length is equal to 0.
comment:14 by , 5 years ago
Replying to felixxm:
I think we can assume that
safe_summary()
returnssalt
and if not then salt's length is equal to 0.
I agree. Though I think it's better to assume that salt's length is undefined if the safe_summary dict has no entry. Assuming zero will result in 0 != self.salt_length and that will always trigger.
On my branch I've had to account for two cases to pass the tests
def must_update(self, encoded): try: return len(self.safe_summary(encoded)['salt']) != self.salt_length except (KeyError, NotImplementedError): return False
One is just the lack of a salt in the dict, but the second is that not all derived classes implement safe_summary
. I think it would be useful to enforce an implementation if possible, but I'm not certain that's possible in python and that's probably a separate PR anyway.
follow-up: 16 comment:15 by , 5 years ago
Replying to felixxm:
I think we can assume that
safe_summary()
returnssalt
and if not then salt's length is equal to 0.
Yes but if it does return a salt
it doesn't neccessarily tell you anything about the length. (It wouldn't be unheard off if a hasher were to return a truncated salt there…)
comment:16 by , 5 years ago
Replying to Florian Apolloner:
Yes but if it does return a
salt
it doesn't neccessarily tell you anything about the length. (It wouldn't be unheard off if a hasher were to return a truncated salt there…)
So then is it better to force each hasher to implement it's own must_update
and move the salt_length
? Variable to the hashers?
comment:17 by , 5 years ago
That is the million dollar question :) Salt is common enough through all hashers (compared to something like memory requirements) that it is something that could be in the base hasher.
follow-up: 19 comment:18 by , 5 years ago
Let me ask the question a different way then; which route should I implement in my pr? :p
follow-up: 20 comment:19 by , 5 years ago
Replying to Jon Moroney:
Let me ask the question a different way then; which route should I implement in my pr? :p
Ha, sorry for not being clearer. What I wanted to say is that I don't have a good answer for you. In a perfect world (ie if you are up to the challenge :D) I would suggest adding a decode
function to the hashers that basically does the reverse of encode
. safe_summary
could then use the decoded values and mask them as needed.
Adding a decode
function seems to make sense since Argon2PasswordHasher
already has a _decode
and others manually repeat (the simpler logic) ala algorithm, empty, algostr, rounds, data = encoded.split('$', 4)
over multiple functions.
This new decode
functionality could be in a new PR and your current PR would be blocked by it and the use that. Interested to hear your and Mariusz' thoughts
comment:20 by , 5 years ago
Replying to Florian Apolloner:
Ha, sorry for not being clearer. What I wanted to say is that I don't have a good answer for you. In a perfect world (ie if you are up to the challenge :D) I would suggest adding a
decode
function to the hashers that basically does the reverse ofencode
.safe_summary
could then use the decoded values and mask them as needed.
Adding a
decode
function seems to make sense sinceArgon2PasswordHasher
already has a_decode
and others manually repeat (the simpler logic) alaalgorithm, empty, algostr, rounds, data = encoded.split('$', 4)
over multiple functions.
I'm not opposed to implementing a decode function, but I'm not sure I understand how it would differ from the safe_summary
function. Further if decode functionality is require/desired then is the scope of concern just for the hashers shipped with django or do we need to consider third party hashers? I have a preference for not considering them and creating a clear breaking change (but I'm also lazy :p).
On a potential decode function; This comment on the encode function worries me a bit
The result is normally formatted as "algorithm$salt$hash" and must be fewer than 128 characters.
It makes me think that the encoded result could be truncated and if we consider third party hashers then we must consider truncated DB entries. I'm not sure if this is a real issue or not, but the 128 character limit does raise an eye brow to me.
This new
decode
functionality could be in a new PR and your current PR would be blocked by it and the use that. Interested to hear your and Mariusz' thoughts
Sounds reasonable, but what does the decode function look like? It it a stub in the BasePasswordHasher
which requires that derived classes implement it with an implementation for each of the included hashers? Let me know if that sounds good and I can make a second PR to implement that. Else lets keep this conversation going :)
Edit:
For clarity my mind's eye see's the decode function as
def decode(self) -> Dict[str, str]
Where the key is in the set {"algo", "salt", "hash"}
and the values are the string encoded versions (base64 for hash?).
comment:22 by , 5 years ago
Hi, sorry for the late reply but I am swamped with work recently and lost track of this. I'll try to answer the questions as good as possible.
I'm not opposed to implementing a decode function, but I'm not sure I understand how it would differ from the safe_summary function.
The intention is simple, safe_summary
is ment for display, ie it calls mask_hash
and similar functions which make it look like a5bcd**********
(literally). A custom password hasher might even go as far as actually truncating the hash that is shown, so the data you get back from safe_summary
as of now would not contain the actual decoded data. But due to backwards concerns we cannot change that function.
clear breaking change
Yes, they will have to add this new function unless we can come up with a better idea (ie if decode is not implemented the default salt size will not increase but start raising warnings)
It makes me think that the encoded result could be truncated and if we consider third party hashers then we must consider truncated DB entries. I'm not sure if this is a real issue or not, but the 128 character limit does raise an eye brow to me.
The 128 character limit comes from the size of the database column in the db, if needed we could increase that. That said the database should not have truncated entries because any non-broken database will throw an error if you insert >128 characters into a 128 char field.
Sounds reasonable, but what does the decode function look like?
A stub (or probably not implemented at all in the base hasher for the transition period) which returns dict[str, any]. Ie the iteration count would be an integer. Good question regarding bytes vs base64, but I think it should be okay if we return the base64 values here instead of going back to the raw bytes.
follow-up: 24 comment:23 by , 5 years ago
No worries on the late reply. I know the world is a bit hectic right now :)
I've gone ahead and made a PR which adds an empty decode function to the base password hasher as well as a simple implementation to the pbkdf2 hasher.
https://github.com/django/django/pull/12675
I think it's probably better to require the decode function rather than having to deal with if it exists or not and update salt lengths only if the function does exist. I feel that having this be optional will only lead to more headaches down the line.
Let me know how you feel about this and I can update the PR to include similar decode()
s for the other hashers included.
comment:24 by , 5 years ago
Replying to Jon Moroney:
I think it's probably better to require the decode function rather than having to deal with if it exists or not and update salt lengths only if the function does exist. I feel that having this be optional will only lead to more headaches down the line.
I am not sure it is that hard, it would also help with backwards compat. Ie have a default decode method in BaseHasher
which return an empty dict and then:
- When it is time to check the salt length (ie in
must_update
), calldecode
and if there is nosalt
in it raise aPendingDeprecationWarning
(and thenDeprecationWarning
followed by an error in subsequent Django versions [ie change the method to NotImplemented]).
- We can immediately update builtin hashers with a new
decode
method that gets used as needed (safe_summary
and whereever decoding is needed). This should also allow me to finally easily upgrade Argon hashing to the "new" variant.
- This way 3rd party authors get the old salt for a while being able to update as needed. This is probably necessary since we do not can argue the salt change important enough to throw all backwards concerns over board.
Let me know how you feel about this and I can update the PR to include similar
decode()
s for the other hashers included.
Generally good, but I do not think that a decode
as used here should have translations for dictionary keys, that is solely for use in safe_summary
imo.
Edit:// Afaik we do not use typing in Django yet, so the function shouldn't have type annotations.
follow-up: 26 comment:25 by , 5 years ago
Just pushed an update for the points you've mentioned. One bit that jumps out at me is that I'm changing the behavior of must_update
by raising an exception
def must_update(self, encoded): decoded = self.decode() if 'salt' not in decoded: raise PendingDeprecationWarning('Decode not fully implemented. Decode will be required in future releases.') return False
Also out of curiosity. Why no typing?
follow-up: 28 comment:26 by , 5 years ago
Replying to Jon Moroney:
Just pushed an update for the points you've mentioned.
Great, will look at it this week.
One bit that jumps out at me is that I'm changing the behavior of
must_update
by raising an exception
Don't raise it but use warnings.warn
in conjunction with RemovedInDjango40Warning
(see the existing deprecation warnings in Django).
Also out of curiosity. Why no typing?
Because we don't have any yet. As for why that is the case please look into the existing mailing list threads :)
comment:27 by , 5 years ago
I've looked at the updates in your PR, and yes that is pretty much what I had in mind.
comment:28 by , 5 years ago
Updated again to change the raise to a usage of the warning function and to add decode to the remaining hashers.
Because we don't have any yet. As for why that is the case please look into the existing mailing list threads :)
That's beyond the scope of what I'm trying to get done :p
It would be nice to see you guys adopt typing at some point though :)
comment:29 by , 5 years ago
The doc test failing was temporary, unlinked to your patch and should now be solved.
follow-up: 31 comment:30 by , 5 years ago
Updated again to change the raise to a usage of the warning function and to add decode to the remaining hashers.
Ok, next step would be to use those new decode
methods in the existing cases where manual decoding happens (to reduce the duplication added by this method).
Then the next step would be to look into where and how we can implement the update of the salt. Preferably without adding it to the must_update
individually. We could add a salt_length
attribute and then adjust based on that (which could be None for the unsalted variants).
comment:31 by , 5 years ago
Ok, next step would be to use those new
decode
methods in the existing cases where manual decoding happens (to reduce the duplication added by this method).
I've updated the PR with what I think you mean. Let me know if that's what you're thinking and I'll do the rest, else let me know what I misunderstood :)
Then the next step would be to look into where and how we can implement the update of the salt. Preferably without adding it to the
must_update
individually. We could add asalt_length
attribute and then adjust based on that (which could be None for the unsalted variants).
I think this can be handled in the BasePasswordHasher unless a hasher implements its own must_update
logic and in that case it must be on a case by case basis. As for the salt length do you propose adding a salt_length
field to the return from decode
? I think it may just be easier to use len(salt)
and handle the case where salt is None
.
follow-up: 33 comment:32 by , 5 years ago
I've updated the PR with what I think you mean. Let me know if that's what you're thinking and I'll do the rest, else let me know what I misunderstood :)
Yes that is what I ment.
I think this can be handled in the BasePasswordHasher unless a hasher implements its own must_update logic and in that case it must be on a case by case basis.
Well the must_update
methods in the hashers could call the base method were appropriate. Ie instead of returning False
when the hash thinks it doesn't need an update it can call super().must_update
I think it may just be easier to use len(salt) and handle the case where salt is None.
Yes
comment:33 by , 5 years ago
Well the
must_update
methods in the hashers could call the base method were appropriate. Ie instead of returningFalse
when the hash thinks it doesn't need an update it can call super().must_update
In essence that makes for a two stage check I think. Should it just be convention for a hasher to return super().must_update
rather than ever returning false?
After a bit of a struggle with the bcrypt hasher I've squashed the PR down. It looks like bcrypt is using more than just the salt stored data as a salt in hash verification. I've added a bit more logic to handle matching what was used, but if you wouldn't mind giving it a second look that would be helpful :) I've left the relevant changes in the most recent commit on the PR.
comment:36 by , 5 years ago
Hi Jon. There are more than 200 open PRs. We'll get to this, but I'm afraid you need to be patient. Constant pings just add noise.
A quick look at the PR suggests going over the Patch review checklist would save time later. Please uncheck Patch needs improvement when that's done.
Thanks for your input!
comment:37 by , 5 years ago
Patch needs improvement: | unset |
---|
Sorry about the noise. I took a look through the checklist and it seems good.
comment:38 by , 5 years ago
Sorry to bug again, but it's been nearly 2 months since I last had any feedback on this ticket. I'd like to get this through and I believe that the first of the two PRs
https://github.com/django/django/pull/12675
is ready to go. If you guys disagree I'm happy to change anything, but I need feedback for that.
comment:39 by , 5 years ago
Needs documentation: | set |
---|---|
Patch needs improvement: | set |
comment:40 by , 5 years ago
Given that https://github.com/django/django/pull/12675 is nearing completion (I expect it to get merged soon), it is time to think about this again. I had time to refresh my memory on how the salts work for the algos in question. I think it would be a good idea to define the salt_length in actual terms of entropy and not the length of the resulting string.
To that extend I think the default salt
function should change to something along the lines of:
salt_len = 71 # information entropy in bits def salt(self): """Generate a cryptographically secure nonce salt in ASCII.""" char_count = math.ceil(math.log10(math.pow(2, self.salt_len)) / math.log10(62)) return get_random_string(char_count)
At this point I'd probably change the encode()
function to accept an empty salt and let the hashers generate the salt themselves if needed (argon2 for instance could do well like this and passing a salt to bcrypt makes no sense either). This way we would also get rid of the weirdness of bytes vs ASCII in the salt string and could pass the output of os.urandom(some_length)
to the algorithms directly -- although that will probably be not as easy since we do have to be able to translate the existing salt strings *scratches head*.
comment:41 by , 5 years ago
I agree with the idea of having the salt function view the salt_len
in terms of bits of entropy, but I'm unclear on the encoding idea. Aren't the password entries stored as long strings and wouldn't that force any bytes to be coerced back into ascii least we break decoding of entries on retrieval?
follow-up: 43 comment:42 by , 5 years ago
The problem is that bytes don't map to ASCII (if we are talking about the output of os.urandom(some_length)
at least), so we need to define some "encoding" -- base64 is probably something that would make sense but not really compatible with what we have now :/
comment:43 by , 5 years ago
Ahh sorry. I read this
At this point I'd probably change the encode() function to...
as this
At this point I'd probably change the salt() function to...
That's what I get for not having my coffee before getting to work :(
Replying to Florian Apolloner:
base64 is probably something that would make sense but not really compatible with what we have now :/
Certainly something for the future :)
So, the easy next step is to change salt
to work in terms of bits rather than in terms of number of characters which returns a string of some length.
On the encoding function. How much flexibility is there in terms of hash storage? Can a given hasher define a binary encoding for instance?
comment:45 by , 4 years ago
I've made a new PR to convert the salt function to work in bits rather than in character length. I've also set the entropy value to 128 bits.
https://github.com/django/django/pull/13107
Edit: As per PR comments the above PR has been scrapped with the work from it moved to https://github.com/django/django/pull/12553
comment:46 by , 4 years ago
To circle back on this and to document the state of things for future readers. The current PR here
https://github.com/django/django/pull/12553
Changes the measure of salt from characters to bits and from ~71 bits to 128 bits.
The PR is ready but is hinging on the question of updating prior database entries which have a smaller salt than the 128bit value.
comment:47 by , 4 years ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | unset |
Updated flags based on PR comment
https://github.com/django/django/pull/12553#pullrequestreview-520869817
comment:51 by , 4 years ago
Component: | Utilities → contrib.auth |
---|
comment:53 by , 4 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
I'm not sure.This method is not strictly to generate a salt. I would rather change a
BasePasswordHasher.salt()
to returnget_random_string(22)
.