#14652 closed (invalid)
Sessions seem to be improperly using Pickle to hash a dictionary
Reported by: | Paul McMillan | Owned by: | nobody |
---|---|---|---|
Component: | contrib.sessions | Version: | 1.2 |
Severity: | Keywords: | ||
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Session dicts are stored as a pickle, and the integrity of that pickle is verified by a hash:
http://code.djangoproject.com/browser/django/trunk/django/contrib/sessions/backends/base.py#L91
This seems to be an improper use of pickle, since the order of dictionaries is not guaranteed. Tim Peters says:
The internals of pickle strings aren't guaranteed, just that "they work" when unpickled again, and these do. If you want a hash code for a dict, don't dare use pickle for this either, even if it appears "to work": it doesn't. The order in which dict keys are enumerated isn't defined either, and can and does vary across releases, and even across program runs.
Pickling as a SortedDict
would resolve the most direct issue, but would not prevent users from using nested dicts
improperly.
Change History (4)
comment:1 by , 14 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
follow-up: 3 comment:2 by , 14 years ago
I'm not sure I see the distinction between using a pickle directly as a unique hash vs. hashing it and using that. A differently ordered but valid pickle of the same data will still produce a different MAC, and so fail our check.
In the worst case scenario, people's sessions disappear. They're relatively ephemeral in any case.
As Tim pointed out in the quote though, the pickle value (and hence the MAC) may change across Python versions or even runs of Python. If we go to the trouble of ensuring that old sessions don't become invalidated on updating Django, it seems like those might also be circumstances we worry about. If we were using cpickle
, we would probably already have encountered this problem.
I suppose at this point it's not a bug until someone actually encounters difficulty with it. "Why are my sessions disappearing randomly" sounds pretty painful to troubleshoot though.
comment:3 by , 14 years ago
Replying to PaulM:
A differently ordered but valid pickle of the same data will still produce a different MAC
True
and so fail our check.
False, because we don't compare the pickle of the new data to the pickle of the old data (or the hashes of those pickles).
It simply does not matter that pickle doesn't necessarily produce the same results twice. As far as the MAC step is concerned, the pickled string is simply an opaque string that needs signing. The origin of that string, and what process is used to generate it, is completely irrelevant as far as HMAC is concerned. We are storing an opaque string and the HMAC of that string, and neither can 'change' - or rather the point is that if either 'changes' (by someone tampering with the data) we will know about it.
Once we've checked the integrity of our opaque string, we then go on to unpickle it. At this point, the only thing required is that pickle.loads
will correctly load the data. At this point, the pickle.dumps()
of the restored data might be completely different, but that makes no difference. We would have a problem if we had code that did this:
restored_data = pickle.loads(saved_data) if MAC(pickle.dumps(restored_data)) != saved_hash: return {} else: return restored_data
But we don't. Instead we do this:
hash, pickled = encoded_data.split(':', 1) expected_hash = self._hash(pickled) if not constant_time_compare(hash, expected_hash): raise SuspiciousOperation("Session data corrupted") else: return pickle.loads(pickled)
Our process requires the following properties:
pickle.dumps
can be used to serialise our values to a string- HMAC can be used to verify the integrity and authenticity of a string
pickle.loads
can be used to restore the value produced by 1. to the same data originally stored (where 'same' means Python equality,==
, notis
).
All of these are satisfied, and further properties of pickle are not relevant. Tim's e-mail is about a completely different process - in which someone was effectively trying to take the pickle of two values and use the pickled string to compare those values. This fails because pickle.dumps
is multi-valued in its output i.e. the same input can produce different output.
The quotation is about using pickle as a unique hash. We are not doing that - we are using pickle to pickle a dictionary, and using a MAC to MAC the pickled string. The fact that the same dictionary might be pickled to different strings doesn't affect us - we just use a MAC to check that the pickled string we have is one that we generated, then we unpickle it.