Opened 14 years ago

Closed 14 years ago

Last modified 13 years ago

#14652 closed (invalid)

Sessions seem to be improperly using Pickle to hash a dictionary

Reported by: Paul McMillan Owned by: nobody
Component: contrib.sessions Version: 1.2
Severity: Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Session dicts are stored as a pickle, and the integrity of that pickle is verified by a hash:

http://code.djangoproject.com/browser/django/trunk/django/contrib/sessions/backends/base.py#L91

This seems to be an improper use of pickle, since the order of dictionaries is not guaranteed. Tim Peters says:

The internals of pickle strings aren't 
guaranteed, just that "they work" when unpickled again, and 
these do.  If you want a hash code for a dict, don't dare use 
pickle for this either, even if it appears "to work":  it doesn't.  
The order in which dict keys are enumerated isn't defined 
either, and can and does vary across releases, and even 
across program runs.

Pickling as a SortedDict would resolve the most direct issue, but would not prevent users from using nested dicts improperly.

Change History (4)

comment:1 by Luke Plant, 14 years ago

Resolution: invalid
Status: newclosed

The quotation is about using pickle as a unique hash. We are not doing that - we are using pickle to pickle a dictionary, and using a MAC to MAC the pickled string. The fact that the same dictionary might be pickled to different strings doesn't affect us - we just use a MAC to check that the pickled string we have is one that we generated, then we unpickle it.

comment:2 by Paul McMillan, 14 years ago

I'm not sure I see the distinction between using a pickle directly as a unique hash vs. hashing it and using that. A differently ordered but valid pickle of the same data will still produce a different MAC, and so fail our check.

In the worst case scenario, people's sessions disappear. They're relatively ephemeral in any case.

As Tim pointed out in the quote though, the pickle value (and hence the MAC) may change across Python versions or even runs of Python. If we go to the trouble of ensuring that old sessions don't become invalidated on updating Django, it seems like those might also be circumstances we worry about. If we were using cpickle, we would probably already have encountered this problem.

I suppose at this point it's not a bug until someone actually encounters difficulty with it. "Why are my sessions disappearing randomly" sounds pretty painful to troubleshoot though.

in reply to:  2 comment:3 by Luke Plant, 14 years ago

Replying to PaulM:

A differently ordered but valid pickle of the same data will still produce a different MAC

True

and so fail our check.

False, because we don't compare the pickle of the new data to the pickle of the old data (or the hashes of those pickles).

It simply does not matter that pickle doesn't necessarily produce the same results twice. As far as the MAC step is concerned, the pickled string is simply an opaque string that needs signing. The origin of that string, and what process is used to generate it, is completely irrelevant as far as HMAC is concerned. We are storing an opaque string and the HMAC of that string, and neither can 'change' - or rather the point is that if either 'changes' (by someone tampering with the data) we will know about it.

Once we've checked the integrity of our opaque string, we then go on to unpickle it. At this point, the only thing required is that pickle.loads will correctly load the data. At this point, the pickle.dumps() of the restored data might be completely different, but that makes no difference. We would have a problem if we had code that did this:

restored_data = pickle.loads(saved_data)
if MAC(pickle.dumps(restored_data)) != saved_hash:
    return {}
else:
    return restored_data

But we don't. Instead we do this:

            hash, pickled = encoded_data.split(':', 1)
            expected_hash = self._hash(pickled)
            if not constant_time_compare(hash, expected_hash):
                raise SuspiciousOperation("Session data corrupted")
            else:
                return pickle.loads(pickled)

Our process requires the following properties:

  1. pickle.dumps can be used to serialise our values to a string
  2. HMAC can be used to verify the integrity and authenticity of a string
  3. pickle.loads can be used to restore the value produced by 1. to the same data originally stored (where 'same' means Python equality, ==, not is).

All of these are satisfied, and further properties of pickle are not relevant. Tim's e-mail is about a completely different process - in which someone was effectively trying to take the pickle of two values and use the pickled string to compare those values. This fails because pickle.dumps is multi-valued in its output i.e. the same input can produce different output.

comment:4 by Jacob, 13 years ago

milestone: 1.3

Milestone 1.3 deleted

Note: See TracTickets for help on using tickets.
Back to Top