Opened 12 years ago

Closed 11 years ago

#18864 closed Bug (fixed)

Django models __eq__ and __hash__ treat all unsaved model instances as identical

Reported by: fengb Owned by: nobody
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords:
Cc: jdunck@…, Ben Davis Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When a model instance is still new and unsaved, it has no primary key.

Both eq and hash rely solely on _get_pk_val(), which means all unsaved model instances are treated as though they are identical.

b1 = models.Base()
b2 = models.Base()
b1 == b2

True

hash(b1) == hash(b2)

True

For integrity, we should add a check for Python object identity since if both instances are saved, they will end up with different PKs.

For hashing, we can use the default Python hashing to prevent large collisions based on hash(None). We would also need to cache the hash so that hash(instance) will be the same both before save and after save such that:

b = models.Base()
s = set()
s.add(b)
b.save()
b in set

True

This last case actually has an edge case that I'm not sure is solvable.

b = models.Base.object.get(id=b.id)
b in set

False

Change History (12)

comment:1 by fengb, 12 years ago

Whoops, failed formatting.

Bad behavior example:

>>> b1 = models.Base()
>>> b2 = models.Base()
>>> b1 == b2
True
>>> hash(b1) == hash(b2)
True

Hash fixed call:

>>> b = models.Base()
>>> s = set()
>>> s.add(b)
>>> b.save()
>>> b in set
True

Hash impossible example:

>>> b = models.Base.object.get(id=b.id)
>>> b in set
False

comment:3 by Anssi Kääriäinen, 12 years ago

Triage Stage: UnreviewedAccepted

I think that by caching the value of __hash__ we are just moving the problem around, not solving it. Having a hack where the model's __hash__ is based on the model's primary key, unless __hash__ was called on the same object before the object got a primary key value doesn't sound tempting.

Strictly, we should do this for immutable hash:

  1. Models without PK are not hashable.
  2. Once set, the Model pk should never be changed.

I don't think we want to go there, as this would break a lot of applications.

We should just document the existing limitation. In addition basing the eq and hashing on id(obj) when obj.pk is None is worth investigation. Alternatively, we might want to make models with pk = None unhashable. Using pk=None models in dicts or sets doesn't work anyways currently. This would leave still the problem with changed primary key value, but that doesn't seem as severe to me.

If we use the id(obj) when pk is None way, then we should document that one shall not assume the hash of an object will stay the same if the object's primary key changes. In particular this means using unsaved models in dicts or sets and then saving them is a bad idea.

Accepting this ticket as there clearly is a problem with our current implementation of __hash__ and __eq__ for unsaved models.

comment:4 by fengb, 12 years ago

Rejecting hashing on unsaved instances:

https://github.com/fengb/django/tree/model-eq-hash-rejection

There is one error with the change: forms is attempting to use hashing. I'm afraid of the other problems this backwards incompatible change will introduce.

Traceback (most recent call last):
  File "/Users/bfeng/Documents/projects/django/tests/modeltests/model_formsets/tests.py", line 523, in test_inline_formsets_save_as_new
    self.assertTrue(formset.is_valid())
  File "/Users/bfeng/Documents/projects/django/django/forms/formsets.py", line 269, in is_valid
    err = self.errors
  File "/Users/bfeng/Documents/projects/django/django/forms/formsets.py", line 247, in _get_errors
    self.full_clean()
  File "/Users/bfeng/Documents/projects/django/django/forms/formsets.py", line 293, in full_clean
    self.clean()
  File "/Users/bfeng/Documents/projects/django/django/forms/models.py", line 500, in clean
    self.validate_unique()
  File "/Users/bfeng/Documents/projects/django/django/forms/models.py", line 527, in validate_unique
    if row_data in seen_data:
  File "/Users/bfeng/Documents/projects/django/django/db/models/base.py", line 395, in __hash__
    raise TypeError('unhashable without pk')
TypeError: unhashable without pk

comment:5 by Alex Gaynor, 12 years ago

I haven't had time to review in full, but my inclination is that the behavior should be:

For saved instances:

__hash__: hash(self.pk)

__eq__: self.pk == other.pk and both objects have the same type (see the numerous other tickets discussing this issue)

For unsaved instances:

__hash__: hash(self.pk)

__eq__: self is other

comment:6 by fengb, 12 years ago

I don't agree that unsaved instances should use hash(self.pk) because self.pk will always be None and we will get hash collisions all over the place.

comment:7 by Alex Gaynor, 12 years ago

Hmm, I'm actually coming around to the idea that unsaved models shouldn't be hashable, because the pk is mutable in that situation.

comment:8 by Anssi Kääriäinen, 12 years ago

I on the other hand am beginning to think that we shouldn't ever officially support mutable primary keys. If you need mutable keys, then use a hidden AutoField for primary key, and have the logical primary key as an unique field. If you want (for legacy DBs for example), you can dodge this limitation by .update(). This will allow us to solve some issues in the composite fields feature (and in __hash__, too).

If we go with no __hash__ for pk is None route, then we do have a backwards incompatibility at our hands. Call it a bugfix?

As for the __eq__ tickets, I think we should go for "same concrete model, same pk" => equal, otherwise not, as that will be simple to implement. Implementing isinstance() based __eq__ will lead to non-transitive __eq__ and some other complications (like, the subclass could have different field as pk field, so just comparing the value will not do). See #16458 comments 13 and 17 for some weird corner cases.

comment:9 by Jeremy Dunck, 12 years ago

Cc: jdunck@… added

comment:10 by Ben Davis, 11 years ago

Cc: Ben Davis added

comment:11 by Anssi Kääriäinen, 11 years ago

Patch implementing __eq__ based id() and forbidding __hash__ in case of no pk value is available at https://github.com/akaariai/django/tree/model_eq_nopk

comment:12 by Anssi Kääriäinen <akaariai@…>, 11 years ago

Resolution: fixed
Status: newclosed

In 6af05e7a0f0e4604d6a67899acaa99d73ec0dfaa:

Fixed model.eq and hash for no pk value cases

The eq method now considers two instances without primary key value
equal only when they have same id(). The hash method raises
TypeError for no primary key case.

Fixed #18864, fixed #18250

Thanks to Tim Graham for docs review.

Note: See TracTickets for help on using tickets.
Back to Top