#19634 closed Bug (fixed)
Broken __hash__ methods
| Reported by: | Anssi Kääriäinen | Owned by: | Aymeric Augustin |
|---|---|---|---|
| Component: | Python 3 | Version: | dev |
| Severity: | Normal | Keywords: | sprint2013 |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
We have some classes which defined __eq__ but not __hash__ before the py3 support. Python 3 doesn't allow these objects to be used in dicts, so the fix was to introduce:
__hash__ = object.__hash__
However, this is doing the wrong thing. The Python docs say that the only requirement is that objects which compare equal have the same hash value. However, if we define __eq__ so that it makes object with different id() values equal, then object.__hash__ is failing this requirement.
This isn't a regression - we have done the same thing implicitly in python 2 for ages. However, py3 introduced the requirement of defining __hash__ exactly to avoid this situation, so we should consider fixing the __hash__ functions somehow.
I am not sure what the implications of abusing object.__hash__ are.
The problematic cases seem to be:
> git grep '__hash__ = obj' django/db/backends/__init__.py: __hash__ = object.__hash__ django/db/models/fields/__init__.py: __hash__ = object.__hash__ django/dispatch/saferef.py: __hash__ = object.__hash__ django/test/html.py: __hash__ = object.__hash__ django/utils/functional.py: __hash__ = object.__hash__
Attachments (1)
Change History (10)
comment:1 by , 13 years ago
| Component: | Uncategorized → Python 3 |
|---|---|
| Triage Stage: | Unreviewed → Accepted |
comment:2 by , 13 years ago
| Keywords: | sprint2013 added |
|---|---|
| Owner: | changed from to |
| Status: | new → assigned |
by , 13 years ago
| Attachment: | #19634-broken_hash_methods.diff added |
|---|
follow-up: 5 comment:3 by , 13 years ago
A patch that fixes most of these attached.
When it comes to BaseDatabaseWrapper, it's clear that you can have several DatabaseWrappers that has the same parameters but are different connections, so you can't implement hash in a way that won't break tests. That opens the question if there really should be a eq, but that's a different question.
comment:4 by , 13 years ago
| Owner: | changed from to |
|---|
This looks interesting. I'll review that post-sprint.
comment:5 by , 13 years ago
Replying to regebro:
When it comes to BaseDatabaseWrapper, it's clear that you can have several DatabaseWrappers that has the same parameters but are different connections, so you can't implement hash in a way that won't break tests. That opens the question if there really should be a eq, but that's a different question.
Its __eq__ method compares the alias attribute, which is the key in the DATABASES dictionary. Parameters aren't involved at all. What's wrong with returning hash(self.alias)?
comment:6 by , 13 years ago
I found the answer to my question after running the tests:
======================================================================
FAIL: test_connections_thread_local (regressiontests.backends.tests.ThreadTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/myk/Documents/dev/django/tests/regressiontests/backends/tests.py", line 626, in test_connections_thread_local
self.assertEqual(len(connections_set), 6)
AssertionError: 2 != 6
======================================================================
FAIL: test_default_connection_thread_local (regressiontests.backends.tests.ThreadTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/myk/Documents/dev/django/tests/regressiontests/backends/tests.py", line 599, in test_default_connection_thread_local
3)
AssertionError: 2 != 3
I think these tests are poorly written and should be updated.
comment:7 by , 13 years ago
Yeah, rewriting those tests would be a good idea, they do cause problems somewhat often.
comment:8 by , 13 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
When "cheating" with the hash in this way values that should evaluate as equal will not do so when checking in dictionaries and sets:
>>> class FakeHash(object): ... def __init__(self, value): ... self.value = value ... def __eq__(self, o): ... return self.value == o.value ... __hash__ = object.__hash__ ... >>> f1 = FakeHash(5) >>> f2 = FakeHash(5) >>> f1 == f2 True >>> d = {f1: 'value'} >>> f2 in d False >>> s = {f1} >>> f2 in s FalseIt will work in tuples and lists though.
Implementing a real hash in most cases listed above is easy:
>>> class RealHash(object): ... def __init__(self,value): ... self.value = value ... def __eq__(self,o): ... return self.value == o.value ... def __hash__(self): ... return hash(self.value) ... >>> r1 = RealHash(5) >>> r2 = RealHash(5) >>> r1 == r2 True >>> dr = {r1: 'value'} >>> r2 in dr True >>> s2 = {r1} >>> r2 in s2 TrueThis is usually what you expect. I'll take a stab at this.