Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#19634 closed Bug (fixed)

Broken __hash__ methods

Reported by: Anssi Kääriäinen Owned by: Aymeric Augustin
Component: Python 3 Version: dev
Severity: Normal Keywords: sprint2013
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

We have some classes which defined __eq__ but not __hash__ before the py3 support. Python 3 doesn't allow these objects to be used in dicts, so the fix was to introduce:

__hash__ = object.__hash__

However, this is doing the wrong thing. The Python docs say that the only requirement is that objects which compare equal have the same hash value. However, if we define __eq__ so that it makes object with different id() values equal, then object.__hash__ is failing this requirement.

This isn't a regression - we have done the same thing implicitly in python 2 for ages. However, py3 introduced the requirement of defining __hash__ exactly to avoid this situation, so we should consider fixing the __hash__ functions somehow.

I am not sure what the implications of abusing object.__hash__ are.

The problematic cases seem to be:

> git grep '__hash__ = obj'
django/db/backends/__init__.py:    __hash__ = object.__hash__
django/db/models/fields/__init__.py:    __hash__ = object.__hash__
django/dispatch/saferef.py:    __hash__ = object.__hash__
django/test/html.py:    __hash__ = object.__hash__
django/utils/functional.py:        __hash__ = object.__hash__

Attachments (1)

#19634-broken_hash_methods.diff (2.3 KB ) - added by regebro 11 years ago.

Download all attachments as: .zip

Change History (10)

comment:1 by Aymeric Augustin, 11 years ago

Component: UncategorizedPython 3
Triage Stage: UnreviewedAccepted

comment:2 by regebro, 11 years ago

Keywords: sprint2013 added
Owner: changed from nobody to regebro
Status: newassigned

When "cheating" with the hash in this way values that should evaluate as equal will not do so when checking in dictionaries and sets:

>>> class FakeHash(object):
...    def __init__(self, value):
...        self.value = value
...    def __eq__(self, o):
...        return self.value == o.value
...    __hash__ = object.__hash__
... 
>>> f1 = FakeHash(5)
>>> f2 = FakeHash(5)
>>> f1 == f2
True
>>> d = {f1: 'value'}
>>> f2 in d
False
>>> s = {f1}
>>> f2 in s
False

It will work in tuples and lists though.

>>> l = [f1]
>>> f2 in l
True
>>> t = f1,
>>> f2 in t
True

Implementing a real hash in most cases listed above is easy:

>>> class RealHash(object):
...     def __init__(self,value):
...        self.value = value
...     def __eq__(self,o):
...        return self.value == o.value
...     def __hash__(self):
...        return hash(self.value)
... 
>>> r1 = RealHash(5)
>>> r2 = RealHash(5)
>>> r1 == r2
True
>>> dr = {r1: 'value'}
>>> r2 in dr
True
>>> s2 = {r1}
>>> r2 in s2
True

This is usually what you expect. I'll take a stab at this.

by regebro, 11 years ago

comment:3 by regebro, 11 years ago

A patch that fixes most of these attached.

When it comes to BaseDatabaseWrapper, it's clear that you can have several DatabaseWrappers that has the same parameters but are different connections, so you can't implement hash in a way that won't break tests. That opens the question if there really should be a eq, but that's a different question.

comment:4 by Aymeric Augustin, 11 years ago

Owner: changed from regebro to Aymeric Augustin

This looks interesting. I'll review that post-sprint.

in reply to:  3 comment:5 by Aymeric Augustin, 11 years ago

Replying to regebro:

When it comes to BaseDatabaseWrapper, it's clear that you can have several DatabaseWrappers that has the same parameters but are different connections, so you can't implement hash in a way that won't break tests. That opens the question if there really should be a eq, but that's a different question.

Its __eq__ method compares the alias attribute, which is the key in the DATABASES dictionary. Parameters aren't involved at all. What's wrong with returning hash(self.alias)?

comment:6 by Aymeric Augustin, 11 years ago

I found the answer to my question after running the tests:

======================================================================
FAIL: test_connections_thread_local (regressiontests.backends.tests.ThreadTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/myk/Documents/dev/django/tests/regressiontests/backends/tests.py", line 626, in test_connections_thread_local
    self.assertEqual(len(connections_set), 6)
AssertionError: 2 != 6

======================================================================
FAIL: test_default_connection_thread_local (regressiontests.backends.tests.ThreadTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/myk/Documents/dev/django/tests/regressiontests/backends/tests.py", line 599, in test_default_connection_thread_local
    3)
AssertionError: 2 != 3

I think these tests are poorly written and should be updated.

comment:7 by Anssi Kääriäinen, 11 years ago

Yeah, rewriting those tests would be a good idea, they do cause problems somewhat often.

comment:8 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

Resolution: fixed
Status: assignedclosed

In e76147a83a7306d6d83057b982207ddab37eb942:

Fixed #19634 -- Added proper hash methods.

Classes overriding eq need a hash such that equal objects have
the same hash.

Thanks akaariai for the report and regebro for the patch.

comment:9 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

In 702d39921cb95883d3b7590eadd65b03f76d0f92:

[1.5.x] Fixed #19634 -- Added proper hash methods.

Classes overriding eq need a hash such that equal objects have
the same hash.

Thanks akaariai for the report and regebro for the patch.

Backport of e76147a from master.

Note: See TracTickets for help on using tickets.
Back to Top