Code

Opened 3 years ago

Closed 8 months ago

#16458 closed Bug (fixed)

Equality operator on django.db.models.Model not commutative

Reported by: freek.wiekmeijer@… Owned by: anonymous
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: anssi.kaariainen@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by lukeplant)

Problem outline

In math is is a common assumption that the equality operator should be commutative, i.e. A==B would always correspond to B==A.
In django's Model class I found non-commutative behaviour.

Affected code

file: django.db.models.base.py

class Model(object):
...

   def __eq__(self, other):
        return isinstance(other, self.__class__) and self._get_pk_val() == other._get_pk_val()
...

Problem detail

This implementation of eq() will cause a different outcome between (A==B) and (B==A) in cases where one of the two compared objects is a derived class of the other.

assume this:

class MyModel(models.Model): pass
class MyDerivedModel(MyModel): pass
A = MyModel()
B = MyDerivedModel()

(A==B) # --> A.__eq__ returns True if (a.pk == b.pk)
(B==A) # --> B.__eq__ returns False

Further analysis

I checked how derived models are created in the database (MySQL 5.0 backend). The table for MyDerivedModel does not contain its own primary key, but a reference to the PK in MyModel instead. This means that there can never be a collision between the primary key of MyModel and the primary key of MyDerivedModel.

So it is safe to compare PKs across base and derived classes, as is currently done.

Suggested solution

Make the type comparison between self and other symmetrical.

class Model(object):
   def __eq__(self, other):
        return (isinstance(other, self.__class__) or isinstance(self, other.__class__)) and \
               (self._get_pk_val() == other._get_pk_val())

Attachments (5)

models_equality-16458.diff (619 bytes) - added by teraom 3 years ago.
16458.diff (1.4 KB) - added by akaariai 3 years ago.
16458.2.diff (5.0 KB) - added by akaariai 3 years ago.
16458_inheritance_eq.diff (6.2 KB) - added by akaariai 3 years ago.
inherited models can be equal
16458_admin_fix.diff (629 bytes) - added by akaariai 3 years ago.
admin NestedCollector fix

Download all attachments as: .zip

Change History (27)

comment:1 Changed 3 years ago by lukeplant

  • Description modified (diff)
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 3 years ago by teraom

  • Owner changed from nobody to teraom
  • Status changed from new to assigned

Code credit : freek.wiekmeijer

comment:3 Changed 3 years ago by lukeplant

  • Needs tests set
  • Patch needs improvement set

Patch has a stray 'e' at the beginning, and needs tests.

Changed 3 years ago by teraom

comment:4 Changed 3 years ago by teraom

Replaced the patch. Removed the 'e' in the beginning.

comment:5 Changed 3 years ago by lukeplant

  • Patch needs improvement unset

comment:6 Changed 3 years ago by emulbreh

related or duplicate: #11892

comment:7 Changed 3 years ago by aaugustin

#11892 is about deferred instances, this one is about inheritance. They're independent issues with similar symptoms.

comment:8 Changed 3 years ago by akaariai

  • Cc anssi.kaariainen@… added

According to my tests if a = MyModel() and b = MyDerivedModel() a==b and b==a will both be False currently. Surprising? This is because the subclass method will be called in preference of the base class method. And in the subclass method "self" is the subclass instance, hence the isinstance test will fail and __eq__ returns False. Why the subclass method call? The Python documentation says this:

For objects x and y, first x.__op__(y) is tried. If this is not implemented or returns NotImplemented, y.__rop__(x) is tried. If this is also not implemented or returns NotImplemented, a TypeError exception is raised. But see the following exception:

Exception to the previous item: if the left operand is an instance of a built-in type or a new-style class, and the right operand is an instance of a proper subclass of that type or class and overrides the base’s __rop__() method, the right operand’s __rop__() method is tried before the left operand’s __op__() method.

This is done so that a subclass can completely override binary operators. Otherwise, the left operand’s __op__() method would always accept the right operand: when an instance of a given class is expected, an instance of a subclass of that class is always acceptable.

I guess every language has their little quirks...

I think the correct behavior should be that subclasses are not equal to their parent even if they have the same pk. If for no other reason, then because this is the way current code does it. There can be arguments to both directions, but I don't believe it is a good idea to change the behavior of __eq__. However deferred classes (the issue in #11892) should be equal to the "real" class. And __eq__ should behave symmetrically (like == already does). See attached patch for a first try. Patch needs comments and tests.

[1] http://docs.python.org/reference/datamodel.html#coercion-rules

Changed 3 years ago by akaariai

comment:9 Changed 3 years ago by anonymous

Here's an experiment in python 2.4.3:

class MyBase:

def init(self, id):

self.id = id

def eq(self, other):

print "%s.eq()" % self.class.name
return (self.id == other.id)

class MyDerived(MyBase):

pass

a = MyBase(1)
b = MyDerived(1)
a==b

MyBase.eq()
True

b==a

MyDerived.eq()
True

Also tested python 2.7.2, with the same result.

Note that the Python interpreter does not coerce the prevalence of derived classes in this case. Two observations:

  1. the eq method itself is not overridden in the derived class (I guess it's not too common to override Model.eq)
  2. Left hand prevails over right hand in the == operator

I tend to agree with your generic statement that eq should only return True if the left hand and right hand operands have the same type. But in case of a Model, the django framework makes compares primary keys. The framework ensures that a derived object cannot have the same primary key as the base object (primary keys are unique on base class level, so no two objects under the same base class share the same pk value).

As far as I can see, the "base == derived" comparison needs to handle two scenarios:

(a) left and right operands are in fact two different objects.
-> obviously, the outcome should be False in this situation.
these objects are subject to the unique contraint on base class table level, hence the two will have a different primary key.
hence the outcome of the comparison will always be False, even if you allow base/derived comparison to evaluate (as in the code I proposed earlier).

(b) left and right are in fact the same object, type casted to different classes
-> in this case it might be convenient to have the eq operator return True.

A short example in python-like pseudocode

class django.contrib.auth.User # contains a set of columns + basic eveluation functions about a user
class MyClient(django.contrib.auth.User): # the same columns but a few extra methods to improve the granularity of access control

@property
def PasswordExpired(self):

return (self.last_login < datetime.utcnow() - timedelta(years=1))

client1 = MyClient.objects.get(username='username123') # needs a MyClient object because it needs access to the derived class methods
if (client1.PasswordExpired): raise PasswordExpired("Please update your password")

or:
client2 = User.objects.get(username='username123') # here you might as well use the base class because you only access that class's members
print str(client2.last_login)

The main question is: do you want the comparison (client1 == client2) to return True (because they're the actually same user). Or False because strictly they;re two different objects belonging to a different class.

I believe that either one of these two choices will result in a workable situation from django users.
But the current situation where (client1==client2) != (client2==client1) is not desirable.

comment:10 Changed 3 years ago by akaariai

Try to change the mybase object to use new-style classes, that is MyBase to MyBase(object). You will get different results. Django models are new-style classes, so that is the behavior you will get with Django models.

I am afraid client1 == client2 must return False, as that is what happens at the moment. But on the other hand it seems hard to argue that my_proxy_model_instance is not equal to my_model_instance. Maybe that case should be changed also. But I am against changing the real inheritance case, that seems likely to cause surprising and hard to find errors in existing code...

Maybe django-developers is the right place to discuss this?

comment:11 Changed 3 years ago by akaariai

  • Needs tests unset

I wrote a patch with the following semantics

  • A object loaded as deferred is equal to one created normally
  • A proxy object is equal to its concrete parent class
  • Otherwise a derived object is never equal to its base instance
  • __eq__ is symmetric

The first two are changes to current bahvior. The deferred object change should be considered a bug fix, subclassing in that case is just an implementation detail. The proxy thing is a bit more debatable, but I think proxy models should be equal to their concrete counterparts. The most convincing argument against this is the change from current behavior.

The third item is as it has always been.

There is a very small change in __eq__. Previously a==b and b==a were symmetric, but a.__eq__(b) could return different result than b.__eq__(a) in inheritance situations. See discussion above, especially the part about Python operator coercing rules. I don't believe fixing this can be considered backwards incompatible.

Documentation and tests included. The main coding thing to consider is should ._meta.proxy_for_model be renamed (every concrete and abstract class proxies itself at the moment). The documentation portion might need a rewrite, too. The new equality operator is about 15% slower than before, tested with 10000 comparisons of obj == obj. I assume __eq__ is not a hotspot.

Changed 3 years ago by akaariai

comment:12 Changed 3 years ago by mtredinnick

  • Patch needs improvement set

Russell and I talked about this and we think derived classes can be equal to the parent if the pk's are the same. Model inheritance in Django is *not* the same as class inheritance in Python, because you can never have an instance of just the super class in Python (amongst many other differences), so there's no completely parallel case to compare with. It's useful for the derived class to be comparable to the parent instance because (by design) queries only return the parent instance and there are cases for wanting to check if we already have something when we have the derived class instance.

Deferred and proxy objects should be equal to their non-deferred, non-proxy things. Basically, we are comparing things as a bag of data (without respect to data that might be missing due to being in a subclass instance if the parent was extended to the subclass). This is consistent with a reasonable "equal" and utilises the fact that equality is not the same as object identity.

comment:13 Changed 3 years ago by akaariai

Hmm, shouldn't models then be equal if they have a common parent class. Consider situation:

class Restaurant(models.Model):
    pass

class ItalianRestaurant(Restaurant):
   pass

class EnglishRestaurant(Restaurant):
   pass

r = Restaurant(pk=1)
i = ItalianRestaurant(restaurant=r)
e = EnglishRestaurant(restaurant=r)

Then, shouldn't i==e be True, even though neither is a subclass of the other.

I admit this is a bit of a stretch, but that would be the most consistent behaviour IMHO. Making this fast and correct for large inheritance trees seems to be a bit of a challenge. Deciding that this is not needed wouldn't be a bad decision.

comment:14 Changed 3 years ago by akaariai

  • Needs documentation set

The "bag-equality" is implemented in a new patch. The approach is a set intersection done in __eq__. The __eq__ will check if the instances compared share any multi-table inherited parent (in this case a model is its own parent), and if so, then check the equality of the PK.

The performance is 50% slower in for-loop testing obj==obj repeatedly. I don't believe the speed is much worse for large inheritance chains, at least not for any practical ones.

Proxy and deferred models should be handled OK in the patch.

There is one test that needs fixing. Admin's NestedCollector (collects objects to delete) assumes an object is never equal to its parent. This is fixed, but in a different patch. I am afraid the new equality behavior will cause similar errors for users when they are upgrading Django.

After the fix, all tests should pass.

The patch needs documentation and some polishing, especially the attribute name ._meta.concrete_models is not good enough...

Changed 3 years ago by akaariai

inherited models can be equal

Changed 3 years ago by akaariai

admin NestedCollector fix

comment:15 Changed 3 years ago by akaariai

I just realized that inherited model equality based on equal primary keys does not work.

class A(models.Model):
    id = models.AutoField(primary_key=True)

class B(A):
    override_pk = models.IntegerField(primary_key=True)

In the above setting testing equality based on primary key equality is wrong. This could be solved if instances are compared based on common parent primary key field.

So, there are a couple of points why multi-table inherited models should not be equal:

  • backwards compatibility
  • performance
  • achieving correctness for all cases is hard (there is also multi-table multi-inheritance...). I believe a performance hit of at least 200-300% compared to current situation is likely if all corner cases are going to be solved.

So, how about this: proxy models are equal to their concrete parent based on primary key equivalence. So are deferred models. Multi-table inherited models are not equal to their parents. The solution would be to define proxy_for_model to be self for concrete models and use that in __eq__. Or if preferred, use another Options variable for "my concrete class" information.

comment:16 Changed 3 years ago by anonymous

  • Owner changed from teraom to anonymous
  • Status changed from assigned to new

comment:17 Changed 2 years ago by akaariai

I really do think the correct fix is making proxy models equal to their concrete parent if PKs match, but not extending the equality to multitable inheritance. Multitable inheritance equality could be backwards incompatible (see the admin NestedCollector fix for an example), the equality is hard to implement for models which have multiple primary keys (see comment:15, and an example in the end of this comment), and it will make eq somewhat slower.

In my opinion the best way forward is a new attribute, concrete_model which is always the first concrete parent for proxy models, or self for concrete models. This attribute would have use in other parts of the code, too, simplifying some cases using proxy_for_model currently. In addition proxy_for_model should be fixed in a way where it actually points to the parent model. Now it points to the first concrete parent. Many parts of Django assumes that it points to the parent model (proxy or not) and thus can produce a "proxy chain".

I am willing to write another patch which implements this.

Last, a stupid little example of what kind of problems multitable multi-inheritance can cause for __eq__:

class A:
    id_a = IntegerField(primary_key=True)

class B:
    id_b = IntegerField(primary_key=True)

class C(A, B):
    pass

class D(A, B):
    pass

Now, you can have c = C(id_a=1, id_b=1) and d = D(id_a=1, id_b=2). Are c and d equal or not? They both have same id_a which is a primary key, so they are. They have different id_b fields, so they aren't. So, you can't have bullet-proof __eq__ for multitable multi-inheritance cases. (Yes, I admit this example is a stretch).

comment:18 Changed 20 months ago by DrMeers

I need to find some time to read through the material in this ticket, but in the absence of that I'll just make a note for now: this also causes issues in django.contrib.admin.util.NestedObjects._nested, where full hierarchy traversal is prevented because attempting to fetch a node from the edge graph fails because the collected object is not considered equal to its proxy form being deleted via the admin. It therefore reports an incomplete set of related objects on the "are you sure?" deletion page (but I think still deletes the missing ones). Should probably become a ticket in its own right...

comment:19 Changed 8 months ago by akaariai

Some discussion about this issue at https://groups.google.com/forum/#!topic/django-developers/EpUySf_p_Kw. The branch at https://github.com/akaariai/django/tree/model_eq_proxy contains somewhat polished patch implementing proxy equality.

I see there is a comment from ORM gurus for implementing the more broad full inheritance equality in comment:12. There is another patch (though a little less polished one) at https://github.com/akaariai/django/tree/model_eq_inherit. I wonder if that is still wanted? To me it seems there are cases where inheritance equality is wanted and cases where non-inheritance equality is wanted. The strongest argument I can make for non-inheritance equality is backwards compatibility. For one example see the changes to admin_views tests in model_eq_inherit branch.

comment:20 Changed 8 months ago by akaariai

  • Needs documentation unset
  • Patch needs improvement unset

comment:21 Changed 8 months ago by timo

  • Triage Stage changed from Accepted to Ready for checkin

comment:22 Changed 8 months ago by akaariai

  • Resolution set to fixed
  • Status changed from new to closed

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.