Opened 3 years ago
Closed 17 months ago
#33414 closed Bug (fixed)
Diamond inheritance causes duplicated PK error when creating an object, if the primary key field has a default.
Reported by: | Yu Li | Owned by: | Akash Kumar Sen |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.0 |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, Abhijeet Viswa, Akash Kumar Sen | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi, I'm not sure if this is a bug or an unsupported feature. But I looked into the django/db/models/base.py source code and now have a pretty good understanding of what is happening.
My business code uses a diamond shape inheritance to model different types of user posts: UserPost, ImagePost, VideoPost, and MixedContentPost. The inheritance goes like this: both ImagePost and VideoPost extend from UserPost, and the MixedContentPost inherits from ImagePost and VideoPost. All of them are concrete models
I read the doc and expected it to work, similar to the example
class Piece(models.Model): pass class Article(Piece): article_piece = models.OneToOneField(Piece, on_delete=models.CASCADE, parent_link=True) ... class Book(Piece): book_piece = models.OneToOneField(Piece, on_delete=models.CASCADE, parent_link=True) ... class BookReview(Book, Article): pass
However, I found out that the doc's example only works when these models use a primary key field that does not have a default. In my case, we are using a UUIDField as the primary key with a default of uuid4. Trying to create a BookReview in our case, causes a django.db.utils.IntegrityError: UNIQUE constraint failed error, because django tries to create the Piece object twice, with the same uuid.
The same behavior is found if I used a AutoField on Piece, with a custom default function, such as
id = 99 def increment(): global id id += 1 return id
This default function makes no sense in practice, but I just use it here to illustrate the root cause of the problem:
The _save_table method in db/models/base.py has a like this:
# Skip an UPDATE when adding an instance and primary key has a default. if ( not raw and not force_insert and self._state.adding and meta.pk.default and meta.pk.default is not NOT_PROVIDED ): force_insert = True
When a default is not present, which is the case of the documentation example, Django will first insert the first instance of the Piece object, and then for the second one, since force_insert is False, _save_table tries an update and goes through. Therefore there is not duplicate.
However, if a default is present, then the second Piece becomes an insertion as well (because meta.pk.default and meta.pk.default is not NOT_PROVIDED, force_insert is True). This causes a duplication on the primary key.
On the other hand, _insert_parent does an in-order traversal calling _save_table on each node, so even in the no-default pk case, it is calling a redundant update on the root node after the insertion..
So which function is at fault?
The source code _save_table assumes that if you are calling it with a default pk then you can skip an update. This assumption looks weird to me: why only when there IS a default pk you can skip update? Why not just skip update as long as we know we are inserting? (via self._state.adding) Is it just to make it special so that AutoField works? If _save_table's responsibility is to try updating before inserting, except when the params force it to do an update or insert, then it shouldn't override that behavior by this self-assumeption within it.
I think the solution is to simply move the check to save_base. And don't do this check in _save_parents.
Like this:
def save_base( self, raw=False, force_insert=False, force_update=False, using=None, update_fields=None, ): """ Handle the parts of saving which should be done only once per save, yet need to be done in raw saves, too. This includes some sanity checks and signal sending. The 'raw' argument is telling save_base not to save any parent models and not to do any changes to the values before save. This is used by fixture loading. """ using = using or router.db_for_write(self.__class__, instance=self) assert not (force_insert and (force_update or update_fields)) assert update_fields is None or update_fields cls = origin = self.__class__ # Skip proxies, but keep the origin as the proxy model. if cls._meta.proxy: cls = cls._meta.concrete_model meta = cls._meta if not meta.auto_created: pre_save.send( sender=origin, instance=self, raw=raw, using=using, update_fields=update_fields, ) # A transaction isn't needed if one query is issued. if meta.parents: context_manager = transaction.atomic(using=using, savepoint=False) else: context_manager = transaction.mark_for_rollback_on_error(using=using) with context_manager: parent_inserted = False if not raw: parent_inserted = self._save_parents(cls, using, update_fields) # Skip an UPDATE when adding an instance and primary key has a default. if ( not raw and not force_insert and self._state.adding and meta.pk.default and meta.pk.default is not NOT_PROVIDED ): force_insert = True updated = self._save_table( raw, cls, force_insert or parent_inserted, force_update, using, update_fields, ) # Store the database on which the object was saved self._state.db = using # Once saved, this is no longer a to-be-added instance. self._state.adding = False # Signal that the save is complete if not meta.auto_created: post_save.send( sender=origin, instance=self, created=(not updated), update_fields=update_fields, raw=raw, using=using, ) save_base.alters_data = True
I have never contributed to Django before. If you think I'm right on this one I'll look into creating a PR.
Change History (11)
comment:1 by , 3 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 3 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|
You can create a pull request and see if any tests break.
comment:3 by , 3 years ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
Thanks for the report. I was able to reproduce this issue.
This assumption looks weird to me: why only when there IS a default pk you can skip update?
This is an optimization to skip UPDATE
when inherited primary key has a default (see babd4126853e48594b61e8db71a83d7bdd929b9c and #29129).
Why not just skip update as long as we know we are inserting? (via self._state.adding)
As far as I'm aware, this would be contrary to the currently documented process.
I think the solution is to simply move the check to save_base. And don't do this check in _save_parents.
This breaks optimization added in babd4126853e48594b61e8db71a83d7bdd929b9c, see basic.tests.ModelInstanceCreationTests.test_save_parent_primary_with_default
.
Is this project completely community supported?
Yes.
Is there a project lead I can discuss with to confirm my suspicion? Thank you.
No, we don't have a project lead. You can join the DevelopersMailingList and share your ideas. You can also interact on the Django forum and the #django-dev IRC channel.
comment:4 by , 3 years ago
Cc: | added |
---|
comment:5 by , 3 years ago
Thank you for your detailed report.
I haven't looked into the details of how this can be addressed but one thing is certain. When an instance of model class at the head of a diamond inheritance graph is saved the ORM knows that there isn't a reason to update involved tables/nodes more than once. In other words, creating a BookReview
should only require 4 queries and not 6 and this issue predates babd4126853e48594b61e8db71a83d7bdd929b9c. I believe that's what we should focus on solving here as that will address this unfortunate collision while reducing the number of queries to a minimum.
Test coverage for model diamond inheritance is limited given it's a rarely used feature and good core support for fields with Python generated defaults is also relatively recent so I'm not surprised we missed this relatively obscure edge case in #29129.
comment:6 by , 3 years ago
After a quick investigation it seems like we simply don't have any tests covering diamond inheritance model saves.
The only instance of diamond inheritance I could find in the test suite was in model_meta tests but this hierarchy of models exists solely for Option
testing purposes.
Here's what a tentative patch could look like for this issue, it seems to be passing the full suite on SQLite at least.
-
django/db/models/base.py
diff --git a/django/db/models/base.py b/django/db/models/base.py index 37f6a3dd58..d363a1ddeb 100644
a b def save_base(self, raw=False, force_insert=False, 823 823 824 824 save_base.alters_data = True 825 825 826 def _save_parents(self, cls, using, update_fields ):826 def _save_parents(self, cls, using, update_fields, saved_parents=None): 827 827 """Save all the parents of cls using values from self.""" 828 828 meta = cls._meta 829 829 inserted = False 830 if saved_parents is None: 831 saved_parents = {} 830 832 for parent, field in meta.parents.items(): 831 833 # Make sure the link fields are synced between parent and self. 832 834 if (field and getattr(self, parent._meta.pk.attname) is None and 833 835 getattr(self, field.attname) is not None): 834 836 setattr(self, parent._meta.pk.attname, getattr(self, field.attname)) 835 parent_inserted = self._save_parents(cls=parent, using=using, update_fields=update_fields) 836 updated = self._save_table( 837 cls=parent, using=using, update_fields=update_fields, 838 force_insert=parent_inserted, 839 ) 840 if not updated: 837 if (parent_updated := saved_parents.get(parent)) is None: 838 parent_inserted = self._save_parents( 839 cls=parent, using=using, update_fields=update_fields, saved_parents=saved_parents, 840 ) 841 updated = self._save_table( 842 cls=parent, using=using, update_fields=update_fields, 843 force_insert=parent_inserted, 844 ) 845 if not updated: 846 inserted = True 847 saved_parents[parent] = updated 848 elif not parent_updated: 841 849 inserted = True 842 850 # Set the parent's PK value to self. 843 851 if field: -
tests/model_inheritance/models.py
diff --git a/tests/model_inheritance/models.py b/tests/model_inheritance/models.py index fa37e121ea..99ce78a0f0 100644
a b class Child(Parent): 175 175 176 176 class GrandChild(Child): 177 177 pass 178 179 180 # Diamond inheritance 181 class CommonAncestor(models.Model): 182 pass 183 184 185 class FirstParent(CommonAncestor): 186 first_ancestor = models.OneToOneField(CommonAncestor, models.CASCADE, primary_key=True, parent_link=True) 187 188 189 class SecondParent(CommonAncestor): 190 second_ancestor = models.OneToOneField(CommonAncestor, models.CASCADE, primary_key=True, parent_link=True) 191 192 193 class CommonChild(FirstParent, SecondParent): 194 pass -
tests/model_inheritance/tests.py
diff --git a/tests/model_inheritance/tests.py b/tests/model_inheritance/tests.py index 550a297fb3..97fb3e4a78 100644
a b 7 7 from django.test.utils import CaptureQueriesContext, isolate_apps 8 8 9 9 from .models import ( 10 Base, Chef, CommonInfo, GrandChild, GrandParent, ItalianRestaurant,10 Base, Chef, CommonInfo, CommonChild, GrandChild, GrandParent, ItalianRestaurant, 11 11 MixinModel, Parent, ParkingLot, Place, Post, Restaurant, Student, SubBase, 12 12 Supplier, Title, Worker, 13 13 ) … … def b(): 150 150 sql = query['sql'] 151 151 self.assertIn('INSERT INTO', sql, sql) 152 152 153 def test_create_diamond_mti(self): 154 with self.assertNumQueries(4): 155 common_child = CommonChild.objects.create() 156 with self.assertNumQueries(4): 157 common_child.save() 158 153 159 def test_eq(self): 154 160 # Equality doesn't transfer in multitable inheritance. 155 161 self.assertNotEqual(Place(id=1), Restaurant(id=1))
comment:7 by , 18 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:8 by , 18 months ago
Cc: | added |
---|---|
Has patch: | set |
Owner: | set to |
Status: | new → assigned |
comment:10 by , 17 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
Is this project completely community supported? Is there a project lead I can discuss with to confirm my suspicion? Thank you.
See this patch for my changes:
https://github.com/kaleido-public/django/commit/70c00ccff35f039705eb0a748939d4c3d6dbe93a