Opened 3 years ago

Closed 18 months ago

#33414 closed Bug (fixed)

Diamond inheritance causes duplicated PK error when creating an object, if the primary key field has a default.

Reported by: Yu Li Owned by: Akash Kumar Sen
Component: Database layer (models, ORM) Version: 4.0
Severity: Normal Keywords:
Cc: Simon Charette, Abhijeet Viswa, Akash Kumar Sen Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi, I'm not sure if this is a bug or an unsupported feature. But I looked into the django/db/models/base.py source code and now have a pretty good understanding of what is happening.

My business code uses a diamond shape inheritance to model different types of user posts: UserPost, ImagePost, VideoPost, and MixedContentPost. The inheritance goes like this: both ImagePost and VideoPost extend from UserPost, and the MixedContentPost inherits from ImagePost and VideoPost. All of them are concrete models

I read the doc and expected it to work, similar to the example

class Piece(models.Model):
    pass

class Article(Piece):
    article_piece = models.OneToOneField(Piece, on_delete=models.CASCADE, parent_link=True)
    ...

class Book(Piece):
    book_piece = models.OneToOneField(Piece, on_delete=models.CASCADE, parent_link=True)
    ...

class BookReview(Book, Article):
    pass

However, I found out that the doc's example only works when these models use a primary key field that does not have a default. In my case, we are using a UUIDField as the primary key with a default of uuid4. Trying to create a BookReview in our case, causes a django.db.utils.IntegrityError: UNIQUE constraint failed error, because django tries to create the Piece object twice, with the same uuid.

The same behavior is found if I used a AutoField on Piece, with a custom default function, such as

id = 99

def increment():
    global id
    id += 1
    return id

This default function makes no sense in practice, but I just use it here to illustrate the root cause of the problem:

The _save_table method in db/models/base.py has a like this:

        # Skip an UPDATE when adding an instance and primary key has a default.
        if (
            not raw
            and not force_insert
            and self._state.adding
            and meta.pk.default
            and meta.pk.default is not NOT_PROVIDED
        ):
            force_insert = True

When a default is not present, which is the case of the documentation example, Django will first insert the first instance of the Piece object, and then for the second one, since force_insert is False, _save_table tries an update and goes through. Therefore there is not duplicate.

However, if a default is present, then the second Piece becomes an insertion as well (because meta.pk.default and meta.pk.default is not NOT_PROVIDED, force_insert is True). This causes a duplication on the primary key.

On the other hand, _insert_parent does an in-order traversal calling _save_table on each node, so even in the no-default pk case, it is calling a redundant update on the root node after the insertion..

So which function is at fault?

The source code _save_table assumes that if you are calling it with a default pk then you can skip an update. This assumption looks weird to me: why only when there IS a default pk you can skip update? Why not just skip update as long as we know we are inserting? (via self._state.adding) Is it just to make it special so that AutoField works? If _save_table's responsibility is to try updating before inserting, except when the params force it to do an update or insert, then it shouldn't override that behavior by this self-assumeption within it.

I think the solution is to simply move the check to save_base. And don't do this check in _save_parents.

Like this:

    def save_base(
        self,
        raw=False,
        force_insert=False,
        force_update=False,
        using=None,
        update_fields=None,
    ):
        """
        Handle the parts of saving which should be done only once per save,
        yet need to be done in raw saves, too. This includes some sanity
        checks and signal sending.

        The 'raw' argument is telling save_base not to save any parent
        models and not to do any changes to the values before save. This
        is used by fixture loading.
        """
        using = using or router.db_for_write(self.__class__, instance=self)
        assert not (force_insert and (force_update or update_fields))
        assert update_fields is None or update_fields
        cls = origin = self.__class__
        # Skip proxies, but keep the origin as the proxy model.
        if cls._meta.proxy:
            cls = cls._meta.concrete_model
        meta = cls._meta
        if not meta.auto_created:
            pre_save.send(
                sender=origin,
                instance=self,
                raw=raw,
                using=using,
                update_fields=update_fields,
            )
        # A transaction isn't needed if one query is issued.
        if meta.parents:
            context_manager = transaction.atomic(using=using, savepoint=False)
        else:
            context_manager = transaction.mark_for_rollback_on_error(using=using)
        with context_manager:
            parent_inserted = False
            if not raw:
                parent_inserted = self._save_parents(cls, using, update_fields)

            # Skip an UPDATE when adding an instance and primary key has a default.
            if (
                not raw
                and not force_insert
                and self._state.adding
                and meta.pk.default
                and meta.pk.default is not NOT_PROVIDED
            ):
                force_insert = True
            updated = self._save_table(
                raw,
                cls,
                force_insert or parent_inserted,
                force_update,
                using,
                update_fields,
            )
        # Store the database on which the object was saved
        self._state.db = using
        # Once saved, this is no longer a to-be-added instance.
        self._state.adding = False

        # Signal that the save is complete
        if not meta.auto_created:
            post_save.send(
                sender=origin,
                instance=self,
                created=(not updated),
                update_fields=update_fields,
                raw=raw,
                using=using,
            )

    save_base.alters_data = True

I have never contributed to Django before. If you think I'm right on this one I'll look into creating a PR.

Change History (11)

comment:1 by Yu Li, 3 years ago

Owner: changed from nobody to Yu Li
Status: newassigned

Is this project completely community supported? Is there a project lead I can discuss with to confirm my suspicion? Thank you.
See this patch for my changes:
https://github.com/kaleido-public/django/commit/70c00ccff35f039705eb0a748939d4c3d6dbe93a

comment:2 by Tim Graham, 3 years ago

Component: UncategorizedDatabase layer (models, ORM)

You can create a pull request and see if any tests break.

comment:3 by Mariusz Felisiak, 3 years ago

Cc: Simon Charette added
Triage Stage: UnreviewedAccepted

Thanks for the report. I was able to reproduce this issue.

This assumption looks weird to me: why only when there IS a default pk you can skip update?

This is an optimization to skip UPDATE when inherited primary key has a default (see babd4126853e48594b61e8db71a83d7bdd929b9c and #29129).

Why not just skip update as long as we know we are inserting? (via self._state.adding)

As far as I'm aware, this would be contrary to the currently documented process.

I think the solution is to simply move the check to save_base. And don't do this check in _save_parents.

This breaks optimization added in babd4126853e48594b61e8db71a83d7bdd929b9c, see basic.tests.ModelInstanceCreationTests.test_save_parent_primary_with_default.

Is this project completely community supported?

Yes.

Is there a project lead I can discuss with to confirm my suspicion? Thank you.

No, we don't have a project lead. You can join the DevelopersMailingList and share your ideas. You can also interact on the Django forum and the #django-dev IRC channel.

comment:4 by Mariusz Felisiak, 3 years ago

Cc: Abhijeet Viswa added

comment:5 by Simon Charette, 3 years ago

Thank you for your detailed report.

I haven't looked into the details of how this can be addressed but one thing is certain. When an instance of model class at the head of a diamond inheritance graph is saved the ORM knows that there isn't a reason to update involved tables/nodes more than once. In other words, creating a BookReview should only require 4 queries and not 6 and this issue predates babd4126853e48594b61e8db71a83d7bdd929b9c. I believe that's what we should focus on solving here as that will address this unfortunate collision while reducing the number of queries to a minimum.

Test coverage for model diamond inheritance is limited given it's a rarely used feature and good core support for fields with Python generated defaults is also relatively recent so I'm not surprised we missed this relatively obscure edge case in #29129.

comment:6 by Simon Charette, 3 years ago

After a quick investigation it seems like we simply don't have any tests covering diamond inheritance model saves.

The only instance of diamond inheritance I could find in the test suite was in model_meta tests but this hierarchy of models exists solely for Option testing purposes.

Here's what a tentative patch could look like for this issue, it seems to be passing the full suite on SQLite at least.

  • django/db/models/base.py

    diff --git a/django/db/models/base.py b/django/db/models/base.py
    index 37f6a3dd58..d363a1ddeb 100644
    a b def save_base(self, raw=False, force_insert=False,  
    823823
    824824    save_base.alters_data = True
    825825
    826     def _save_parents(self, cls, using, update_fields):
     826    def _save_parents(self, cls, using, update_fields, saved_parents=None):
    827827        """Save all the parents of cls using values from self."""
    828828        meta = cls._meta
    829829        inserted = False
     830        if saved_parents is None:
     831            saved_parents = {}
    830832        for parent, field in meta.parents.items():
    831833            # Make sure the link fields are synced between parent and self.
    832834            if (field and getattr(self, parent._meta.pk.attname) is None and
    833835                    getattr(self, field.attname) is not None):
    834836                setattr(self, parent._meta.pk.attname, getattr(self, field.attname))
    835             parent_inserted = self._save_parents(cls=parent, using=using, update_fields=update_fields)
    836             updated = self._save_table(
    837                 cls=parent, using=using, update_fields=update_fields,
    838                 force_insert=parent_inserted,
    839             )
    840             if not updated:
     837            if (parent_updated := saved_parents.get(parent)) is None:
     838                parent_inserted = self._save_parents(
     839                    cls=parent, using=using, update_fields=update_fields, saved_parents=saved_parents,
     840                )
     841                updated = self._save_table(
     842                    cls=parent, using=using, update_fields=update_fields,
     843                    force_insert=parent_inserted,
     844                )
     845                if not updated:
     846                    inserted = True
     847                saved_parents[parent] = updated
     848            elif not parent_updated:
    841849                inserted = True
    842850            # Set the parent's PK value to self.
    843851            if field:
  • tests/model_inheritance/models.py

    diff --git a/tests/model_inheritance/models.py b/tests/model_inheritance/models.py
    index fa37e121ea..99ce78a0f0 100644
    a b class Child(Parent):  
    175175
    176176class GrandChild(Child):
    177177    pass
     178
     179
     180# Diamond inheritance
     181class CommonAncestor(models.Model):
     182    pass
     183
     184
     185class FirstParent(CommonAncestor):
     186    first_ancestor = models.OneToOneField(CommonAncestor, models.CASCADE, primary_key=True, parent_link=True)
     187
     188
     189class SecondParent(CommonAncestor):
     190    second_ancestor = models.OneToOneField(CommonAncestor, models.CASCADE, primary_key=True, parent_link=True)
     191
     192
     193class CommonChild(FirstParent, SecondParent):
     194    pass
  • tests/model_inheritance/tests.py

    diff --git a/tests/model_inheritance/tests.py b/tests/model_inheritance/tests.py
    index 550a297fb3..97fb3e4a78 100644
    a b  
    77from django.test.utils import CaptureQueriesContext, isolate_apps
    88
    99from .models import (
    10     Base, Chef, CommonInfo, GrandChild, GrandParent, ItalianRestaurant,
     10    Base, Chef, CommonInfo, CommonChild, GrandChild, GrandParent, ItalianRestaurant,
    1111    MixinModel, Parent, ParkingLot, Place, Post, Restaurant, Student, SubBase,
    1212    Supplier, Title, Worker,
    1313)
    def b():  
    150150                    sql = query['sql']
    151151                    self.assertIn('INSERT INTO', sql, sql)
    152152
     153    def test_create_diamond_mti(self):
     154        with self.assertNumQueries(4):
     155            common_child = CommonChild.objects.create()
     156        with self.assertNumQueries(4):
     157            common_child.save()
     158
    153159    def test_eq(self):
    154160        # Equality doesn't transfer in multitable inheritance.
    155161        self.assertNotEqual(Place(id=1), Restaurant(id=1))

comment:7 by Mariusz Felisiak, 18 months ago

Owner: Yu Li removed
Status: assignednew

comment:8 by Akash Kumar Sen, 18 months ago

Cc: Akash Kumar Sen added
Has patch: set
Owner: set to Akash Kumar Sen
Status: newassigned

comment:10 by Mariusz Felisiak, 18 months ago

Triage Stage: AcceptedReady for checkin

comment:11 by Mariusz Felisiak <felisiak.mariusz@…>, 18 months ago

Resolution: fixed
Status: assignedclosed

In 5d20e02:

Fixed #33414 -- Fixed creating diamond-shaped MTI objects for common ancestor with primary key that has a default.

Co-authored-by: Simon Charette <charette.s@…>

Note: See TracTickets for help on using tickets.
Back to Top