Opened 10 months ago

Last modified 25 hours ago

#36144 new Cleanup/optimization

DatabaseOperations.bulk_batch_size() should consider more database limits on SQLite and Oracle

Reported by: Sarah Boyce Owned by:
Component: Database layer (models, ORM) Version: 5.2
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Sarah Boyce)

DatabaseOperations.bulk_batch_size() is used to calculate the maximum batch size when doing operations such as bulk_update and bulk_create

When investigating the impact of composite primary keys on the maximum batch size calculation for bulk_update(), it became clear that there are more database limits that need to be considered when calculating the maximum batch size in order to have a bullet proof solution.

One possible limit in play on SQLite is SQLITE_MAX_EXPR_DEPTH which is 1000 (see https://www.sqlite.org/limits.html#max_expr_depth).
On Oracle, we found that a query could error with the ambiguous message: ORA-00907: missing right parenthesis, which may be due to hitting some limit (possibly documented here: https://docs.oracle.com/en/database/oracle/oracle-database/23/lnpls/plsql-program-limits.html)

We may need to revisit the API design.

PR discussion: https://github.com/django/django/pull/19088#discussion_r1929940327
Ticket which sparked the discussion/discovery: #36118

Change History (3)

comment:1 by Sarah Boyce, 10 months ago

Description: modified (diff)

comment:2 by Natalia Bidart, 10 months ago

Triage Stage: UnreviewedAccepted
Version: 5.15.2

Thank you Sarah, I read the conversation and it makes sense.

comment:3 by Pravin, 25 hours ago

Hi all,
I tried to produce the failure. I have created relatively large complex model setup.

class CompositePKModel(models.Model):
    part1 = models.IntegerField()
    part2 = models.IntegerField()
    value1 = models.IntegerField(default=0)
    value2 = models.IntegerField(default=0)
    value3 = models.IntegerField(default=0)
    flag = models.BooleanField(default=False)

    class Meta:
        
        constraints = [
            models.UniqueConstraint(fields=["part1", "part2"], name="pk_cmp")
        ]

class RelatedModel(models.Model):
    parent = models.ForeignKey(CompositePKModel, on_delete=models.CASCADE)
    score1 = models.IntegerField(default=0)
    score2 = models.IntegerField(default=0)
    active = models.BooleanField(default=False)

class AnotherRelatedModel(models.Model):
    parent = models.ForeignKey(RelatedModel, on_delete=models.CASCADE)
    rating = models.IntegerField(default=0)
    status = models.BooleanField(default=True)
class BulkUpdateComplexTest(TestCase):
    databases = {"default"}

    @classmethod
    def setUpTestData(cls):
        cls.parents = [
            CompositePKModel(part1=i, part2=i + 1)
            for i in range(800)
        ]
        CompositePKModel.objects.bulk_create(cls.parents)

        cls.related_objs = []
        for parent in cls.parents:
            for j in range(3): 
                cls.related_objs.append(
                    RelatedModel(parent=parent)
                )
        RelatedModel.objects.bulk_create(cls.related_objs)

        cls.another_related_objs = []
        for rel in cls.related_objs:
            for k in range(2):
                cls.another_related_objs.append(
                    AnotherRelatedModel(parent=rel)
                )
        AnotherRelatedModel.objects.bulk_create(cls.another_related_objs)

    def test_bulk_update_multiple_models(self):
        for obj in self.parents:
            obj.value1 = 111
            obj.value2 = 222
            obj.value3 = 333
            obj.flag = True
        for obj in self.related_objs:
            obj.score1 = 10
            obj.score2 = 20
            obj.active = True
        for obj in self.another_related_objs:
            obj.rating = 5
            obj.status = False

        try:
            with transaction.atomic():
                CompositePKModel.objects.bulk_update(
                    self.parents, ['value1', 'value2', 'value3', 'flag']
                )
                RelatedModel.objects.bulk_update(
                    self.related_objs, ['score1', 'score2', 'active']
                )
                AnotherRelatedModel.objects.bulk_update(
                    self.another_related_objs, ['rating', 'status']
                )
        except OperationalError as e:
            self.fail(f"bulk_update failed due to SQLite expression depth: {e}")

        self.assertTrue(all(o.value1 == 111 for o in CompositePKModel.objects.all()))
        self.assertTrue(all(o.score1 == 10 for o in RelatedModel.objects.all()))
        self.assertTrue(all(o.rating == 5 for o in AnotherRelatedModel.objects.all()))

am i missing something ?

Note: See TracTickets for help on using tickets.
Back to Top