Opened 10 months ago
Last modified 25 hours ago
#36144 new Cleanup/optimization
DatabaseOperations.bulk_batch_size() should consider more database limits on SQLite and Oracle
| Reported by: | Sarah Boyce | Owned by: | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 5.2 |
| Severity: | Normal | Keywords: | |
| Cc: | Simon Charette | Triage Stage: | Accepted |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
DatabaseOperations.bulk_batch_size() is used to calculate the maximum batch size when doing operations such as bulk_update and bulk_create
When investigating the impact of composite primary keys on the maximum batch size calculation for bulk_update(), it became clear that there are more database limits that need to be considered when calculating the maximum batch size in order to have a bullet proof solution.
One possible limit in play on SQLite is SQLITE_MAX_EXPR_DEPTH which is 1000 (see https://www.sqlite.org/limits.html#max_expr_depth).
On Oracle, we found that a query could error with the ambiguous message: ORA-00907: missing right parenthesis, which may be due to hitting some limit (possibly documented here: https://docs.oracle.com/en/database/oracle/oracle-database/23/lnpls/plsql-program-limits.html)
We may need to revisit the API design.
PR discussion: https://github.com/django/django/pull/19088#discussion_r1929940327
Ticket which sparked the discussion/discovery: #36118
Change History (3)
comment:1 by , 10 months ago
| Description: | modified (diff) |
|---|
comment:2 by , 10 months ago
| Triage Stage: | Unreviewed → Accepted |
|---|---|
| Version: | 5.1 → 5.2 |
comment:3 by , 25 hours ago
Hi all,
I tried to produce the failure. I have created relatively large complex model setup.
class CompositePKModel(models.Model):
part1 = models.IntegerField()
part2 = models.IntegerField()
value1 = models.IntegerField(default=0)
value2 = models.IntegerField(default=0)
value3 = models.IntegerField(default=0)
flag = models.BooleanField(default=False)
class Meta:
constraints = [
models.UniqueConstraint(fields=["part1", "part2"], name="pk_cmp")
]
class RelatedModel(models.Model):
parent = models.ForeignKey(CompositePKModel, on_delete=models.CASCADE)
score1 = models.IntegerField(default=0)
score2 = models.IntegerField(default=0)
active = models.BooleanField(default=False)
class AnotherRelatedModel(models.Model):
parent = models.ForeignKey(RelatedModel, on_delete=models.CASCADE)
rating = models.IntegerField(default=0)
status = models.BooleanField(default=True)
class BulkUpdateComplexTest(TestCase):
databases = {"default"}
@classmethod
def setUpTestData(cls):
cls.parents = [
CompositePKModel(part1=i, part2=i + 1)
for i in range(800)
]
CompositePKModel.objects.bulk_create(cls.parents)
cls.related_objs = []
for parent in cls.parents:
for j in range(3):
cls.related_objs.append(
RelatedModel(parent=parent)
)
RelatedModel.objects.bulk_create(cls.related_objs)
cls.another_related_objs = []
for rel in cls.related_objs:
for k in range(2):
cls.another_related_objs.append(
AnotherRelatedModel(parent=rel)
)
AnotherRelatedModel.objects.bulk_create(cls.another_related_objs)
def test_bulk_update_multiple_models(self):
for obj in self.parents:
obj.value1 = 111
obj.value2 = 222
obj.value3 = 333
obj.flag = True
for obj in self.related_objs:
obj.score1 = 10
obj.score2 = 20
obj.active = True
for obj in self.another_related_objs:
obj.rating = 5
obj.status = False
try:
with transaction.atomic():
CompositePKModel.objects.bulk_update(
self.parents, ['value1', 'value2', 'value3', 'flag']
)
RelatedModel.objects.bulk_update(
self.related_objs, ['score1', 'score2', 'active']
)
AnotherRelatedModel.objects.bulk_update(
self.another_related_objs, ['rating', 'status']
)
except OperationalError as e:
self.fail(f"bulk_update failed due to SQLite expression depth: {e}")
self.assertTrue(all(o.value1 == 111 for o in CompositePKModel.objects.all()))
self.assertTrue(all(o.score1 == 10 for o in RelatedModel.objects.all()))
self.assertTrue(all(o.rating == 5 for o in AnotherRelatedModel.objects.all()))
am i missing something ?
Thank you Sarah, I read the conversation and it makes sense.