Opened 2 years ago

Last modified 2 years ago

#33672 closed Cleanup/optimization

bulk_update - duplicate updates as side effect from batch_size — at Initial Version

Reported by: jerch
Owned by: nobody
Component: Database layer (models, ORM)
Version: 4.0
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

While trying to mimic bulk_update's behavior for fast_update, I stumbled over duplicate updates caused by different batch_size values. To illustrate the issue:

Code highlighting:

>>> from exampleapp.models import Foo
>>> from django.db import connection
>>> Foo.objects.all().values('pk')
<QuerySet [{'pk': 1}, {'pk': 2}, {'pk': 3}, {'pk': 4}]>
>>> pks=[1,2,3,3,2,2,1,1,4]
>>> objs=[Foo(pk=pk) for pk in pks]
>>> Foo.objects.bulk_update(objs, ['f1'])  # within batch_size: all duplicates filtered
>>> connection.queries[-1]['sql']
'UPDATE "exampleapp_foo" SET "f1" = <CASE chain shortened...> WHERE "exampleapp_foo"."id" IN (1, 2, 3, 4)'
>>> Foo.objects.bulk_update(objs, ['f1'], batch_size=4)  # multiple batches
>>> connection.queries[-1]['sql']
'UPDATE "exampleapp_foo" SET "f1" = <CASE chain shortened...> WHERE "exampleapp_foo"."id" IN (4)'
>>> connection.queries[-2]['sql']
'UPDATE "exampleapp_foo" SET "f1" = <CASE chain shortened...> WHERE "exampleapp_foo"."id" IN (2, 1)'
>>> connection.queries[-3]['sql']
'UPDATE "exampleapp_foo" SET "f1" = <CASE chain shortened...> WHERE "exampleapp_foo"."id" IN (1, 2, 3)'

The example above requests a bulk update for a degenerate pk list with duplicates, [1, 2, 3, 3, 2, 2, 1, 1, 4]. batch_size now has a weird effect on which updates "get through" (see the sketch after the list):

  • all within one batch - updates applied as [[1, 2, 3, 4]]
  • scattered across batches with batch_size=4 - updates applied as [[1, 2, 3], [2, 1], [4]]
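
For illustration, here is a small self-contained sketch of why this happens. The helper below is illustrative only and merely approximates the batching, it is not Django's implementation: each batch translates into one UPDATE ... WHERE pk IN (...), so duplicate pks collapse inside a batch but get applied again for every further batch they appear in.

Code highlighting:

# Rough sketch of the batching behaviour (not Django's actual code): each batch
# becomes one UPDATE ... WHERE pk IN (<batch pks>), so duplicate pks collapse
# within a batch but survive across batch boundaries.

def batched_pk_sets(pks, batch_size):
    """Split pks into batches and return the pk set each UPDATE would target."""
    batches = [pks[i:i + batch_size] for i in range(0, len(pks), batch_size)]
    # Within a single UPDATE, each row matches the IN clause only once,
    # so repeated pks inside one batch are effectively deduplicated.
    return [list(dict.fromkeys(batch)) for batch in batches]

pks = [1, 2, 3, 3, 2, 2, 1, 1, 4]
print(batched_pk_sets(pks, batch_size=len(pks)))  # [[1, 2, 3, 4]]
print(batched_pk_sets(pks, batch_size=4))         # [[1, 2, 3], [2, 1], [4]]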

I understand the need to drop duplicates within a single UPDATE invocation (I need that for fast_update as well), but isn't the current behavior surfacing a side effect of an implementation detail here? Imho that behavior is hard for the user to anticipate or control.

Idea/suggestion:
How about treating this more uniformly/atomically at the bulk_update call level by either:

  • filtering duplicates across the whole changeset (automatic filtering might still be surprising for users), or
  • disallowing/raising on any duplicate in a changeset (I would prefer that for code simplicity and clearer behavior; sketched below)

With such a change bulk_update would always exhibit the same behavior for a given changeset, regardless of batch_size.
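
As a rough illustration of the second option (the helper name and error message below are made up for this sketch, not an existing or proposed Django API), such a check could run once over the whole changeset before any batching takes place:

Code highlighting:

from collections import Counter

def check_unique_pks(objs):
    """Reject a changeset that contains the same pk more than once."""
    counts = Counter(obj.pk for obj in objs)
    dupes = sorted(pk for pk, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"bulk_update() received duplicate pks: {dupes}")

# check_unique_pks(objs)  # would raise for pks=[1, 2, 3, 3, 2, 2, 1, 1, 4]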

(I categorized this as cleanup/optimization, since the unfortunate behavior is documented.)

Change History (0)
