id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
33672	bulk_update - duplicate updates as a side effect of batch_size	jerch	nobody	"While trying to mimic `bulk_update` behavior for `fast_update`, I stumbled over duplicate updates whose outcome depends on batch_size. To illustrate the issue:

{{{
#!div style=""font-size: 80%""
Code highlighting:
  {{{#!python
  >>> from exampleapp.models import Foo
  >>> from django.db import connection
  >>> Foo.objects.all().values('pk')
  <QuerySet [{'pk': 1}, {'pk': 2}, {'pk': 3}, {'pk': 4}]>
  >>> pks = [1, 2, 3, 3, 2, 2, 1, 1, 4]
  >>> objs = [Foo(pk=pk) for pk in pks]
  >>> Foo.objects.bulk_update(objs, ['f1'])  # within batch_size: all duplicates filtered
  >>> connection.queries[-1]['sql']
  'UPDATE ""exampleapp_foo"" SET ""f1"" = <CASE chain shortened...> WHERE ""exampleapp_foo"".""id"" IN (1, 2, 3, 4)'
  >>> Foo.objects.bulk_update(objs, ['f1'], batch_size=4)  # multiple batches
  >>> connection.queries[-1]['sql']
  'UPDATE ""exampleapp_foo"" SET ""f1"" = <CASE chain shortened...> WHERE ""exampleapp_foo"".""id"" IN (4)'
  >>> connection.queries[-2]['sql']
  'UPDATE ""exampleapp_foo"" SET ""f1"" = <CASE chain shortened...> WHERE ""exampleapp_foo"".""id"" IN (2, 1)'
  >>> connection.queries[-3]['sql']
  'UPDATE ""exampleapp_foo"" SET ""f1"" = <CASE chain shortened...> WHERE ""exampleapp_foo"".""id"" IN (1, 2, 3)'
  }}}
}}}

The example above requests a bulk update for a degenerate pk list with duplicates, `[1, 2, 3, 3, 2, 2, 1, 1, 4]`. `batch_size` now has an odd effect on which updates ""get through"":
- all within one batch - updates done as `[[1, 2, 3, 4]]`
- scattered across batches with `batch_size=4` - updates done as `[[1, 2, 3], [2, 1], [4]]` (reproduced by the sketch below)
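
A standalone sketch (plain Python, not Django code; the chunking and per-batch dedup are my reading of the generated SQL) reproduces the pk lists of the three IN clauses above:

{{{#!python
pks = [1, 2, 3, 3, 2, 2, 1, 1, 4]

def batched(seq, size):
    # split the changeset into consecutive chunks of at most size items
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

for batch in batched(pks, 4):
    # duplicates collapse within a single UPDATE statement
    print(list(dict.fromkeys(batch)))
# -> [1, 2, 3]
#    [2, 1]
#    [4]
}}}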

I understand the need to drop duplicates within a single UPDATE invocation (I need that for `fast_update` as well), but isn't the current behavior surfacing a side effect of an implementation detail here? IMHO it is hard for the user to anticipate or control.
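
Which occurrence of a duplicated pk wins is then a pure function of `batch_size`: within one statement the first matching WHEN of the CASE chain applies (assuming duplicate WHENs stay in the chain), while a later batch simply overwrites rows touched by earlier ones. A small sketch of that reading (plain Python; indices refer to the pk list above):

{{{#!python
pks = [1, 2, 3, 3, 2, 2, 1, 1, 4]

def winners(pks, batch_size):
    # maps pk -> index of the occurrence whose value ends up in the DB
    result = {}
    for start in range(0, len(pks), batch_size):
        first_in_batch = {}
        for i in range(start, min(start + batch_size, len(pks))):
            # first WHEN per pk wins within one UPDATE statement
            first_in_batch.setdefault(pks[i], i)
        # a later UPDATE overwrites rows touched by earlier batches
        result.update(first_in_batch)
    return result

print(winners(pks, len(pks)))  # -> {1: 0, 2: 1, 3: 2, 4: 8}
print(winners(pks, 4))         # -> {1: 6, 2: 4, 3: 2, 4: 8}
}}}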

Idea/suggestion:
How about treating this atomically at the `bulk_update` call level, by either:
- filtering duplicates across the whole changeset (automatic filtering might still surprise users), or
- disallowing duplicates altogether and raising for any duplicate in a changeset (I would prefer this for code simplicity and stricter behavior) - see the sketch after this list.
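
Both variants are cheap to implement; a rough sketch (helper names are mine, purely illustrative, not a proposed API):

{{{#!python
def dedupe_changeset(objs):
    # option 1 (hypothetical): keep only the first object per pk for the
    # whole changeset, independent of batch_size
    unique = {}
    for obj in objs:
        unique.setdefault(obj.pk, obj)
    return list(unique.values())

def reject_duplicates(objs):
    # option 2 (hypothetical): fail fast as soon as a pk occurs twice
    seen = set()
    for obj in objs:
        if obj.pk in seen:
            raise ValueError('bulk_update() got multiple objects with pk %r' % obj.pk)
        seen.add(obj.pk)
    return objs
}}}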

With such a change `bulk_update` would always exhibit the same behavior for a given changeset, regardless of `batch_size`.

(I categorized this as cleanup/optimization, since the unfortunate behavior is documented.)"	Cleanup/optimization	closed	Database layer (models, ORM)	4.0	Normal	duplicate			Unreviewed	0	0	0	0	0	0
