Opened 7 weeks ago

Last modified 6 weeks ago

#28668 assigned New feature

Add ON CONFLICT support to QuerySet.bulk_create()

Reported by: Tom Forbes Owned by: Tom Forbes
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: Дилян Палаузов Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Tim Graham)

When using bulk_create it would be nice to support ON CONFLICT DO NOTHING, which allows existing rows to be included in the bulk_create call, e.g:

Comment.objects.create(name='test', text='test')
Comment.objects.bulk_create(Comment(name='test', text='test'), on_conflict=IGNORE)  # Does not throw an exception

All the databases we support have syntax for this, as well as updating/replacing fields.

This is a spin-off of #28641, just including the ON CONFLICT idea which is backwards-compatible and would be easier to implement than the others.

Change History (11)

comment:1 Changed 7 weeks ago by Tom Forbes

Description: modified (diff)

comment:2 Changed 7 weeks ago by Tim Graham

Description: modified (diff)
Summary: Add support for ON CONFLICT to bulk_createAdd ON CONFLICT support to QuerySet.bulk_create()
Triage Stage: UnreviewedAccepted

comment:3 Changed 7 weeks ago by Tom Forbes

Owner: changed from nobody to Tom Forbes
Status: newassigned

If in the future we want to add support for ON CONFLICT UPDATE we should design the API with that in mind.

Perhaps have DO_NOTHING as a sentinel object, so in the future we can pass in something else to describe updating (like a dictionary or somesuch?).

comment:4 Changed 7 weeks ago by Tom Forbes

Has patch: set

comment:5 Changed 7 weeks ago by Дилян Палаузов

Add documentation what bulk_create() is supposed to return.

I assume this not going to work, as Postgresql has can_return_ids_from_bulk_insert, but 'INSERT ON CONFLICT DO NOTHING RETURNING id' returns only the ids of the inserted rows, so the assert will fail. In particular not inserted rows are not returned.

-                ids = self._batched_insert(objs_without_pk, fields, batch_size)
+                ids = self._batched_insert(objs_without_pk, fields, batch_size, on_conflict=on_conflict)
	         if connection.features.can_return_ids_from_bulk_insert:
		     assert len(ids) == len(objs_without_pk)

comment:6 Changed 7 weeks ago by Tom Forbes

Good point, I'll fix it up tomorrow. There are some other issues I've found tonight as well.

comment:7 Changed 7 weeks ago by Дилян Палаузов

Cc: Дилян Палаузов added

comment:8 Changed 6 weeks ago by Tom Forbes

I fixed that assertion by skipping returning ID's from the insert if on_conflict is used. They are incompatible, as you cannot tell which models where inserted and which where skipped.

comment:9 Changed 6 weeks ago by Дилян Палаузов

For a table like CREATE TABLE t (id SERIAL PRIMARY KEY, name VARCHAR(10) UNIQUE, comment VARCHAR(10)); it is possible to tell which models were inserted with a query like

WITH
  to_be_inserted AS (SELECT * FROM (VALUES ('name12', 'comment12'), ('name5', 'comment5'), ('name6', 'comment6')) as g(name, comment)),
  successfully_inserted AS (
      INSERT INTO t ("name", "comment" ) SELECT *
        FROM to_be_inserted ON CONFLICT DO NOTHING RETURNING *)
SELECT s.id FROM to_be_inserted AS b
  LEFT JOIN successfully_inserted AS s ON (b.name = s.name AND b.comment = s.comment);

where to_be_inserted contains the values that are going into the database. The returned column contains NULL for values that were presented in the database, and the id for the inserted rows.

With the proposed changes

-                if connection.features.can_return_ids_from_bulk_insert:
+                if connection.features.can_return_ids_from_bulk_insert and not on_conflict == 'ignore':
                      assert len(ids) == len(objs_without_pk)
                  for obj_without_pk, pk in zip(objs_without_pk, ids):
                      obj_without_pk.pk = pk

the for-loop will not work, as the amount of ids is not the same as the amount of obj_without_pk.

comment:10 Changed 6 weeks ago by Chris Beck

Is there any consideration of this being included in a future 1.11.x release? I'd really like to be able to use bulk_create for a fire-hose of data that we're getting from a 3rd party who don't guarantee uniqueness so at the moment our only solution is to remove the unique_together attribute and enforce it at the reporting level.

comment:11 Changed 6 weeks ago by Tim Graham

No, new features aren't backported to stable branches. See our supported versions policy.

Note: See TracTickets for help on using tickets.
Back to Top