Opened 8 weeks ago

Closed 7 weeks ago

Last modified 7 weeks ago

#28641 closed New feature (needsinfo)

Improvements to QuerySet.bulk_create()

Reported by: Дилян Палаузов Owned by: nobody
Component: Database layer (models, ORM) Version: 1.11
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

https://docs.djangoproject.com/en/1.11/ref/models/querysets/#bulk-create does not state what bulk_create returns, so that the implementation can be optimizerd by making bulk_create() return None.

As QuerySet.get_or_create() ensures, that after the operation an object exists in the database, bulk_create() can do the same on Postgresql using "ON CONFLICT DO NOTHING", possibly taking a parameter. I guess currently bulk_create fails as whole, if any INSERT cannot be performed, so that a developer has to do "for x in Y: x.get_or_create()" instead of buld_create([x for x in Y]).

INSERT (for bulk_create) can also use RETURNING to get the ids of the newly created objects which can be used then to send signals. Is there any particular reason, why on Postgresql bulk_create() does not send pre_save / post_save signals?

Change History (4)

comment:1 Changed 8 weeks ago by Дилян Палаузов

Part 1: Send post_save signal in bulk_create:

diff  --git a/docs/ref/models/querysets.txt b/docs/ref/models/querysets.txt
--- a/django/db/models/query.py
+++ b/django/db/models/query.py
@@ -20,6 +20,7 @@ from django.db.models.expressions import F
 from django.db.models.fields import AutoField
 from django.db.models.functions import Trunc
 from django.db.models.query_utils import InvalidQuery, Q
+from django.db.models.signals import post_save
 from django.db.models.sql.constants import CURSOR
 from django.utils import six, timezone
 from django.utils.deprecation import RemovedInDjango20Warning
@@ -417,7 +418,7 @@ class QuerySet(object):
         #    insert into the childmost table.
         # We currently set the primary keys on the objects when using
         # PostgreSQL via the RETURNING ID clause. It should be possible for
-        # Oracle as well, but the semantics for  extracting the primary keys is
+        # Oracle as well, but the semantics for extracting the primary keys is
         # trickier so it's not done yet.
         assert batch_size is None or batch_size > 0
         # Check that the parents share the same concrete model with the our
@@ -448,6 +449,11 @@ class QuerySet(object):
                     obj_without_pk._state.adding = False
                     obj_without_pk._state.db = self.db
 
+
+        if not self.model._meta.auto_created and (not objs_without_pk or connection.features.can_return_ids_from_bulk_insert):
+            for obj in objs_with_pk + objs_without_pk:
+                post_save.send(sender=self.model.__class__, instance=obj, created=True, using=self.db)
+
         return objs
 
     def get_or_create(self, defaults=None, **kwargs):

diff --git a/docs/ref/models/querysets.txt b/docs/ref/models/querysets.txt
--- a/docs/ref/models/querysets.txt
+++ b/docs/ref/models/querysets.txt
@@ -1937,14 +1937,18 @@ are)::
 
 This has a number of caveats though:
 
-* The model's ``save()`` method will not be called, and the ``pre_save`` and
-  ``post_save`` signals will not be sent.
+* The model's ``save()`` method will not be called, and the ``pre_save``
+  signal will not be sent.  The ``post_save`` signal will be sent, if there
+  are no conflicts at database level and either all objs have set their primary
+  keys before calling bulk_create, or if the backend is PostgreSQL.
 * It does not work with child models in a multi-table inheritance scenario.
 * If the model's primary key is an :class:`~django.db.models.AutoField` it
   does not retrieve and set the primary key attribute, as ``save()`` does,
   unless the database backend supports it (currently PostgreSQL).
 * It does not work with many-to-many relationships.
 
+Returns the passed objects, possibly setting their primary keys.
+
 .. versionchanged:: 1.10
 
     Support for setting primary keys on objects created using ``bulk_create()``

comment:2 Changed 8 weeks ago by Дилян Палаузов

Component: UncategorizedDatabase layer (models, ORM)
Type: UncategorizedNew feature

comment:3 Changed 7 weeks ago by Tim Graham

Resolution: needsinfo
Status: newclosed
Summary: QuerySet.bulk_create()Improvements to QuerySet.bulk_create()

Hi, please try to keep a ticket focused on one issue. I'm going to close this ticket for now because it's too unfocused but I'll leave some ideas here about your proposals:

  1. Most likely we're not going to break backwards compatibility and change the return type of bulk_create() (there would need to be a really compelling reason to do that). Perhaps the return type should be added to the documentation.
  1. I'm not sure if it's common to use bulk_create() in situations where a race condition might be present. Most likely a discussion on the DevelopersMailingList is needed to form a consensus about your proposal regarding "ON CONFLICT DO NOTHING."
  1. There's some discussion in the comments of #19527 about a version of QuerySet.bulk_create() that sends signals. Again, this isn't something to change lightly as it has an impact on performance.

If you want to pursue these issues, I'd say the next step is to try searching Trac using a Google search with site:code.djangoproject.com and related keywords to find past discussion. You can also search the archives of the DevelopersMailingList. Then you would make a post to the DevelopersMailingList that summarizes past discussion of the issue(s) and includes your proposal about how to move forward. If the discussion yields a consensus, then we can open a more focused ticket (or tickets) with the improvement(s).

Thanks!

comment:4 Changed 7 weeks ago by Tim Graham

#28668 is created for ON CONFLICT DO NOTHING support.

Note: See TracTickets for help on using tickets.
Back to Top