Opened 9 years ago

Closed 9 years ago

#24997 closed New feature (fixed)

Allow bulk_create with proxy inheritance

Reported by: William Schwartz Owned by: William Schwartz
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: proxy, inheritance, bulk_create, queryset, insert
Cc: ravi.s@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Overview

The documentation for the Queryset method bulk_create states:

It does not work with child models in a multi-table inheritance scenario.

bulk_create also does not work with proxy inheritance, which does not involve multiple tables. Rather than changing the documentation to reflect that neither type of inheritance is supported, it would be better to allow proxy inheritance to work.

Origin of the problem

It makes sense not to allow bulk_create with multiple tables. The comment in bulk_create explains it best:

So this case is fun. When you bulk insert you don't get the primary keys back (if it's an autoincrement), so you can't insert into the child tables which references this. There are two workarounds, 1) this could be implemented if you didn't have an autoincrement pk, and 2) you could do it by doing O(n) normal inserts into the parent tables to get the primary keys back, and then doing a single bulk insert into the childmost table. Some databases might allow doing this by using RETURNING clause for the insert query. We're punting on these for now because they are relatively rare cases.

bulk_create prevents inherited models from using it with the following naive test:

if self.model._meta.parents:
    raise ValueError("Can't bulk create an inherited model")

This test does not discriminate between multi-table inheritance and proxy inheritance. However, simply modifying the code to read

if self.model._meta.parents and not self.model._meta.proxy:
    raise ValueError("Can't bulk create an inherited model with multiple tables")

does not fix the problem because it results in more exceptions.

Reproducing this error

I did the following using Django 1.8.2 and Python 3.4.3 on Mac OS 10.9.5.

django-admin startproject bulk_create_proxy
cd bulk_create_proxy
./manage.py startapp app

Add 'app' to the INSTALLED_APPS list in the resultant settings.py. Save the following code in app/models.py.

from django.db import models

class Parent(models.Model):
    name = models.CharField(max_length=10)

class Child(Parent):
    class Meta(object):
        proxy = True

Save the following code in app/tests.py.

from django.test import TestCase

from app.models import Parent, Child


class TestBulkCreate(TestCase):

    def assert_can_bulk_create(self, model):
        model.objects.all().delete()
        model.objects.bulk_create([
            model(name='a'), model(name='b'), model(name='c')])
        self.assertEqual(model.objects.count(), 3)

    def test_bulk_create_parent(self):
        self.assert_can_bulk_create(Parent)

    def test_bulk_create_child(self):
        self.assert_can_bulk_create(Child)

Finish up with the following commands.

./manage.py makemigrations app
./manage.py migrate
./manage.py test

You should get the following test output. (I have elided file paths; ones starting with ...django come from Django.)

bulk_create_proxy ➜ ./manage.py test
Creating test database for alias 'default'...
E.
======================================================================
ERROR: test_bulk_create_child (app.tests.TestBulkCreate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "...bulk_create_proxy/app/tests.py", line 18, in test_bulk_create_child
    self.assert_can_bulk_create(Child)
  File "...bulk_create_proxy/app/tests.py", line 11, in assert_can_bulk_create
    model(name='a'), model(name='b'), model(name='c')])
  File "...django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "...django/db/models/query.py", line 374, in bulk_create
    raise ValueError("Can't bulk create an inherited model")
ValueError: Can't bulk create an inherited model

----------------------------------------------------------------------
Ran 2 tests in 0.003s

FAILED (errors=1)

Change History (12)

comment:1 by Tim Graham, 9 years ago

Triage Stage: UnreviewedAccepted
Type: BugNew feature

If it's feasible to implement, I don't see why we couldn't do it.

comment:2 by William Schwartz, 9 years ago

I'll try to work this up into a real patch, but what I found was that the following changes relative to the code at git tag 1.8.2 make the test cases described in this ticket pass.

--- django/db/models/query.py   2015-06-17 17:29:38.000000000 -0400
+++ django/db/models/query.py   2015-06-17 17:31:26.000000000 -0400
@@ -370,13 +370,13 @@
         # this by using RETURNING clause for the insert query. We're punting
         # on these for now because they are relatively rare cases.
         assert batch_size is None or batch_size > 0
-        if self.model._meta.parents:
+        if self.model._meta.parents and not self.model._meta.proxy:
             raise ValueError("Can't bulk create an inherited model")
         if not objs:
             return objs
         self._for_write = True
         connection = connections[self.db]
-        fields = self.model._meta.local_concrete_fields
+        fields = self.model._meta.concrete_fields
         objs = list(objs)
         self._populate_pk_values(objs)
         with transaction.atomic(using=self.db, savepoint=False):
Last edited 9 years ago by William Schwartz (previous) (diff)

comment:3 by Anssi Kääriäinen, 9 years ago

One possible approach is that we change bulk_create()'s approach to be "save the objects to the database using fastest method available". We could add a couple of flags, too, for example signals=True/False and with_pk=True/False. The defaults would be False and False.

Then we would check if the given database can do a fast bulk-insert, or if we need to switch to a loop with obj.save(force_insert=True).

I guess the biggest problem is that it might be a bit surprising if bulk_create doesn't do the insert in bulk. But even then, the current alternative is to do the insert by a loop of save() calls, and that is exactly what you will get from the updated bulk_create.

For example in fixture loading it would be very convenient if one could always call bulk_create(), and get the fastest possible bulk insert for the given model class and database backend combination.

comment:4 by William Schwartz, 9 years ago

I think that as a user, I would be surprised if bulk_create did not execute in single (or very small number of, depending on batch_size) INSERTs. I can imagine stackoverflow threads asking "How to force bulk_create to use single insert," with answers to the effect of, "Back in the day, it always executed in a single INSERT but now it's a bit of a guessing game". I'm not saying that it would be a guessing game, I'm just saying that having bulk_create decide what to do will feel opaque to users. The current way it works has the advantage of being fully explicit.

Further, I'm not sure I see such an expansion of bulk_create's duties as being germane to allowing proxy models to use the method, except if your goal is also to allow multi-table models to use it. But the latter consideration seems like it should be a separate ticket.

comment:5 by William Schwartz, 9 years ago

Owner: changed from nobody to William Schwartz
Status: newassigned

comment:6 by William Schwartz, 9 years ago

Work ongoing at https://github.com/wkschwartz/django/tree/ticket_24997.

I've created a "practice" pull request at https://github.com/wkschwartz/django/pull/1. Before submitting a real pull request to Django, I'll squash my commits.

comment:7 by William Schwartz, 9 years ago

I have also created a forward port for Django 1.9 (based on the master branch) at https://github.com/wkschwartz/django/tree/ticket_24997_19. (It's simply a cherry-pick from the other branch.) The documentation in this version of the patch still shows the bug being fixed in Django 1.8.3. The purpose of this version of the patch is merely to make it easy to forward-port the fix.

I've created another "practice" pull request at https://github.com/wkschwartz/django/pull/2.

All commits are now squashed.

Tests pass on Sqlite and Postgresql.

Unless someone says otherwise beforehand, I'll submit both pull requests tomorrow morning.

comment:8 by Tim Graham, 9 years ago

Per our supported version policy, I don't think this would qualify for a backport to 1.8. I'd suggest to take a look at our patch review checklist before you submit the PR, but generally the patch looks okay.

comment:9 by William Schwartz, 9 years ago

Has patch: set

Pull request submitted: https://github.com/django/django/pull/4886

Note I've updated it since @timgraham looked at it this morning. It now correctly handles the case where a proxy model inherits from a multi-table model.

comment:10 by William Schwartz, 9 years ago

Version: 1.8master

comment:11 by Tim Graham, 9 years ago

Triage Stage: AcceptedReady for checkin

comment:12 by William Schwartz, 9 years ago

Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.
Back to Top