Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#17788 closed Bug (fixed)

bulk_create() "too many SQL variables" error

Reported by: Alpár Jüttner
Owned by: Anssi Kääriäinen
Component: Database layer (models, ORM)
Version: 1.4
Severity: Release blocker
Keywords:
Cc: anssi.kaariainen@…, michal.petrucha@…, alasdair@…
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I tried to bulk_create() lots of (e.g. >300) objects with Django 1.4b1 + sqlite backend. It fails with

django.db.utils.DatabaseError: too many SQL variables

It makes the use of bulk_create() quite risky.

Wouldn't it be possible to automatically break up the list of objects into smaller pieces when it turns out to be too large?

Attachments (3)

batch_bulk_insert.diff (9.2 KB ) - added by Anssi Kääriäinen 12 years ago.
17788-warning-for-1.4.diff (1.1 KB ) - added by Aymeric Augustin 12 years ago.
ticket_17788.diff (13.1 KB ) - added by Anssi Kääriäinen 12 years ago.


Change History (62)

comment:1 by Anssi Kääriäinen, 12 years ago

Cc: anssi.kaariainen@… added
Triage Stage: Unreviewed → Design decision needed

Yes, it would be possible. The biggest problem is actually knowing how many parameters the backend happens to support. For most backends the limit isn't the number of variables, it is the query string length. Because of this it is impossible to make bulk_insert immune to all batch-size errors. (No, calculating the query string length isn't a sane option.)

The way forward would be defining a max_variables_per_query for each backend. For sqlite3 it is 1000, for other backends it could be None for no limit. There could also be a kwarg "batch_size" by which you could override the default batch size given by the backend.

comment:2 by Ramiro Morales, 12 years ago

Triage Stage: Design decision needed → Accepted

See also #16426 for a problem with the same root reason in a feature introduced in 1.3.

in reply to:  1 comment:3 by Alpár Jüttner, 12 years ago

Replying to akaariai:

Yes, it would be possible. The biggest problem is actually knowing how many parameters the backend happens to support. For most backends the limit isn't the number of variables, it is the query string length. Because of this it is impossible to make bulk_insert immune to all batch-size errors. (No, calculating the query string length isn't a sane option.)

What if we always break up the list into batches of no more than, e.g., 20 elements each? This should be safe for all backends. Does it have any noticeable performance penalty?

comment:4 by Anssi Kääriäinen, 12 years ago

I would implement the above in a way where every backend has a 1000-parameter limit. If you are inserting 2-column rows (m2m tables, for example), you could insert 499 items in one query. If they were 20-column items, then you would get to insert 49, and so on. 20 objects per call is a little low anyhow.
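As a rough illustration of that arithmetic (a sketch only; items_per_batch and max_query_params are hypothetical names, not an existing Django API, and 999 is SQLite's compile-time default for SQLITE_LIMIT_VARIABLE_NUMBER):

def items_per_batch(num_fields, max_query_params=999):
    # Each inserted row consumes one bound parameter per field, so the
    # number of rows that fit in a single INSERT is the parameter limit
    # divided by the field count. None means "no limit".
    if max_query_params is None:
        return None
    return max_query_params // num_fields

print(items_per_batch(2))   # 499 rows for a two-column m2m table
print(items_per_batch(20))  # 49 rows for a twenty-column model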

The above is a bit theoretical. I don't think that change will get accepted for 1.4, and after that it will be a backwards-incompatible change.

The best approach is to set a 1000 parameter limit for SQLite, no limit for others, and allow override of the default batch size by a kwarg.

I will ask postgresql's performance mailing list if they have some suggestions for preferred batch size. If it is likely the performance will crash if you do too large batch then having a default avoiding this would be good. I know doing a WHERE id IN large_list isn't efficient, but I don't know if this is true for inserts.

I don't think there is any possibility to get the above into 1.4. There is just too little time to fine-tune all the details (transaction handling etc). In addition the above is a new feature.

EDIT: rewrote my original post. Me fail english when too little caffeine in the system...

Last edited 12 years ago by Anssi Kääriäinen (previous) (diff)

in reply to:  4 ; comment:5 by Alpár Jüttner, 12 years ago

Replying to akaariai:

The above is a bit theoretical. I don't think that change will get accepted for 1.4, and after that it will be a backwards-incompatible change.

It would be very sad. I think bulk_create() is pretty much unusable (because it is unsafe to use) right now.
I don't see any reasonable way of handling this issue at the user level.
If nothing changes before the release, at least the following warning should be added to the doc of bulk_create().

WARNING. This function crashes if you try to create more than a certain number of objects. This number depends on the database backend, and also on the number of fields in the object. In short: don't use this function.

I don't think there is any possibility to get the above into 1.4. There is just too little time to fine-tune all the details (transaction handling etc). In addition the above is a new feature.

The solution you propose may be considered a new feature, but the problem itself is a bug. As far as the bulk_create() functionality is concerned, this is pretty much a critical one.

comment:6 by Anssi Kääriäinen, 12 years ago

Yes, it is a somewhat dangerous operation. It will seem to work in testing but crash in production. For that reason it would be good to have some (large) limits for other backends which have query size limits, too. MySQL has no practical limit (16MB by default), Oracle has a limit of 64K which you could actually hit if you try hard enough.

It is notable that this same problem is present when you do qs.filter(pk__in=[large_list]) and that one can not be fixed in any transparent way.

It could be possible to have a simple fix for sqlite3: just split the inserts so that they have less than 1000 parameters each. No changes for other backends, no ability to override the batch size. It would fix this ticket's immediate problem, and could perhaps get into 1.4.

I asked pgsql-performance if there is any wisdom about preferred batch sizes. I suspect there is no point at which the performance will predictably collapse, so there is no need for a default batch size for PostgreSQL. But let's see if something pops up. The thread is this one: http://archives.postgresql.org/pgsql-performance/2012-02/msg00265.php

Maybe I should open another ticket for the bigger changes? That way this ticket could be dedicated to just fixing the SQLite3 problem.

comment:7 by Carl Meyer, 12 years ago

Talked this over briefly with Jacob in IRC - it's not a release blocker, since it seems likely that the answer "use a better database" suffices to prevent it being an actual problem in production (except maybe in Oracle? not clear to me). I'd still like to get it fixed for 1.4 if we can, and something like a bulk_insert_chunk_size on the db backend seems pretty reasonable.

comment:8 by Carl Meyer, 12 years ago

Severity: Normal → Release blocker

Actually, marking it a release blocker because at the very least it needs a doc fix before release; an actual fix is not a release blocker but would be nice. Thanks claudep for pointing this out.

comment:9 by Anssi Kääriäinen, 12 years ago

I am going to test if Oracle has any batch size limits.

For PostgreSQL there is no practical limit. I tested this and once you go to millions of objects you could OOM your server, but most likely you will OOM your Python process before that.

As far as I know, MySQL does not have any practical limit either. 16MB of query string should be enough for everybody(tm).

Oracle is an open question, and then there are all the third-party databases. It might be worth having an API like get_next_batch(param_count, params). For example, sqlite could return min(1000/param_count, len(params)/param_count).

comment:10 by Anssi Kääriäinen, 12 years ago

OK, seems like I am not going to test this on Oracle. Installing Oracle XE is ridiculously hard, and now I have a half-configured Oracle XE installed...

So, if somebody could test the following on Oracle:

class SomeModel(models.Model):
    f = models.CharField(max_length=100)
SomeModel.objects.bulk_create(SomeModel(f='a'*100) for i in range(0, 1000))

If that works, increase the 1000 to, say, 10000. If there are no problems, I guess Oracle is safe enough for 1.4.

comment:11 by Aymeric Augustin, 12 years ago

For the record here's the diff I used:

Index: tests/regressiontests/test_17788/__init__.py
===================================================================
Index: tests/regressiontests/test_17788/tests.py
===================================================================
--- tests/regressiontests/test_17788/tests.py	(revision 0)
+++ tests/regressiontests/test_17788/tests.py	(revision 0)
@@ -0,0 +1,8 @@
+from django.test import TestCase
+
+from .models import SomeModel
+
+class SomeTest(TestCase):
+
+    def test_bulk_insert(self):
+        SomeModel.objects.bulk_create(SomeModel(f='abc') for i in range(0, 10000))
Index: tests/regressiontests/test_17788/models.py
===================================================================
--- tests/regressiontests/test_17788/models.py	(revision 0)
+++ tests/regressiontests/test_17788/models.py	(revision 0)
@@ -0,0 +1,4 @@
+from django.db import models
+
+class SomeModel(models.Model):
+    f = models.CharField(max_length=100)

This test passes under Oracle.

comment:12 by Anssi Kääriäinen, 12 years ago

As the field value is just 3 chars, the amount of data in the parameters is roughly 30KB. If you make the value, say, 30 chars, then it should trigger the problem if it exists at all.

I suspect it doesn't. I have googled this a little, and people do not complain about this often. If the parameters counted toward the query string length, I would expect there to be more complaints.

comment:13 by anonymous, 12 years ago

I just ran Aymeric's test inserting 10K SomeModel instances, setting the value of its f field to both a 30-char string and a 300-char string, without failures. The database is Oracle Database 10g Express Universal Edition 10.2.0.1, as per OracleTestSetup.

comment:14 by Carl Meyer, 12 years ago

There is already a supports_1000_query_parameters db backend feature, set to False for SQLite and True for all other backends. That seems kind of odd, as if it should be max_query_parameters = 1000 instead, but it was probably done that way so it could be used with skipUnlessDBFeature in the tests, which is the only place it's currently used.

Whether we leave that feature as it is or change it to a number, it could be used to resolve both this and #16426. Though if SQLite is really the only affected backend, it's awfully tempting to just say "get a better database" rather than introducing a bunch of extra chunking complexity in both deletion and bulk insertion, just for SQLite's sake.
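For illustration, that feature flag is typically consumed as a test guard, roughly like this (a sketch only; the test class and test body are made up, reusing the SomeModel from the attached diff):

from django.test import TestCase, skipUnlessDBFeature

from .models import SomeModel

class BigBatchTests(TestCase):

    @skipUnlessDBFeature('supports_1000_query_parameters')
    def test_large_pk_in_filter(self):
        # Skipped on SQLite, whose default limit is 999 bound variables.
        SomeModel.objects.bulk_create(SomeModel(f='abc') for i in range(1500))
        pks = list(SomeModel.objects.values_list('pk', flat=True))
        self.assertEqual(SomeModel.objects.filter(pk__in=pks).count(), 1500)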

in reply to:  7 ; comment:15 by Alpár Jüttner, 12 years ago

Replying to carljm:

[...] since it seems likely that the answer "use a better database" suffices to prevent it being an actual problem in production [...]

My problem is that although we "use a better database" for production, we use SQLite during development, because it is easier to set up, to create and share snapshots, etc.
I believe we are not alone with this development model.

If we accept that "correct behavior" should have priority over "performance", then on SQLite bulk_create() may simply create the objects one by one. It is slow (on SQLite), but at least it works, and one may hope for better performance on a "better database". So, it is already much better than having something unusable.

How does the batch size affect performance? As far as I understand, there is an inevitable processing cost for inserting the data, plus the overhead of processing the SQL query (one by one). The bulk_create() feature is about reducing the second one. Thus, if we use a batch size as small as 10, it already cuts that overhead to one tenth. Although I haven't tested it, my feeling is that the performance gain can already be achieved with very small batches; large batches will bring negligible further improvement.

Finally, using reasonably sized batches may be beneficial even on "better databases", as a huge single SQL insert command may cause problems (such as timeouts) on heavily loaded systems.

comment:16 by Anssi Kääriäinen, 12 years ago

OK, so it seems safe to say Oracle doesn't suffer from this. You would need 64K of SQL string without the parameters, and that fits a lot of placeholders. I checked DB2 and SQL Server and they seem to be safe from this issue, too.

So, it seems the only problematic backend is SQLite with its 1000-parameter limit. Fixing bulk_create is not that hard, but I have no idea how hard it is to fix delete. If delete is hard, is it worth fixing bulk_create alone? If both are somewhat easy then I think this should be done; it will make testing a little bit easier. If this is fixed, there is at least one test in queries that creates 2500 objects which should be fixed too, making the test runner a few seconds faster.

comment:17 by Anssi Kääriäinen, 12 years ago

Attached is a patch that fixes the bulk_create issue. I used a get_batch_size which can inspect the fields and the objects in the query.

The patch also implements a default limit of 100000 objects in one batch. Dropping one zero from the end would be wise, or if you prefer no limit, that's OK with me too. It would be trivial to implement a batch_size kwarg for bulk_create if that is wanted down the road.

In addition to .delete() there is also prefetch_related, which doesn't work if the inner sets are large enough (note: I haven't actually tested this). My opinion is that these aren't critical issues, but it would be nice to abstract the backend differences away if possible. However, there is always going to be one issue: id__in queries cannot be abstracted (splitting the query breaks sorting), so there will always be cases where SQLite doesn't work correctly.

I noticed one issue with the bulk_create feature which might need some attention. It goes to some effort to allow mixing of pk-set and pk-not-set objects in one batch. In my opinion that is downright dangerous to allow, as the pk field must be an AutoField for the insert to work. Saving self-defined values to an AutoField is something to be avoided, as that can break the sequence. Doing bulk inserts with the AutoField PK set is fine; assume the user knows what he is doing, maybe he is going to reset the PK sequence after the operation or something. But I can't see any valid use case for mixing PK-set and PK-not-set objects in one batch. The last test added in the patch doesn't work on SQLite for the very reason that mixing things in this way isn't safe. So, my question is: should this be disallowed? After 1.4 is released that can't be done. This isn't anything critical IMHO, but it would be good to discourage unsafe coding practices.

by Anssi Kääriäinen, 12 years ago

Attachment: batch_bulk_insert.diff added

comment:18 by Aymeric Augustin, 12 years ago

Possibly a stupid question, but what if the model has a foreign key to itself? Isn't the chunking likely to break foreign key constraints?

Even though the patch is rather straightforward, I'm really wary of unexpected side effects on the eve of a release candidate.

I suggest we document the problem as a known limitation of the SQLite backend in 1.4 and fix it properly later -- along with the other problems triggered by this limitation, like #16426.

comment:19 by Aymeric Augustin, 12 years ago

Owner: changed from nobody to Aymeric Augustin

comment:20 by Anssi Kääriäinen, 12 years ago

Good catch. It does indeed break self-referencing foreign keys, although the breakage happens only when the batch needs to be split, and in that case you currently get an error anyway... Still, the pk set/not set mixing in one query could already give you problems with foreign keys.

Leaving all this as a 1.5 problem sounds like a good plan at this point.

in reply to:  5 ; comment:21 by Aymeric Augustin, 12 years ago

I'm taking care of adding a warning. It boils down to: "don't use SQLite if your application has requirements that exceed SQLite's limits."

For the record, I disagree strongly with the wording suggested in comment 5. We won't spread FUD like "this number depends on the database backend", nor cargo-cult advice like "don't use this function."

by Aymeric Augustin, 12 years ago

Attachment: 17788-warning-for-1.4.diff added

comment:22 by Anssi Kääriäinen, 12 years ago

Looks good to me.

comment:23 by Aymeric Augustin, 12 years ago

Severity: Release blocker → Normal

comment:24 by Aymeric Augustin, 12 years ago

In [17653]:

Documented a limit of the SQLite backend, in relation with the bulk_create function. Refs #17788.

comment:25 by Aymeric Augustin, 12 years ago

Owner: changed from Aymeric Augustin to nobody

in reply to:  15 comment:26 by Karen Tracey, 12 years ago

Replying to alpar:

Replying to carljm:

[...] since it seems likely that the answer "use a better database" suffices to prevent it being an actual problem in production [...]

My problem is that although we "use a better database" for production, we use SQLite during development, because it is easier to set up, to create and share snapshots, etc.

...and it will quietly accept plenty of error cases that your production database will choke on, thus hiding errors during development that will NOT be hidden in production. Get a better database, even in development.

in reply to:  21 comment:27 by Alpár Jüttner, 12 years ago

Replying to aaugustin:

For the record, I disagree strongly with the wording suggested in comment 5.

That wording was used only to emphasize that I find this a major issue. I didn't intend that text for actual inclusion in the docs, either. Sorry for not being clearer about it.

Replying to kmtracey:

My problem is that although we "use a better database" for production, we use SQLite during development, because it is easier to set up, to create and share snapshots, etc.

...and it will quietly accept plenty of error cases that your production database will choke on, thus hiding errors during development that will NOT be hidden in production. Get a better database, even in development.

I don't think this is the right place for debating over development practices. Like it or not, many people use SQLite in some stages of development even when eventually a more capable database is used for production. And those people are affected by the issue.

comment:28 by anonymous, 12 years ago

'Nice' to read I'm not the only one; I just ran into this problem trying to play with the latest stable django-cms on 1.4b1rev17666.
The "use a better database" solution (Postgres in our case) more or less solved the problem.

comment:29 by Alpár Jüttner, 12 years ago

I don't think the following FUD/cargo-cult-like statement in [17653] is correct:

If your application's performance requirements exceed SQLite's limits, it's recommended to switch to another database engine, such as PostgreSQL.

  • The whole thing has nothing to do with the "application's performance requirements". Creating 300 objects at once is certainly not a performance requirement that SQLite is incapable of meeting.
  • There is an obvious solution to the problem even with SQLite: splitting up the bulk into smaller pieces, either internally by Django or at the user level.

comment:30 by Alpár Jüttner, 12 years ago

So, I would like to implement my own safe version of bulk_create() at user level like this:

def safe_bulk_create(objs):
    BULK_SIZE = 900/FIELD_NUM
    for i in range(0,len(objs),BULK_SIZE):
        objs[0].__class__.objects.bulk_create(objs[i:i+BULK_SIZE])

This works well for me so far, both on SQLite and on PostgreSQL, where it does not cause a significant performance penalty compared to using bulk_create() directly.

However, I couldn't figure out how to calculate the number of fields in the object. What is the recommended way to do that?

comment:31 by Simon Charette, 12 years ago

@alpar I think you can use the not-so-internal model options: len(obj.__class__._meta.fields)
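Putting that together with the snippet from comment 30, a completed user-level workaround could look roughly like this (a sketch only; safe_bulk_create and max_params are local names, and 900 is just a conservative margin below SQLite's 999-variable limit):

def safe_bulk_create(objs, max_params=900):
    # Split the objects into chunks small enough to stay under the
    # backend's bound-parameter limit before handing them to bulk_create().
    objs = list(objs)
    if not objs:
        return
    model = objs[0].__class__
    num_fields = len(model._meta.fields)
    batch_size = max(max_params // num_fields, 1)
    for i in range(0, len(objs), batch_size):
        model.objects.bulk_create(objs[i:i + batch_size])

Called as, e.g., safe_bulk_create(SomeModel(f='abc') for i in range(5000)), it should behave the same on SQLite and PostgreSQL.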

comment:32 by Michal Petrucha, 12 years ago

Cc: michal.petrucha@… added

I'll just chime in: we're about to hit this limit with the Django test suite. The more models the test suite contains, the more permissions are created in django.contrib.auth.management.create_permissions using bulk_create. Currently on trunk this creates 996 objects; working on a development branch with some extra tests makes it impossible to run the test suite against SQLite.

comment:33 by Anssi Kääriäinen, 12 years ago

My opinion is that this should be fixed for SQLite. I am not too sure about the rest of the problematic operations (prefetch, delete, etc.), but this one is so simple to fix that it should be done. The price of possibly breaking self-referential foreign keys on large batches is lower than keeping the 1000-parameter limit for bulk_create. As said before, you currently get the 1000-parameters error anyway in the problematic situations.

It seems that SQLite supports deferrable foreign keys. I wonder why those aren't used on SQLite when they are used on PostgreSQL and Oracle?

comment:34 by Dan McGee, 12 years ago

Hit this today with MySQL inserting ~19000 rows of a simple model with 5 fields, including the primary key. MySQL gave me this: http://dev.mysql.com/doc/refman/5.1/en/packet-too-large.html

I had to bump max_allowed_packet from 1M to 2M to get the insert to work; I can see bulk inserting a model that had large TextField() values being particularly troublesome. Note that 1M is the out of the box default with MySQL.

comment:35 by Anssi Kääriäinen, 12 years ago

I don't think we are going to have a default on MySQL. The reason is that it is pretty much impossible to guess the correct batch size (2 objects might already be too much). However, we could expose a batch_size parameter and suggest users set it to something reasonably low.

In addition, can we please fix the SQLite "too many variables" error? That one hits me 9 times out of 10 when doing database performance testing. Is there opposition to adding a batch_size parameter to bulk_create which defaults to unlimited, except on SQLite where the default is set so that the variable limit is not hit? Yes, there could be problems for self-referential foreign keys. But that query is already doomed, as this would just trade "too many variables" for "foreign key broken" in the rare case there is actually a problem.

comment:36 by Anssi Kääriäinen, 12 years ago

Just as an additional data point for fixing this on SQLite: SQLite doesn't actually enforce foreign key constraints, so there is little risk of breaking self-referential foreign key constraints.

Any objections for the following plan?

  • Add a batch_size variable to bulk_create.
  • By default the batch_size is unlimited, but on SQLite it is restricted so that a single query does not get too many variables.

comment:37 by Aymeric Augustin, 12 years ago

This plan looks good to me.

comment:38 by cardonbj, 12 years ago

I agree with getting this fixed, and hopefully in an incremental release of 1.4. My company *will not* migrate to Django 1.4 unless the test suite passes on SQLite (which we use for trunk development), which it currently does not. I have a very hard time believing this is not relatively common among Django dev houses, as SQLite is the simplest way to get your application going without worrying about setting up and configuring a more reliable database.

comment:39 by Aymeric Augustin, 12 years ago

You may want to point out to the people who set this rule that requiring the test suite to pass on a database you aren't using in production doesn't make a lot of sense :)

And yes, we'll backport the fix to 1.4.1.

comment:40 by Anssi Kääriäinen, 12 years ago

Owner: changed from nobody to Anssi Kääriäinen

I guess I should assign this to myself.

For 1.4.1 we can't add the batch_size variable (a new feature). So, would it make sense to add it for 1.5 and just use the default of "limited by variable amount for SQLite, otherwise unlimited" in 1.4.1 without the chance of overriding it?

comment:41 by Michal Petrucha, 12 years ago

Somehow I'm not convinced by the idea of adding a parameter to bulk_create. I feel it is an attempt to fix this at the wrong level. From an API perspective, it seems wrong to me to hard-code a number specifying the batch size into the actual call to bulk_create, since the appropriate number is completely backend-dependent.

I think it would make more sense for this to go into the backend-specific code, something like DatabaseOperations.max_batch_size that would take an InsertQuery object to calculate the value based on the number of fields inserted or something like that. There's a similar (albeit simpler) mechanism for __in lookups, DatabaseOperations.max_in_list_size.

But maybe it's just me abstracting things too much.

comment:42 by Anssi Kääriäinen, 12 years ago

The current plan is to add a batch_size argument by which you can override the default for the backend in use. The default is based on the variable count limit on SQLite; otherwise it is unlimited. (The current patch has a "large enough" default limit, but that will likely be changed to unlimited.)

The only place where you really need the batch_size argument is MySQL. MySQL has a 1MB limit on the query string, and I believe it would be too complicated to calculate the query string size from the given instances. So, the user can limit the size if needed, but there is no default limit in place, as we really don't know what the default should be.

On other core backends there are no practical query size limits, or at least I could not hit any limits.

Still, maybe it would be better to handle this at one level up from bulk_insert. So, you would do something like this:

for batch in connection.get_batches(objs):
    SomeModel.objects.bulk_create(batch)

This makes it very explicit what is happening. You could use different batch generators if needed, and no user is forced to rewrite the batching logic himself.

I like the automatic batch size limit more, but it might be that I am a little fixated on that approach...

comment:43 by Michal Petrucha, 12 years ago

On second thought, I agree with the batch_size approach, as long as it works by default where possible.

The problem with connection.get_batches is that:

  1. you usually don't work with connection objects (at least with basic ORM usage, it's a bit different when you're using complicated custom SQL)
  2. it introduces boilerplate you'd have to add everywhere bulk_insert is used anyway...

So, all in all, I withdraw my previous objection.

comment:44 by Anssi Kääriäinen, 12 years ago

Has patch: set

I finally managed to update the patch. It is as close to committable as I managed to get it. The part I am unsure about is the documentation changes; please verify. In addition to the batching there is one notable change: SQLite3's can_combine_inserts_with_and_without_auto_increment_pk was changed from True to False. The reason for this change is that when doing multiple batches (due to the parameter limit), things get weird when combining pk-set and pk-unset objects. If you have these PK values in your batches:

    5, None, 7, None

and

    9, None, 11, None

then when the first batch is inserted you get PK values 1, 2, 5, 7; for the second batch you get 8, 9, 9, 11, because the PK field always starts from max(id) + 1... In short: this behavior is not wanted.

There is also https://github.com/akaariai/django/tree/ticket_17788 for those who prefer github.

by Anssi Kääriäinen, 12 years ago

Attachment: ticket_17788.diff added

comment:45 by Michal Petrucha, 12 years ago

Overall the patch looks good to me; only one remark: personally I'd use a non-recursive implementation of _batched_insert. Maybe it's just me being excessively reluctant to use recursion when it can be rewritten 1:1 into a loop. Also, in extreme cases where the number of batches is high, the running time of this method will be O(n²), as a new copy of the whole list of remaining objects is created in each iteration.

Anyway, I don't know how much of an issue these concerns are.

comment:46 by Anssi Kääriäinen, 12 years ago

I have updated the patch to remove the recursion from _batched_insert: https://github.com/akaariai/django/compare/ticket_17788
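For illustration, a loop-based _batched_insert along these lines might look roughly like this (a sketch, not the committed patch; it is shown as a free-standing function, assumes a bulk_batch_size() hook on the backend operations object, and simplifies the internal _insert() call):

from django.db import connections

def _batched_insert(self, objs, fields, batch_size):
    # Slice the object list into fixed-size chunks instead of recursing,
    # so no copy of the remaining tail is made on every iteration.
    if not objs:
        return
    ops = connections[self.db].ops
    batch_size = batch_size or max(ops.bulk_batch_size(fields, objs), 1)
    for i in range(0, len(objs), batch_size):
        self._insert(objs[i:i + batch_size], fields=fields, using=self.db)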

comment:47 by Michal Petrucha, 12 years ago

Triage Stage: Accepted → Ready for checkin

LGTM, marking RFC.

comment:48 by Claude Paroz, 12 years ago

s/valuer/value/ in the bulk_batch_size docstring. In bulk_batch_size for SQLite3, I would rather use the // operator (in Python 3, '/' would return a float). I would also say in the docstring that 999 is a compile-time default in SQLite (SQLITE_LIMIT_VARIABLE_NUMBER).
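In other words, the backend hook under review might end up looking roughly like this on the sqlite3 DatabaseOperations class (a sketch reflecting the review notes, not necessarily the committed code):

def bulk_batch_size(self, fields, objs):
    """
    SQLite has a compile-time default (SQLITE_LIMIT_VARIABLE_NUMBER) of
    999 variables per query; each inserted row uses one variable per field.
    """
    return (999 // len(fields)) if len(fields) > 0 else len(objs)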

comment:49 by Anssi Kääriäinen, 12 years ago

Corrected the review issues in: https://github.com/akaariai/django/commit/332fb9308c39249cdd38a034a3565778636c373b with some other improvements, too. See the commit message for details.

BTW I think this way of using Git could be better than producing constantly rebased versions - it should be easier to see what changes have been made after the review. It is another question whether the review-fix commits should be merged or squashed when committing to the master repo. IMO, squashed into one commit...

comment:50 by Claude Paroz, 12 years ago

Thanks for the improvements. Please squash, unless the individual commits have real value (for example, to distinguish formatting edits from real code changes).

comment:51 by alasdair@…, 12 years ago

Cc: alasdair@… added

comment:52 by Anssi Kääriäinen <akaariai@…>, 12 years ago

Resolution: fixed
Status: new → closed

In [29132ebdef0e0b9c09e456b05f0e6a22f1106a4f]:

Fixed #17788 -- Added batch_size argument to qs.bulk_create()

The qs.bulk_create() method did not work with large batches together
with SQLite3. This commit adds a way to split the bulk into smaller
batches. The default batch size is unlimited except for SQLite3 where
the batch size is limited to 999 SQL parameters per batch.

Thanks to everybody who participated in the discussions at Trac.
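With that change in place, callers can also cap the batch size explicitly; for example (the value 500 is arbitrary):

# An explicit batch_size caps the number of objects per INSERT statement,
# e.g. to stay under MySQL's max_allowed_packet, which Django cannot guess.
SomeModel.objects.bulk_create(objs, batch_size=500)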

comment:53 by Anssi Kääriäinen, 12 years ago

Resolution: fixed
Severity: Normal → Release blocker
Status: closed → reopened
Version: 1.4-beta-1 → 1.4

I have created a backpatch of the committed fix in https://github.com/akaariai/django/tree/ticket_17788_backpatch.

I will wait for any last comments before proceeding with the backpatch. I am a little less enthusiastic about backpatching this than before, but I think I am still going to do it, as that was the decision some time ago.

comment:54 by Tim Graham, 12 years ago

I'd be +1 on backporting, since I currently have to apply this to each 1.4.X release to make things work for my installation (we have a lot of permissions and use SQLite for testing, so the tests don't run without this patch).

comment:55 by Anssi Kääriäinen, 12 years ago

No '-' votes and some '+' votes. I will try to commit this to 1.4 during the weekend.

comment:56 by Russell Keith-Magee, 12 years ago

For the record, +1 on a backport from me, too. If you're looking for a procedural reason to justify this -- "Major problem with new feature" is one of the criteria for a backport; if we'd had this patch prior to the 1.4 release, I'd warrant we would have held up the 1.4 release to make sure it got in.

comment:57 by Anssi Kääriäinen <akaariai@…>, 12 years ago

Resolution: fixed
Status: reopened → closed

In [23268608512444de626e63dee1815fb5b5207a48]:

[1.4.x] Fixed #17788 -- Added batch_size argument to qs.bulk_create()

The qs.bulk_create() method did not work with large batches together
with SQLite3. This commit adds a way to split the bulk into smaller
batches. The default batch size is unlimited except for SQLite3 where
the batch size is limited to 999 SQL parameters per batch.

Thanks to everybody who participated in the discussions at Trac.

Backpatch of 29132ebdef0e0b9c09e456b05f0e6a22f1106a4f from master (with
documentation changes removed).

comment:58 by Claude Paroz, 12 years ago

Worth a note in docs/releases/1.4.2.txt?

comment:59 by Anssi Kääriäinen, 12 years ago

I was wondering about the release notes myself when committing. I looked through the notes of older releases and it seemed most commits to back branches are not mentioned in the notes. So, I left the note out.

It might be a good idea to add this, and any non-trivial changes to the release notes of back-branches. Opinions?
