Opened 16 years ago

Closed 13 years ago

Last modified 12 years ago

#7596 closed New feature (fixed)

Multiple Database Inserts using 1 query

Reported by: ErikW Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: database, create, multirow, feature
Cc: Trevor Caira, Alexander Koshelev, fnl, sfllaw@…, anssi.kaariainen@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

It would be really nice (ie convenient, reduce load on db, improve performance) to be able to create a "collection" of model objects with a single query. I've tested the raw SQL, and it is significantly faster than using a loop to insert a bunch of database rows. All of the backends seem to support it.

I'm thinking something like
[queryset] = Model.objects.bulk_create([{object}, {object}, {object}])

http://www.desilva.biz/mysql/insert.html
http://www.postgresql.org/docs/8.2/interactive/sql-insert.html (search for: To insert multiple rows using the multirow VALUES syntax)
http://codingforums.com/archive/index.php?t-109802.html

Multirow inserts are not supported in sqlite yet (http://www.sqlite.org/cvstrac/wiki?p=UnsupportedSql), but the backend could fall back to a python loop in this case....

Thanks!

Attachments (3)

t7596-alex.diff.txt (24.5 KB ) - added by Russell Keith-Magee 13 years ago.
Alex Gaynor's patch, prepared at DjangoCon Europe 2011
t7596-alex.diff (24.5 KB ) - added by Karen Tracey 13 years ago.
Same patch, extension that trac will hopefully recognize properly
7596.diff (26.6 KB ) - added by Simon Meers 13 years ago.
Updated/tweaked version of Alex's patch

Download all attachments as: .zip

Change History (21)

comment:1 by Johannes Dollinger, 16 years ago

Keywords: feature added
milestone: post-1.0
Triage Stage: UnreviewedDesign decision needed

comment:2 by goodrone, 16 years ago

Maybe use something like sessions in SQLAlchemy?

comment:3 by Trevor Caira, 15 years ago

Cc: Trevor Caira added

comment:4 by (none), 15 years ago

milestone: post-1.0

Milestone post-1.0 deleted

comment:5 by anonymous, 15 years ago

Triage Stage: Design decision neededAccepted

comment:6 by Russell Keith-Magee, 15 years ago

Triage Stage: AcceptedDesign decision needed

Reverting anonymous change.

comment:7 by vkailas, 15 years ago

Quick question: A multiple db inserts using one query would achieve more of a speed of than just using transaction managent e.g. @transaction.commit_on_success?

Thanks

comment:8 by Alexander Koshelev, 15 years ago

Cc: Alexander Koshelev added

comment:9 by fnl, 14 years ago

Cc: fnl added

comment:10 by Carl Meyer, 13 years ago

This came up just now in discussing optimizing the Django test suite. A lot of time in TransactionTestCases is wasted in model-by-model get_or_create of content-types and permissions. As it stands, the lookups there could be optimized, but to fully optimize it a bulk-insert ORM operation would also be needed.

comment:11 by Carl Meyer, 13 years ago

Triage Stage: Design decision neededAccepted

The API here still requires design discussion, but accepting this in principle.

comment:12 by Simon Law, 13 years ago

Cc: sfllaw@… added
Severity: Normal
Type: Uncategorized

comment:13 by Luke Plant, 13 years ago

Type: UncategorizedNew feature

by Russell Keith-Magee, 13 years ago

Attachment: t7596-alex.diff.txt added

Alex Gaynor's patch, prepared at DjangoCon Europe 2011

comment:14 by Russell Keith-Magee, 13 years ago

Easy pickings: unset
Has patch: set
UI/UX: unset

by Karen Tracey, 13 years ago

Attachment: t7596-alex.diff added

Same patch, extension that trac will hopefully recognize properly

by Simon Meers, 13 years ago

Attachment: 7596.diff added

Updated/tweaked version of Alex's patch

comment:15 by Simon Meers, 13 years ago

I've attached a reviewed/updated/tweaked version of Alex's patch.


Looks pretty good to me, mostly stylistic changes though I think the previous
patch would have had some issues with primary key values which evaluated to
False; I've updated this to explicitly test for None.


I'm not a database expert, so I may have missed some intricacies, but the
code itself looks robust.


I've also added a simple proof-of-concept change to the MySQL backend which
works as far as I've tested, but I usually steer as far away from MySQL as
possible so I'm not the person to write this. So far it's identical to
PostgreSQL; I don't think MySQL can do the SQLite trick for combining inserts
with/without auto-increment primary keys?

comment:16 by Anssi Kääriäinen, 13 years ago

Cc: anssi.kaariainen@… added

Some notes on the approach.

Combining instances with and without autoincrement primary keys set, use this sql:

    insert into table_with_serial_id(id, other_col) values(1, 'foo'),(default, 'bar');

A different question is if this is something that needs to be supported at all. I don't see it particularly useful to insert anything else than default values into serial columns. Except for bulk loading, but then you don't want to have None values in the mix. Maybe there is an use case?

Also, why doesn't the code use "RETURNING serial_pk" where available? Maybe that would make the situation more confusing, because sometimes the bulk_save method would include the PKs and sometimes not depending on used backend. Maybe a kwarg 'returning_autopk' and if that is set then raise error if the backend doesn't support RETURNING?

comment:17 by Alex Gaynor, 13 years ago

Resolution: fixed
Status: newclosed

In [16739]:

Fixed #7596. Added Model.objects.bulk_create, and make use of it in several places. This provides a performance benefit when inserting multiple objects. THanks to Russ for the review, and Simon Meers for the MySQl implementation.

comment:18 by Ramiro Morales, 13 years ago

In [16819]:

Added an implementation of bulk insert via the ORM to the Oracle DB backend.

Refs #7596, r16739 and http://troels.arvin.dk/db/rdbms/#insert-multiple

Note: See TracTickets for help on using tickets.
Back to Top