Code

Opened 6 years ago

Closed 3 years ago

Last modified 3 years ago

#7596 closed New feature (fixed)

Multiple Database Inserts using 1 query

Reported by: erikcw Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: database, create, multirow, feature
Cc: trevor, alexkoshelev, fnl, sfllaw@…, anssi.kaariainen@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

It would be really nice (ie convenient, reduce load on db, improve performance) to be able to create a "collection" of model objects with a single query. I've tested the raw SQL, and it is significantly faster than using a loop to insert a bunch of database rows. All of the backends seem to support it.

I'm thinking something like
[queryset] = Model.objects.bulk_create([{object}, {object}, {object}])

http://www.desilva.biz/mysql/insert.html
http://www.postgresql.org/docs/8.2/interactive/sql-insert.html (search for: To insert multiple rows using the multirow VALUES syntax)
http://codingforums.com/archive/index.php?t-109802.html

Multirow inserts are not supported in sqlite yet (http://www.sqlite.org/cvstrac/wiki?p=UnsupportedSql), but the backend could fall back to a python loop in this case....

Thanks!

Attachments (3)

t7596-alex.diff.txt (24.5 KB) - added by russellm 3 years ago.
Alex Gaynor's patch, prepared at DjangoCon Europe 2011
t7596-alex.diff (24.5 KB) - added by kmtracey 3 years ago.
Same patch, extension that trac will hopefully recognize properly
7596.diff (26.6 KB) - added by DrMeers 3 years ago.
Updated/tweaked version of Alex's patch

Download all attachments as: .zip

Change History (21)

comment:1 Changed 6 years ago by emulbreh

  • Keywords multirow, feature added; multirow removed
  • milestone set to post-1.0
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

comment:2 Changed 6 years ago by goodrone

Maybe use something like sessions in SQLAlchemy?

comment:3 Changed 5 years ago by trevor

  • Cc trevor added

comment:4 Changed 5 years ago by anonymous

  • milestone post-1.0 deleted

Milestone post-1.0 deleted

comment:5 Changed 5 years ago by anonymous

  • Triage Stage changed from Design decision needed to Accepted

comment:6 Changed 5 years ago by russellm

  • Triage Stage changed from Accepted to Design decision needed

Reverting anonymous change.

comment:7 Changed 5 years ago by vkailas

Quick question: A multiple db inserts using one query would achieve more of a speed of than just using transaction managent e.g. @transaction.commit_on_success?

Thanks

comment:8 Changed 5 years ago by alexkoshelev

  • Cc alexkoshelev added

comment:9 Changed 4 years ago by fnl

  • Cc fnl added

comment:10 Changed 3 years ago by carljm

This came up just now in discussing optimizing the Django test suite. A lot of time in TransactionTestCases is wasted in model-by-model get_or_create of content-types and permissions. As it stands, the lookups there could be optimized, but to fully optimize it a bulk-insert ORM operation would also be needed.

comment:11 Changed 3 years ago by carljm

  • Triage Stage changed from Design decision needed to Accepted

The API here still requires design discussion, but accepting this in principle.

comment:12 Changed 3 years ago by sfllaw

  • Cc sfllaw@… added
  • Severity set to Normal
  • Type set to Uncategorized

comment:13 Changed 3 years ago by lukeplant

  • Type changed from Uncategorized to New feature

Changed 3 years ago by russellm

Alex Gaynor's patch, prepared at DjangoCon Europe 2011

comment:14 Changed 3 years ago by russellm

  • Easy pickings unset
  • Has patch set
  • UI/UX unset

Changed 3 years ago by kmtracey

Same patch, extension that trac will hopefully recognize properly

Changed 3 years ago by DrMeers

Updated/tweaked version of Alex's patch

comment:15 Changed 3 years ago by DrMeers

I've attached a reviewed/updated/tweaked version of Alex's patch.


Looks pretty good to me, mostly stylistic changes though I think the previous
patch would have had some issues with primary key values which evaluated to
False; I've updated this to explicitly test for None.


I'm not a database expert, so I may have missed some intricacies, but the
code itself looks robust.


I've also added a simple proof-of-concept change to the MySQL backend which
works as far as I've tested, but I usually steer as far away from MySQL as
possible so I'm not the person to write this. So far it's identical to
PostgreSQL; I don't think MySQL can do the SQLite trick for combining inserts
with/without auto-increment primary keys?

comment:16 Changed 3 years ago by akaariai

  • Cc anssi.kaariainen@… added

Some notes on the approach.

Combining instances with and without autoincrement primary keys set, use this sql:

    insert into table_with_serial_id(id, other_col) values(1, 'foo'),(default, 'bar');

A different question is if this is something that needs to be supported at all. I don't see it particularly useful to insert anything else than default values into serial columns. Except for bulk loading, but then you don't want to have None values in the mix. Maybe there is an use case?

Also, why doesn't the code use "RETURNING serial_pk" where available? Maybe that would make the situation more confusing, because sometimes the bulk_save method would include the PKs and sometimes not depending on used backend. Maybe a kwarg 'returning_autopk' and if that is set then raise error if the backend doesn't support RETURNING?

comment:17 Changed 3 years ago by Alex

  • Resolution set to fixed
  • Status changed from new to closed

In [16739]:

Fixed #7596. Added Model.objects.bulk_create, and make use of it in several places. This provides a performance benefit when inserting multiple objects. THanks to Russ for the review, and Simon Meers for the MySQl implementation.

comment:18 Changed 3 years ago by ramiro

In [16819]:

Added an implementation of bulk insert via the ORM to the Oracle DB backend.

Refs #7596, r16739 and http://troels.arvin.dk/db/rdbms/#insert-multiple

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.