#7596 closed New feature (fixed)
Multiple Database Inserts using 1 query
Reported by: | ErikW | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | database, create, multirow, feature |
Cc: | Trevor Caira, Alexander Koshelev, fnl, sfllaw@…, anssi.kaariainen@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
It would be really nice (i.e. convenient, reduced load on the DB, improved performance) to be able to create a "collection" of model objects with a single query. I've tested the raw SQL, and it is significantly faster than using a loop to insert a bunch of database rows. All of the backends seem to support it.
I'm thinking something like
[queryset] = Model.objects.bulk_create([{object}, {object}, {object}])
http://www.desilva.biz/mysql/insert.html
http://www.postgresql.org/docs/8.2/interactive/sql-insert.html (search for: To insert multiple rows using the multirow VALUES syntax)
http://codingforums.com/archive/index.php?t-109802.html
Multirow inserts are not yet supported in SQLite (http://www.sqlite.org/cvstrac/wiki?p=UnsupportedSql), but the backend could fall back to a Python loop in that case; a rough sketch of the idea is below.
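A rough DB-API sketch of what such a fallback could look like (this is not Django code; the table name, the capability flag, and the placeholder style are all illustrative):

    rows = [("alice",), ("bob",), ("carol",)]

    if connection_supports_multirow_values:  # hypothetical capability flag
        # One INSERT statement carrying every row:
        placeholders = ", ".join(["(%s)"] * len(rows))
        params = [value for row in rows for value in row]
        cursor.execute(
            "INSERT INTO myapp_person (name) VALUES " + placeholders, params
        )
    else:
        # Fallback for backends without multirow VALUES: one INSERT per row
        # behind the same API (note: the sqlite3 module uses '?' placeholders).
        cursor.executemany("INSERT INTO myapp_person (name) VALUES (%s)", rows)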
Thanks!
Attachments (3)
Change History (21)
comment:1 by , 16 years ago
Keywords: | feature added |
---|---|
milestone: | → post-1.0 |
Triage Stage: | Unreviewed → Design decision needed |
comment:2 by , 16 years ago
comment:3 by , 16 years ago
Cc: | added |
---|
comment:5 by , 16 years ago
Triage Stage: | Design decision needed → Accepted |
---|
comment:6 by , 16 years ago
Triage Stage: | Accepted → Design decision needed |
---|
Reverting anonymous change.
comment:7 by , 16 years ago
Quick question: would multiple DB inserts using one query achieve more of a speedup than just using transaction management, e.g. @transaction.commit_on_success?
Thanks
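For illustration, the two approaches being compared, roughly (the Person model is hypothetical; commit_on_success wraps the loop in a single transaction but still issues one INSERT statement per row, whereas a multirow VALUES insert is a single statement):

    from django.db import transaction

    # One transaction, but still N INSERT statements / round-trips:
    @transaction.commit_on_success
    def insert_looped(names):
        for name in names:
            Person.objects.create(name=name)

    # One statement for all rows (the feature proposed in this ticket):
    def insert_multirow(cursor, names):
        values_sql = ", ".join(["(%s)"] * len(names))
        cursor.execute(
            "INSERT INTO myapp_person (name) VALUES " + values_sql, names
        )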
comment:8 by , 16 years ago
Cc: | added |
---|
comment:9 by , 15 years ago
Cc: | added |
---|
comment:10 by , 14 years ago
This came up just now in discussing optimizing the Django test suite. A lot of time in TransactionTestCases is wasted in model-by-model get_or_create of content-types and permissions. As it stands, the lookups there could be optimized, but to fully optimize it a bulk-insert ORM operation would also be needed.
comment:11 by , 14 years ago
Triage Stage: | Design decision needed → Accepted |
---|
The API here still requires design discussion, but accepting this in principle.
comment:12 by , 14 years ago
Cc: | added |
---|---|
Severity: | → Normal |
Type: | → Uncategorized |
comment:13 by , 14 years ago
Type: | Uncategorized → New feature |
---|
by , 13 years ago
Attachment: | t7596-alex.diff.txt added |
---|
Alex Gaynor's patch, prepared at DjangoCon Europe 2011
comment:14 by , 13 years ago
Easy pickings: | unset |
---|---|
Has patch: | set |
UI/UX: | unset |
by , 13 years ago
Attachment: | t7596-alex.diff added |
---|
Same patch, with a file extension that Trac will hopefully recognize properly
comment:15 by , 13 years ago
I've attached a reviewed/updated/tweaked version of Alex's patch.
It looks pretty good to me; my changes are mostly stylistic, though I think the previous patch would have had some issues with primary key values that evaluated to False, so I've updated it to test explicitly for None (a small illustration of the pitfall follows).
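Illustration of that falsy-PK pitfall (a made-up snippet, not code from the patch):

    pk = 0  # a legitimate primary key value that evaluates to False
    if not pk:      # buggy check: treats pk=0 as "no PK assigned"
        pass
    if pk is None:  # correct check: only None means the PK is unset
        pass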
I'm not a database expert, so I may have missed some intricacies, but the code itself looks robust.
I've also added a simple proof-of-concept change to the MySQL backend, which works as far as I've tested, but I usually steer as far away from MySQL as possible, so I'm not the person to write this. So far it's identical to PostgreSQL; I don't think MySQL can do the SQLite trick for combining inserts with/without auto-increment primary keys?
comment:16 by , 13 years ago
Cc: | added |
---|
Some notes on the approach.
To combine instances with and without autoincrement primary keys set, use this SQL:
insert into table_with_serial_id (id, other_col) values (1, 'foo'), (default, 'bar');
A different question is whether this needs to be supported at all. I don't see it as particularly useful to insert anything other than default values into serial columns, except for bulk loading, but then you don't want None values in the mix. Maybe there is a use case?
Also, why doesn't the code use "RETURNING serial_pk" where available? Maybe that would make the situation more confusing, because the bulk_save method would sometimes include the PKs and sometimes not, depending on the backend used. Maybe add a kwarg 'returning_autopk', and if it is set, raise an error when the backend doesn't support RETURNING?
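For reference, a sketch of the RETURNING variant in PostgreSQL syntax, reusing the table from the example above (psycopg2-style placeholders):

    cursor.execute(
        "INSERT INTO table_with_serial_id (other_col) "
        "VALUES (%s), (%s) RETURNING id",
        ["foo", "bar"],
    )
    new_pks = [row[0] for row in cursor.fetchall()]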
Maybe use something like sessions in SQLAlchemy?