#23646 closed New feature (fixed)
Add QuerySet.bulk_update() to to efficiently update many models
Reported by: | Brillgen Developers | Owned by: | Tom Forbes |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.9 |
Severity: | Normal | Keywords: | |
Cc: | Tom Forbes | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Update
django-bulk-upate (https://github.com/aykut/django-bulk-update) which could/should be integrated into django core since we have a bulk_create and this corresponds well with the save (could be called bulk save if it makes sense). App works with an identical api to bulk_create and so should be a good candidate to be considered for inclusion in core.
Database backends like Postgresql support doing multiple value update for the same field in different rows using a single update query. Please see this sample below:
http://stackoverflow.com/questions/18797608/update-multiple-rows-in-same-query-using-postgresql
Example:
Currently:
Books.objects.all().update(price=10)
Proposed (syntax can vary):
Books.objects.all().update({'pk': 1, 'price': 10}, {'pk': 2, 'price': 25}])
Idea is to do it in sql for performance reasons but still use the ORM and not use a raw database connection.
However, such an interface is not exposed via the Django ORM. Would this be accepted as a patch? if so, would a list dictionary with the various fields:values to be updated be a good way to provide the input to ORM
Change History (18)
comment:1 by , 10 years ago
Description: | modified (diff) |
---|
comment:2 by , 10 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:3 by , 10 years ago
On an old project, ~10 years ago and not Django based, we used a "bulk update" procedure that did:
1) Bulk-insert the new records into a temporary table
2) update in one statement using a join (SQL Server lets you do that)
On hundreds of records, this was significantly faster than updating one-by-one (I no longer have access to any hard data, and it's irrelevant anyways).
Anyway, I suspect a better API would involve passing a collection of objects and selecting just the fields to update: Book.objects.update_many(books, 'price')
comment:4 by , 9 years ago
Resolution: | wontfix |
---|---|
Status: | closed → new |
Reopening even though it was closed by a core team member because another core member has reported the significant performance improvement.
Also, there is an app called django-bulk-upate (https://github.com/aykut/django-bulk-update) which could/should be integrated into django core since we have a bulk_create and this corresponds well with the save (could be called bulk save if it makes sense). App works with an identical api to bulk_create and so should be a good candidate to be considered for inclusion in core.
comment:5 by , 9 years ago
Description: | modified (diff) |
---|---|
Summary: | query set sql update to change different values by different keys → query set sql update to change different values by different keys (include django-bulk-update in core) |
Version: | 1.7 → 1.9 |
comment:6 by , 9 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Tentatively accepting, pending discussion on the DevelopersMailingList to approve the idea and API.
comment:8 by , 7 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Sorry for the duplicate, I'm not sure how my search missed this. I've got a patch that implements this at a basic level, there are some small optimizations that could be done at a later stage (including adding Postgres specific syntax).
comment:9 by , 6 years ago
Patch needs improvement: | set |
---|---|
Summary: | query set sql update to change different values by different keys (include django-bulk-update in core) → Add QuerySet.bulk_save() to to efficiently update many models |
comment:10 by , 6 years ago
Patch needs improvement: | unset |
---|
I've made the changes requested by the review. I also expanded the tests a fair bit and ran into one case where setting a model field to an expression (e.g F()
or a Function
) would fail, because the field values where passed to Value()
.
This got me thinking, what's the 'correct' result of Value(F('id'))
, or Value(Lower('field'))
? Currently it fails in weird ways when resolved with an Expression
, but should it perhaps return the resolved Expression
instead? Or should we error loudly in this case?
comment:11 by , 6 years ago
Patch needs improvement: | set |
---|
The documentation says, "A Value() object represents the smallest possible component of an expression: a simple value. When you need to represent the value of an integer, boolean, or string within an expression, you can wrap that value within a Value()." I don't think Value()
is meant to wrap other expressions.
comment:12 by , 6 years ago
Patch needs improvement: | unset |
---|
I've moved the tests around as requested and changed the documentation.
I also had an idea for a potential optimization by using simple case statements (https://code.djangoproject.com/ticket/29710) after reading the Oracle documentation where they make a clear distinction between the two (simple or searched).
follow-up: 14 comment:13 by , 6 years ago
Summary: | Add QuerySet.bulk_save() to to efficiently update many models → Add QuerySet.bulk_update() to to efficiently update many models |
---|
As mentioned on the PR I think it'd be worth investigating (in a follow up ticket) if we could push the primary conditional update logic to the compiler level (UpdateQuery
/SQLUpdateCompiler
) in order to use VALUES
instead of CASE(WHEN))
on backends that support it. Given this API is meant to be used for large number of objects I assume using VALUES
would perform significantly better because it'd be easier for query planners to inspect the query.
FWIW UpdateQuery
already has an update_batch
method used by model deletion but it doesn't support per-primary key values.
comment:14 by , 6 years ago
Replying to Simon Charette:
As mentioned on the PR I think it'd be worth investigating (in a follow up ticket) if we could push the primary conditional update logic to the compiler level (
UpdateQuery
/SQLUpdateCompiler
) in order to useVALUES
instead ofCASE(WHEN))
on backends that support it. Given this API is meant to be used for large number of objects I assume usingVALUES
would perform significantly better because it'd be easier for query planners to inspect the query.
FWIW
UpdateQuery
already has anupdate_batch
method used by model deletion but it doesn't support per-primary key values.
Yep, in #29037 I noted that Postgres has specific syntax for it and I was going to open a ticket once (or if!) this is merged. I'm not sure what other databases support this, and I spent today looking at how to implement it. It requires a custom join on VALUES
which I'm really not sure how to implement. If you have any idea I'd love a point in the right direction.
follow-up: 17 comment:15 by , 6 years ago
It requires a custom join on VALUES which I'm really not sure how to implement. If you have any idea I'd love a point in the right direction.
Hmm I'll have to dig into it a bit more as well but I was hoping it would be possible without introducing a new type of JOIN by having SQLUpdateCompiler
do the heavy lifting.
I suppose having a look at how QuerySet.extra(tables, where)
is implemented could guide us towards implementing it at the UpdateQuery
level though some form of JOIN.
From my local tests it seems like we could use UNION ALL
to replace VALUES
on MySQL at least.
UPDATE table, ( SELECT 1 id, 2 first, 3 second UNION ALL SELECT 2, 3, 4 UNION ALL ... ) AS table_update SET table.first = table_update.first, table.second = table_update.second WHERE table.id IN (...) AND table.id = table_update.id
I haven't tested performance on SQLite and MySQL but the query seem to perform significantly faster (3-5x) on PostgreSQL using VALUES()
instead of CASE(WHEN)
when updating two columns on 1k rows of a table with 10k rows. The performance difference was getting larger as more columns were updated (I tried 1 to 5).
comment:17 by , 6 years ago
Thank you for this! I've added a ticket to support this: https://code.djangoproject.com/ticket/29771#ticket
Replying to Simon Charette:
It requires a custom join on VALUES which I'm really not sure how to implement. If you have any idea I'd love a point in the right direction.
Hmm I'll have to dig into it a bit more as well but I was hoping it would be possible without introducing a new type of JOIN by having
SQLUpdateCompiler
do the heavy lifting.
I suppose having a look at how
QuerySet.extra(tables, where)
is implemented could guide us towards implementing it at theUpdateQuery
level though some form of JOIN.
From my local tests it seems like we could use
UNION ALL
to replaceVALUES
on MySQL at least.
UPDATE table, ( SELECT 1 id, 2 first, 3 second UNION ALL SELECT 2, 3, 4 UNION ALL ... ) AS table_update SET table.first = table_update.first, table.second = table_update.second WHERE table.id IN (...) AND table.id = table_update.idI haven't tested performance on SQLite and MySQL but the query seem to perform significantly faster (3-5x) on PostgreSQL using
VALUES()
instead ofCASE(WHEN)
when updating two columns on 1k rows of a table with 10k rows. The performance difference was getting larger as more columns were updated (I tried 1 to 5).
The link you've provided suggests that the ORM example you've provided would be interpreted as:
So... what in the ORM statement identifies which column is to be used in the
WHERE
clause, and which is to be updated? As currently expressed, it's ambiguous. You'd need to modify the statement to be something like:which nominates which values in the data are for the select.
However, even with those changes, I'm not sure I see why doing this as a single statement version would be preferable to:
AFAIK, it's no faster on the database side (although feel free to prove me wrong on that), and I don't find a "one liner API" especially expressive or clear for this feature.
For that reason, I'm marking this wontfix. If you feel the idea is still worth pursuing, feel free to start a discussion on django-developers and make your case. (I know you started one on django-users, but in this case, you need the eyeballs from -dev).