#20429 closed New feature (fixed)
Add QuerySet.update_or_create method
Reported by: | Evan Cofsky | Owned by: | Karol Sikora |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | timograham@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
It seems handy to have a function that would update an existing record in the database with the values of defaults if one is found, instead of just returning a flag to the caller like get_or_create does. We have a codebase using this, but it may also be useful to the world at large. The attached implementation makes a passing effort at not updating too much of the model, but should probably actually restrict itself to just fields and related objects instead of any attribute.
Attachments (1)
Change History (33)
by , 12 years ago
comment:1 by , 12 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
Patch needs improvement: | set |
Triage Stage: | Unreviewed → Accepted |
Upsert is a fairly common idea; seems worthwhile to me.
Regarding implementation: there's a lot of common ground between get_or_create and update_or_create, so I'm wondering if there should be some common base implementation, rather than layering one over the other.
comment:2 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 12 years ago
I hadn't thought about that. That's a good idea I'd be willing to look into if needed.
comment:5 by , 12 years ago
Should we at all be concerned in the update code about limiting updated attributes to just fields? It seems like it could potentially allow some subtle and difficult to identify bugs if any updates are allowed.
comment:6 by , 12 years ago
Yes, You have right. I'll add checking if updated fields are model attributes.
comment:7 by , 12 years ago
I've changed approach according to HonzaKral proposition, currently utilizing update method on QuerySet.
comment:8 by , 12 years ago
Using update()
won't call save
on the created or updated object (https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.update), which is slightly different from the get_or_create()
API (https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create). The difference may be something we should keep in mind when deciding on the final implementation. My preference is that they behave the same in this regard (i.e. save()
is called with both functions).
comment:9 by , 12 years ago
Yes, i know. I will talk also tomorrow with other developers on sprints about this approach.
Maybe the best way would be to choose through some kwarg to use update or save?
But using update is more consistent looking to name this method: update_or_create. update on first position suggests that its following to update approach.
comment:10 by , 12 years ago
Maybe calling it update_or_create
is misleading, too, and calling it something like upsert
is better. This would avoid the name collision with update
on QuerySet
, and then we can follow the semantics of get_or_create
more closely. But I do think following the semantics of get_or_create
are more important than other considerations. Callers will either expect save
to have been called after updating to perform application-level consistency checks, or people will need to remember to call save
after each update_or_create
call, and being safe by default is important to me.
comment:11 by , 12 years ago
Here is django-developers thread: https://groups.google.com/forum/?fromgroups#!topic/django-developers/cE0oTNMZaKU
comment:12 by , 12 years ago
After short talk with @russellm decision is to follow get_or_create approach with calling save on model.
I'll update pull request shortly.
comment:14 by , 12 years ago
Review notes on the pull request.
Also - when submitting a patch that you want reviewed, it helps if you uncheck the "Patch needs improvement" flag. The list of patches needing review is essentially the list with a patch, that doesn't need improvement, and doesn't need tests or docs. If you don't update the flags, your patch will likely end up being missed for review.
comment:15 by , 12 years ago
Patch needs improvement: | unset |
---|
comment:16 by , 12 years ago
@russelm, waitng for Your second review, i've pushed new commit with updates according Your sugessions(and also uncheck "Patch needs improvement" flag).
comment:17 by , 12 years ago
Patch needs improvement: | set |
---|
Is the current patch still at: https://github.com/django/django/pull/1132/files ?
Probably by accident documentation of the first and last method have been removed.
comment:18 by , 12 years ago
Cc: | added |
---|---|
Needs documentation: | unset |
Needs tests: | unset |
Patch needs improvement: | unset |
Summary: | Implementation of update_or_create → Add QuerySet.update_or_create method |
Updated pull request to address outstanding issues: https://github.com/django/django/pull/1248
comment:19 by , 12 years ago
Patch needs improvement: | set |
---|
Some backends like mysql can handle upsert natively, update_or_create should use that as advantage.
comment:20 by , 12 years ago
I imagine that would increase the complexity of the patch quite a bit. I wonder if we need that optimization for the first version of this? Is the anonymous commenter willing to implement it?
comment:21 by , 12 years ago
Patch needs improvement: | unset |
---|
As suggested in IRC, I used djangobench to test this change, but I didn't get any conclusive results (this is my first time using djangobench, so I'm not certain how to interpret the results). Sometimes djangobench report a "significant" difference (both slower and faster), other times there was no difference. Here are a couple results. I'm assuming the difference is negligible enough to ignore.
$ djangobench --control=master --experiment=ticket_20429 query_get_or_create -t 500 Running 'query_get_or_create' benchmark ... Min: 0.000931 -> 0.000944: 1.0141x slower Avg: 0.000997 -> 0.000979: 1.0187x faster Significant (t=3.474695) Stddev: 0.00011 -> 0.00005: 2.1908x smaller (N = 500) Running 'query_get_or_create' benchmark ... Min: 0.000935 -> 0.000947: 1.0130x slower Avg: 0.000975 -> 0.001007: 1.0335x slower Significant (t=-7.464501) Stddev: 0.00006 -> 0.00008: 1.3357x larger (N = 500) Running 'query_get_or_create' benchmark ... Min: 0.000931 -> 0.000939: 1.0087x slower Avg: 0.000999 -> 0.001006: 1.0076x slower Not significant Stddev: 0.00014 -> 0.00013: 1.0402x smaller (N = 500)
comment:22 by , 12 years ago
As mentioned on IRC, I believe this method suffers from a race-condition (especially if used in a view) which should be documented.
Warning: This method is prone to race-condition and can result in multiple rows being inserted simultaneously if uniqueness is not enforced at the database level (see either ``unique`` or ``unique_together``). During a race where uniqueness is inforced, all simultaneous calls to ``create_or_update`` but one will raise an ``IntegrityError``. Without uniqueness inforcement, any ``create_or_update`` call with the same set of parameters that happens after a race will raise ``MultipleObjectsReturned``.
Like get_or_create
, it might be good to recommend it as a convenience for scripts rather than something to use in a view.
Refs #12579
comment:23 by , 12 years ago
If you're not enforcing uniqueness constraints in the database you're going to have this race condition no matter what. Perhaps the documentation around creating models and the basics of the tutorials could be updated to reflect this. Documenting it as something endemic only to these methods does nothing to improve the data consistency of Django applications and would rather just discourage people from using these methods which are basically the codification of the proper way to handle these cases along side proper database implementation.
comment:24 by , 12 years ago
The integrity issue is just a side effect of the race-condition, even if uniqueness is enforced at the database level, you still need to be aware of the potential 500 you will be getting.
I agree that any get()
followed by create()
in user code suffers from the same issue, but when you do it manually you have a chance to think about the potential problems. There is a layer of opacity when you use a queryset method like update_or_create()
and you may assume that all problems have been solved by the framework. Eventually you get to shoot yourself in the foot, which results in tickets like #12579.
update()
documents that it's free of race-condition, select_for_update()
is obviously free of race-condition on supported backends, it wouldn't be too hard to assume that update_or_create()
provides such guarantee as well.
I agree that the race-condition for these common patterns should be documented (if that's not already the case), but convenience methods like get_or_create()
and update_or_create()
also need to be clear about their implementation and potential consequences.
comment:25 by , 12 years ago
I've updated the patch with the documentation suggested by @loic84 and also to reflect the fact that this will have to go into 1.7 rather than 1.6.
follow-up: 27 comment:26 by , 12 years ago
Race conditions are conditions that silently accept corruption, so database constraints are the exact opposite of race conditions. They inform the programmer that data integrity was preserved, rather than silently corrupting it. Dealing with those exceptions at that point properly is the right thing, because it prevents having to go through the database much later trying to understand why there are duplicates of some rows but not others, and why there are other more opaque exceptions (like .get() failing), or why relations to one item are actually to several different items. Those are race conditions, but loudly saying "I'm sorry I can't corrupt your data for you" is not a race condition, and I strongly think Django should be choosing the non-race-condition approach with loud and immediate errors rather than the race-condition approach with no errors at the time of corruption, but loud and opaque errors much later.
comment:27 by , 12 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Replying to tunixman:
I strongly think Django should be choosing the non-race-condition approach with loud and immediate errors rather than the race-condition approach with no errors at the time of corruption, but loud and opaque errors much later.
I'm not sure I understand how you would do that. If you are referring to the backend specific upsert, it could be added in the future.
@timo: Left a comment on the PR, other than that I'm happy with the current patch.
comment:28 by , 12 years ago
Can you take me off this ticket then? I'm through and don't need the notifications about it anymore.
comment:29 by , 12 years ago
Triage Stage: | Ready for checkin → Accepted |
---|
I've updated the patch, changing the signature of get_or_create
and update_or_create
to (defaults=None, **kwargs)
. I'd like another person to review the patch before committing it.
comment:30 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Sample implementation of update_or_create.