Opened 12 years ago

Closed 11 years ago

Last modified 11 years ago

#20429 closed New feature (fixed)

Add QuerySet.update_or_create method

Reported by: Evan Cofsky Owned by: Karol Sikora
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: timograham@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

It seems handy to have a function that would update an existing record in the database with the values of defaults if one is found, instead of just returning a flag to the caller like get_or_create does. We have a codebase using this, but it may also be useful to the world at large. The attached implementation makes a passing effort at not updating too much of the model, but should probably actually restrict itself to just fields and related objects instead of any attribute.

Attachments (1)

upsert.py (710 bytes ) - added by Evan Cofsky 12 years ago.
Sample implementation of update_or_create.

Download all attachments as: .zip

Change History (33)

by Evan Cofsky, 12 years ago

Attachment: upsert.py added

Sample implementation of update_or_create.

comment:1 by Russell Keith-Magee, 12 years ago

Needs documentation: set
Needs tests: set
Patch needs improvement: set
Triage Stage: UnreviewedAccepted

Upsert is a fairly common idea; seems worthwhile to me.

Regarding implementation: there's a lot of common ground between get_or_create and update_or_create, so I'm wondering if there should be some common base implementation, rather than layering one over the other.

comment:2 by Karol Sikora, 12 years ago

Owner: changed from nobody to Karol Sikora
Status: newassigned

comment:3 by Evan Cofsky, 12 years ago

I hadn't thought about that. That's a good idea I'd be willing to look into if needed.

comment:5 by Evan Cofsky, 12 years ago

Should we at all be concerned in the update code about limiting updated attributes to just fields? It seems like it could potentially allow some subtle and difficult to identify bugs if any updates are allowed.

comment:6 by Karol Sikora, 12 years ago

Yes, You have right. I'll add checking if updated fields are model attributes.

comment:7 by Karol Sikora, 12 years ago

I've changed approach according to HonzaKral proposition, currently utilizing update method on QuerySet.

comment:8 by Evan Cofsky, 12 years ago

Using update() won't call save on the created or updated object (https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.update), which is slightly different from the get_or_create() API (https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create). The difference may be something we should keep in mind when deciding on the final implementation. My preference is that they behave the same in this regard (i.e. save() is called with both functions).

comment:9 by Karol Sikora, 12 years ago

Yes, i know. I will talk also tomorrow with other developers on sprints about this approach.
Maybe the best way would be to choose through some kwarg to use update or save?
But using update is more consistent looking to name this method: update_or_create. update on first position suggests that its following to update approach.

Last edited 12 years ago by Karol Sikora (previous) (diff)

comment:10 by Evan Cofsky, 12 years ago

Maybe calling it update_or_create is misleading, too, and calling it something like upsert is better. This would avoid the name collision with update on QuerySet, and then we can follow the semantics of get_or_create more closely. But I do think following the semantics of get_or_create are more important than other considerations. Callers will either expect save to have been called after updating to perform application-level consistency checks, or people will need to remember to call save after each update_or_create call, and being safe by default is important to me.

comment:12 by Karol Sikora, 12 years ago

After short talk with @russellm decision is to follow get_or_create approach with calling save on model.
I'll update pull request shortly.

comment:13 by Karol Sikora, 12 years ago

Pull request updated.

comment:14 by Russell Keith-Magee, 12 years ago

Review notes on the pull request.

Also - when submitting a patch that you want reviewed, it helps if you uncheck the "Patch needs improvement" flag. The list of patches needing review is essentially the list with a patch, that doesn't need improvement, and doesn't need tests or docs. If you don't update the flags, your patch will likely end up being missed for review.

comment:15 by Karol Sikora, 12 years ago

Patch needs improvement: unset

comment:16 by Karol Sikora, 12 years ago

@russelm, waitng for Your second review, i've pushed new commit with updates according Your sugessions(and also uncheck "Patch needs improvement" flag).

comment:17 by wim@…, 12 years ago

Patch needs improvement: set

Is the current patch still at: https://github.com/django/django/pull/1132/files ?

Probably by accident documentation of the first and last method have been removed.

comment:18 by Tim Graham, 12 years ago

Cc: timograham@… added
Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset
Summary: Implementation of update_or_createAdd QuerySet.update_or_create method

Updated pull request to address outstanding issues: https://github.com/django/django/pull/1248

comment:19 by anonymous, 12 years ago

Patch needs improvement: set

Some backends like mysql can handle upsert natively, update_or_create should use that as advantage.

comment:20 by Tim Graham, 12 years ago

I imagine that would increase the complexity of the patch quite a bit. I wonder if we need that optimization for the first version of this? Is the anonymous commenter willing to implement it?

comment:21 by Tim Graham, 12 years ago

Patch needs improvement: unset

As suggested in IRC, I used djangobench to test this change, but I didn't get any conclusive results (this is my first time using djangobench, so I'm not certain how to interpret the results). Sometimes djangobench report a "significant" difference (both slower and faster), other times there was no difference. Here are a couple results. I'm assuming the difference is negligible enough to ignore.

$ djangobench --control=master --experiment=ticket_20429 query_get_or_create -t 500

Running 'query_get_or_create' benchmark ...
Min: 0.000931 -> 0.000944: 1.0141x slower
Avg: 0.000997 -> 0.000979: 1.0187x faster
Significant (t=3.474695)
Stddev: 0.00011 -> 0.00005: 2.1908x smaller (N = 500)

Running 'query_get_or_create' benchmark ...
Min: 0.000935 -> 0.000947: 1.0130x slower
Avg: 0.000975 -> 0.001007: 1.0335x slower
Significant (t=-7.464501)
Stddev: 0.00006 -> 0.00008: 1.3357x larger (N = 500)

Running 'query_get_or_create' benchmark ...
Min: 0.000931 -> 0.000939: 1.0087x slower
Avg: 0.000999 -> 0.001006: 1.0076x slower
Not significant
Stddev: 0.00014 -> 0.00013: 1.0402x smaller (N = 500)

comment:22 by loic84, 12 years ago

As mentioned on IRC, I believe this method suffers from a race-condition (especially if used in a view) which should be documented.

Warning:

This method is prone to race-condition and can result in multiple rows being
inserted simultaneously if uniqueness is not enforced at the database level
(see either ``unique`` or ``unique_together``).

During a race where uniqueness is inforced, all simultaneous calls to
``create_or_update`` but one will raise an ``IntegrityError``.

Without uniqueness inforcement, any ``create_or_update`` call with the same set
of parameters that happens after a race will raise ``MultipleObjectsReturned``.

Like get_or_create, it might be good to recommend it as a convenience for scripts rather than something to use in a view.

Refs #12579

comment:23 by Evan Cofsky, 12 years ago

If you're not enforcing uniqueness constraints in the database you're going to have this race condition no matter what. Perhaps the documentation around creating models and the basics of the tutorials could be updated to reflect this. Documenting it as something endemic only to these methods does nothing to improve the data consistency of Django applications and would rather just discourage people from using these methods which are basically the codification of the proper way to handle these cases along side proper database implementation.

comment:24 by loic84, 12 years ago

The integrity issue is just a side effect of the race-condition, even if uniqueness is enforced at the database level, you still need to be aware of the potential 500 you will be getting.

I agree that any get() followed by create() in user code suffers from the same issue, but when you do it manually you have a chance to think about the potential problems. There is a layer of opacity when you use a queryset method like update_or_create() and you may assume that all problems have been solved by the framework. Eventually you get to shoot yourself in the foot, which results in tickets like #12579.

update() documents that it's free of race-condition, select_for_update() is obviously free of race-condition on supported backends, it wouldn't be too hard to assume that update_or_create() provides such guarantee as well.

I agree that the race-condition for these common patterns should be documented (if that's not already the case), but convenience methods like get_or_create() and update_or_create() also need to be clear about their implementation and potential consequences.

comment:25 by Tim Graham, 11 years ago

I've updated the patch with the documentation suggested by @loic84 and also to reflect the fact that this will have to go into 1.7 rather than 1.6.

comment:26 by Evan Cofsky, 11 years ago

Race conditions are conditions that silently accept corruption, so database constraints are the exact opposite of race conditions. They inform the programmer that data integrity was preserved, rather than silently corrupting it. Dealing with those exceptions at that point properly is the right thing, because it prevents having to go through the database much later trying to understand why there are duplicates of some rows but not others, and why there are other more opaque exceptions (like .get() failing), or why relations to one item are actually to several different items. Those are race conditions, but loudly saying "I'm sorry I can't corrupt your data for you" is not a race condition, and I strongly think Django should be choosing the non-race-condition approach with loud and immediate errors rather than the race-condition approach with no errors at the time of corruption, but loud and opaque errors much later.

in reply to:  26 comment:27 by loic84, 11 years ago

Triage Stage: AcceptedReady for checkin

Replying to tunixman:

I strongly think Django should be choosing the non-race-condition approach with loud and immediate errors rather than the race-condition approach with no errors at the time of corruption, but loud and opaque errors much later.

I'm not sure I understand how you would do that. If you are referring to the backend specific upsert, it could be added in the future.

@timo: Left a comment on the PR, other than that I'm happy with the current patch.

comment:28 by Evan Cofsky, 11 years ago

Can you take me off this ticket then? I'm through and don't need the notifications about it anymore.

comment:29 by Tim Graham, 11 years ago

Triage Stage: Ready for checkinAccepted

I've updated the patch, changing the signature of get_or_create and update_or_create to (defaults=None, **kwargs). I'd like another person to review the patch before committing it.

comment:30 by Tim Graham <timograham@…>, 11 years ago

Resolution: fixed
Status: assignedclosed

In 6272d2f155adb4f32ef129d57e9eb5493ebde6ed:

Fixed #20429 -- Added QuerySet.update_or_create

Thanks tunixman for the suggestion and Loic Bistuer for the review.

comment:31 by Tim Graham <timograham@…>, 11 years ago

In f7290581fe2106c08d97215ab93e27cf6b27e100:

Fixed a regression with get_or_create and virtual fields.

refs #20429

Thanks Simon Charette for the report and review.

comment:32 by Andrew Godwin <andrew@…>, 11 years ago

In 3f416f637918cc162877be95a59d50825b203089:

Fixed a regression with get_or_create and virtual fields.

refs #20429

Thanks Simon Charette for the report and review.

Note: See TracTickets for help on using tickets.
Back to Top