Opened 7 years ago
Closed 10 months ago
#28344 closed New feature (fixed)
Add from_queryset parameter to Model.refresh_from_db()
Reported by: | Patryk Zawadzki | Owned by: | Aivars Kalvāns |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Raphael Michel, Semyon Pupkov, Andrey Maslov, Alex Vandiver, Aivars Kalvāns, Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I have a project where we deal with many financial operations that rely on transactions and row-level locking a lot to guard against race conditions. Unfortunately it's a common pattern for a function to accept an instance of model A, start an atomic block, fetch a new instance of the same row with select_for_update()
, do a change, call save()
and then refresh the original instance. It would all be easier if I could just call instance.refresh_from_db(for_update=True)
and it would save a me a DB roundtrip to refresh the original instance passed (Django ORM does not have a session manager so the old object will not otherwise reflect the changes).
Change History (38)
comment:1 by , 7 years ago
Summary: | Add `for_update=False` to `refresh_from_db()` → Add for_update parameter to Model.refresh_from_db() |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
follow-up: 25 comment:3 by , 7 years ago
Version: | 1.11 → master |
---|
Count me -0 on adding this given all the available select_for_update()
options. I fear we might open a can of worms here and have to either say no to nowait
, skip_locked
and now of
support in the future or allow for_update
to be a dict delegating kwargs to the underlying select_for_update()
call.
From my personal experience using select_for_update()
with refresh_from_db()
is not a common enough pattern to warrant the addition of this kwarg. Just me two cents.
comment:4 by , 7 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
The basic implementation is fairly simple, but I also share concerns about full (with all options) support for select_for_update()
with refresh_from_db()
.
comment:5 by , 7 years ago
Would refresh_for_update
make more sense? My case is really common in domains where resources are either shared or have to be protected from race conditions.
comment:6 by , 7 years ago
I've had to deal with similar patterns many times (mostly, when avoiding race conditions is critical -- for example, when closing and charging an order).
However, I think I might prefer an alternative pattern for this; I'd very much prefer if a model instance can lock it's own row. Currently, I've to construct a single queryset like :
# Assume obj is a model instance: MyModel.objects.filter(pk=obj.pk).select_for_update() obj.some_attr = 13 obj.save()
A cleaner pattern for this sort of of thing would probably be something like:
with obj.lock_for_update(): obj.some_attr = 13 # Would this be auto-saved at the end of the CM's block?
comment:7 by , 7 years ago
I don't think a context manager would work as there is no way to unlock a row save for committing the transaction.
Also would self.lock_for_update()
be a refresh_from_db
in disguise or would you prefer for it not to update the instance (potentially overwriting any unsaved changes)?
comment:8 by , 7 years ago
If the transaction is written as I did in my example, you don't need to call refresh_from_db
, since you only ever keep the reference to one loaded instance.
I guess that saving when exiting the context manager would solve any doubts.
comment:9 by , 7 years ago
Hugo, I appreciate your example but in my case I am dealing with financial transactions where I absolutely need to keep a database audit trail. This means I can't just wrap everything in a large transaction and risk it getting rolled back. Instead I have a number of tiny atomic blocks that need to be placed with surgical precision. This means I have to fetch the same object multiple times for it to act as the transaction synchronization point (via SELECT FOR UPDATE
within each atomic block).
Maybe it would be easier to introduce transaction.atomic_with_locked(obj)
that would be the equivalent of:
with transaction.atomic(): Foo.objects.select_for_update.get(pk=foo.pk) # discarding the result is fine as the lock is now in place yield
comment:10 by , 6 years ago
@Patryk that code block you suggest has a bug.
foo = Foo.objects.get(id=1) # loads entire record (call these _old_ values) # Now, what happens here if somebody _else_ writes to foo? with transaction.atomic(): Foo.objects.select_for_update.get(pk=foo.pk) # selects and ignores _new_ values yield # Modifies foo ... but foo.save() will save the _old_ values
I think this goes to show how necessary this feature is. Until it comes, I'll use this workaround:
foo = Foo.objects.get(id=1) with transaction.atomic(): Foo.objects.select_for_update.get(pk=foo.pk) foo.reload_from_db() yield
... which selects the row three times
follow-up: 21 comment:13 by , 6 years ago
I'm investigating this ticket for possible implementation at PyCon UK sprints. Initial thoughts:
I like the pattern suggested above:
with obj.lock_for_update(): obj.some_attr = 13 # Would this be auto-saved at the end of the CM's block?
However, this one implies both a transaction and an auto-save to me. So my proposal would be to "decorate" the lock_for_update()
with three parameters:
atomic: bool=True
auto-save: bool=True
refresh_from_db: bool=True
Would this be an acceptable solution?
comment:15 by , 6 years ago
Cc: | added |
---|
comment:16 by , 6 years ago
Cc: | added |
---|
comment:17 by , 5 years ago
Cc: | added |
---|
comment:18 by , 2 years ago
Sorry for commenting on a ticket update 3 years ago, but any plans on it? Honestly I don't think the trick-around is not a common enough pattern...
comment:19 by , 17 months ago
ActiveRecord has obj.with_lock { do stuff }
which I have used extensively in a variety of applications. I would appreciate something similar in Django.
comment:20 by , 13 months ago
Cc: | added |
---|
comment:21 by , 12 months ago
Replying to Jure Erznožnik:
with obj.lock_for_update(): obj.some_attr = 13 # Would this be auto-saved at the end of the CM's block?However, this one implies both a transaction and an auto-save to me. So my proposal would be to "decorate" the
lock_for_update()
with three parameters:
atomic: bool=True
auto-save: bool=True
refresh_from_db: bool=True
Can you elaborate on what those parameters would do? If I understand refresh_from_db
correctly - it opens the door to the bug mentioned in comment:10.
foo = Foo.objects.filter(...).get() ... Foo.objects.select_for_update.get(pk=foo.pk)
it leads to a situation where the object foo
differs from the now locked database record. That's why I would like to have a single operation to lock the object and refresh the object with the latest "locked" values. I'm all for something like refresh_from_db
or refresh_for_update
to avoid programming errors and reduce the number of database operations at the same time.
That said, I have objections to RoR way of obj.lock_for_update
or obj.with_lock
: it implies that the lock will be released once the block ends, but that is not the case. Once locked, the record will be locked until the main transaction is committed or rolled back. Nested transactions (savepoints) may release locks when rolled back, but it differs between databases: Oracle does not release locks, but Postgres does (IMHO). It looks nice, but it gives the wrong impression of how it works.
comment:22 by , 12 months ago
Cc: | added |
---|
comment:23 by , 12 months ago
with transaction.atomic(): a = Account.objects.get(id=1) with transaction.atomic(): Account.objects.select_for_update().get(id=a.pk) a.refresh_from_db() # The lock is not released here # The lock is released here
For obj.with_lock
to match what is happening in the database we would have to ensure it's called outside transaction but that makes it useless when we have to acquire more than one lock
comment:24 by , 12 months ago
Has patch: | set |
---|
comment:25 by , 12 months ago
Replying to Simon Charette:
I fear we might open a can of worms here and have to either say no to
nowait
,skip_locked
and nowof
support in the future or allowfor_update
to be a dict delegating kwargs to the underlyingselect_for_update()
call.
I think most of those parameters do not apply to operations on the specific model:
of
when multiple models have to be locked, a QuerySet should be the right choice
skip_locked
does not make sense when working with one specific instance of Model (primary key)
nowait
might be interesting - something similar to threading.Lock.acquire(blocking=False)
. But unlike acquire
which returns a boolean, select_for_update
raises DatabaseError
. Raising an exception from refresh_from_db
would suggest the object is in an invalid state. Code that uses the object after the exception would raise eyebrows during code review. Making refresh_from_db
return a boolean indicating if it succeeded or not could also lead to issues because people are not used to checking the result of refresh_from_db
.
So I think the default select_for_update
is the only right solution. Adding nowait
might be an option together with a new dedicated method refresh_for_update
but I haven't seen use for that in my domain.
follow-up: 27 comment:26 by , 12 months ago
Cc: | added |
---|
of when multiple models have to be locked, a QuerySet should be the right choice
What about MTI when a single model involves multiple tables which is something refresh_from_db(fields)
supports? MTI is the main reason of
was added in the first place so isn't incoherent that refresh_from_db(fields)
supports inherited fields but refresh_from_db(for_update)
doesn't?
skip_locked
does not make sense when working with one specific instance of Model (primary key)
That I agree with.
So I think the default select_for_update is the only right solution.
I'm still struggling to see the appeal for adding this option to be honest. The reported use case is to handle calls that pass model instance and want to lock the object for the duration of the transaction. Thing is refresh_from_db
overwrites any attributes that might have been set on the objet passed by reference and results in the creation of a new temporary model instance anyway so what are the benefits over simply doing a queryset lookup with select_for_update
by the primary key and doing a few attribute assignments?
comment:27 by , 12 months ago
Replying to Simon Charette:
So I think the default select_for_update is the only right solution.
I'm still struggling to see the appeal for adding this option to be honest. The reported use case is to handle calls that pass model instance and want to lock the object for the duration of the transaction. Thing is
refresh_from_db
overwrites any attributes that might have been set on the objet passed by reference and results in the creation of a new temporary model instance anyway so what are the benefits over simply doing a queryset lookup withselect_for_update
by the primary key and doing a few attribute assignments?
I think all this applies to refresh_from_db
regardless of locking or not. I don't know what pros and cons were discussed before adding refresh_from_db
. Nothing prevents us from assigning attributes after lookup by the primary key but refresh_from_db
makes it easier when we do not care about previous values.
The locking part gets a bit more annoying in the financial domain because it's a common approach to optimize for concurrency: first reading models with all related models and performing status and conditions checks and locking models just before modifications. I also write entry journals and audit trails before doing updates and rely on rollbacks to clean that up if updates fail.
comment:30 by , 12 months ago
To provide some additional perspective with some real-world use cases:
I agree that there is no use case for skip_locked
here, and for of
only with model inheritance (which I never used). I do see a theoretical use-case for nowait
, but I have never come across wanting it in real code.
I have come across wanting a refresh_for_update
-style thing as discussed here many times, though. As others mentioned, usually in relation with an encapsulated piece of business logic that gets passed an instance and requires locking. Sure, the business logic could just acquire a new version of that instance, but then the code calling the business logic needs to know that the instance it passed into the business logic no longer is a valid version and *also* needs to refresh their instance, which is not an intuitive interface.
In at least some situation, I even built my own little refresh_from_db
within my own code, which certainly is not nice:
So I'm +1 for adding *something* that helps with this, and I'd personally think a new refresh_for_update(using, fields, of, nowait)
function is the clearest and most future-safe design.
(Sorry for first updating the description instead of adding a commend, I have not used trac in a while and used the wrong text box. I reverted it immediately)
follow-up: 32 comment:31 by , 12 months ago
Thanks for chiming in everyone.
So given the main benefits of refresh_from_db
is about the encapsulation of object fetching and attribute assignment and the main concerns are about choosing an interface that isn't too limiting giving the large number of options that QuerySet
has could an acceptable solution be that instead of focusing on select_for_update
we'd allow a from_queryset: QuerySet
to be passed to the method and would be used instead of self.__class__._base_manager
when provided.
That would allow
queryset = Author.objects.select_for_update() author.refresh_from_db(from_queryset=queryset)
while allowing any necessary option to be passed to select_for_update
but also allow refresh_from_db
to respect other criteria such as soft-deletion
queryset = Author.objects.all() # As opposed to `_base_manager` the default manager could be doing `filter(deleted_at=None)` author.refresh_from_db(from_queryset=queryset)
comment:32 by , 12 months ago
Replying to Simon Charette:
Thanks for chiming in everyone.
could an acceptable solution be that instead of focusing on
select_for_update
we'd allow afrom_queryset: QuerySet
to be passed to the method and would be used instead ofself.__class__._base_manager
when provided.
That would do it but I started thinking about what other possibilities it opens. Current parameters using
and fields
can be applied directly to the queryset and the whole function would be refresh_from_queryset
instead of refresh_from_db
.
queryset = Author.objects.using('default').only(*fields) author.refresh_from_queryset(queryset) or author.refresh_from_db(from_queryset=queryset)
comment:33 by , 12 months ago
I don't think it would be a problem for refresh_from_db
to apply fields and using
if explicitly provided with from_queryset
-
django/db/models/base.py
diff --git a/django/db/models/base.py b/django/db/models/base.py index 6eaa600f10..8a9cbd606e 100644
a b def get_deferred_fields(self): 672 672 if f.attname not in self.__dict__ 673 673 } 674 674 675 def refresh_from_db(self, using=None, fields=None ):675 def refresh_from_db(self, using=None, fields=None, from_queryset=None): 676 676 """ 677 677 Reload field values from the database. 678 678 … … def refresh_from_db(self, using=None, fields=None): 704 704 "are not allowed in fields." % LOOKUP_SEP 705 705 ) 706 706 707 hints = {"instance": self} 708 db_instance_qs = self.__class__._base_manager.db_manager( 709 using, hints=hints 710 ).filter(pk=self.pk) 707 if from_queryset is not None: 708 if using is not None: 709 from_queryset = from_queryset.using(using) 710 else: 711 hints = {"instance": self} 712 from_queryset = self.__class__._base_manager.db_manager( 713 using, hints=hints 714 ) 715 db_instance_qs = from_queryset.filter(pk=self.pk) 711 716 712 717 # Use provided fields, if not set then reload all non-deferred fields. 713 718 deferred_fields = self.get_deferred_fields()
That would mean that
refresh_from_db(using='other', from_queryset=Author.objects.using('default')) # would use `other` refresh_from_db(fields=['first_name'], from_queryset=Author.objects.only('last_name')) # would use ['first_name']
Why do you think the addition of a refresh_from_queryset
method is warranted?
comment:34 by , 12 months ago
I'm ok with refresh_from_db
, I was thinking out loud that the possibility of passing a queryset makes existing parameters redundant.
comment:35 by , 12 months ago
Just another idea: using from_queryset
allows us to specify select_related
on the queryset and with a couple of lines in refresh_from_db
we could make it work and avoid extra queries when accessing related fields later.
comment:36 by , 12 months ago
I'm ok with refresh_from_db, I was thinking out loud that the possibility of passing a queryset makes existing parameters redundant.
Makes sense, we do have similar APIs where some flags are redundant with others so I don't think it's a huge deal.
Just another idea: using from_queryset allows us to specify select_related on the queryset and with a couple of lines in refresh_from_db we could make it work and avoid extra queries when accessing related fields later.
yeah it seems like it opens more doors and embrace the facilitation of attribute assignment pattern better.
comment:38 by , 12 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:39 by , 12 months ago
Patch needs improvement: | set |
---|
comment:41 by , 10 months ago
Summary: | Add for_update parameter to Model.refresh_from_db() → Add from_queryset parameter to Model.refresh_from_db() |
---|
comment:42 by , 10 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
Looks okay at first glance. Is the implementation fairly simple?