#14051 closed New feature (fixed)
Signals for transaction commit/rollback
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | Andreas Pelme, asksol, Brodie Rao, Chris Streeter, dtrebbien, anssi.kaariainen@…, glicerinu@…, hv@…, Anders Hovmöller, German M. Bravo, kmike84@… | Triage Stage: | Design decision needed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Some users of django-celery have the problem of publishing references to database state that has not yet been commited.
E.g:
def add_user(request): user = User.objects.create(...) # Import the users address book contacts asynchronously using the worker pool. import_contacts.delay(user.pk)
The proposed solution is to add a way to delay these calls until the transaction is committed:
from djcelery import on_transaction_commit def add_user(request): user = User.objects.create(...) on_transaction_commit(import_contacts.delay, user.pk)
I can't see any mechanism to hook into commit/rollback, so it doesn't seem easy
to accomplish.
Do you think it could be possible to add new signals for transaction commit/rollback?
Attachments (1)
Change History (41)
by , 14 years ago
Attachment: | transaction_signals.patch added |
---|
comment:1 by , 14 years ago
Has patch: | set |
---|
I have created an initial patch with tests and docs. Let me know if there is anything that could be improved!
A patch is attached, the changes are also available in git:
comment:2 by , 14 years ago
Cc: | added |
---|
comment:3 by , 14 years ago
@peritus,
Patch looks good to me.
Do you think the handlers should be able to interrupt commit/rollback by raising an exception,
or should send_robust
be used to ensure the action is completed?
comment:4 by , 14 years ago
Cc: | added |
---|
comment:5 by , 14 years ago
+1 for this, it would be really useful in my use case (sending tasks to celery on a post_save signal).
comment:6 by , 14 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:7 by , 14 years ago
Cc: | added |
---|
Some thoughts:
At first glance, the API seems kind of unwieldy. I think it'd be better to either have a set of on-commit signals for models, or to change post_* to only fire when a transaction is successful, and to include some kind of transaction ID in dispatch_uid.
Regarding the example, how would you make the API proposed in the patch work in that case? How would user.pk be passed to the signal handler?
I brought this ticket up to Russell at the Sydney sprint the other day, and he suggested implementing this functionality through a context manager. That might be something else to consider.
comment:8 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → New feature |
comment:9 by , 13 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
UI/UX: | unset |
comment:10 by , 13 years ago
Cc: | added |
---|
comment:11 by , 13 years ago
Cc: | added |
---|
comment:12 by , 13 years ago
Cc: | removed |
---|---|
Version: | 1.2 → 1.3 |
IIUC, the OP's original problem could be solved by committing the transaction manually, as described for example here: http://docs.celeryproject.org/en/latest/userguide/tasks.html#database-transactions
However, if you need to start a celery task from a signal (like post_save), you cannot use this method anymore, it seems.
There was a discussion about this also here: http://stackoverflow.com/questions/950214/run-code-after-transaction-commit-in-django
comment:13 by , 13 years ago
Committing the transaction is a wildly different solution, as that may not be what the user wants.
Having the act of applying a task commit the transaction would be a very unexpected side-effect,
and usually if there is an error or another scenario where the transaction should be rolled back,
then you don't want to apply the task either.
comment:14 by , 13 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Isn't the request_finished
sufficient for this use case? By the time the request is finished, the transaction must be either committed or rolled back.
Signals aren't free; I'd prefer to avoid multiplying them if possible.
comment:15 by , 13 years ago
request_finished doesn't tell you whether the transaction was committed or not.
You don't want a task to be applied if the transaction was aborted, also the semantics of 'request_finished'
is very different to 'transaction_committed', who knows what can possibly happen in between.
I don't see how signals can pose that much overhead, especially if not connected to anything. If that is the case then maybe it should be optimized
or another method of augmenting functionality could be used.
comment:16 by , 13 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → reopened |
The anonymous has a valid point. Re-opening as DDN.
comment:17 by , 13 years ago
+1 for this. It would be useful for any kind of after-the commit processing that is too large to be done in-band but even more importantly the 'post save' will become much less of a gotcha because with a 'post commit' people will realize that 'post save' does not occur after a finalized transaction. Just knowing that upfront would have saved me a lot of time!
comment:18 by , 13 years ago
Cc: | added |
---|
If I understand correctly what is wanted is not a traditional pre_commit/post_commit signal. Signals are global. What you would want is pre_commit/post_commit _hooks_. The difference is that the pro_commit/post_commit hooks are a per-transaction thing, not anything global. So, you could add callbacks which should be called when the current transaction commits, like proposed in the original description.
You could of course implement hooks using pre/post commit signals, but I am not sure if it would be better to skip the signals and only implement the hooks.
FWIW I do like the idea of pre_commit/post_commit hooks. However there is a small problem: there can be no guarantee that post_commit things are actually done (crash at inconvenient time). Or that there really is going to be a commit after pre_commit hooks are done. These are unlikely failure conditions, but important to understand for those cases where you absolutely need the post_commit things done.
comment:19 by , 13 years ago
How would you identify the sender(s), given that a transaction can contain changes to all kinds of models?
I'm currently connecting to post_save for sender=MyModel - when MyModel is changed I want to reindex it in the background using celery. Unfortunately, celery tries to do that before the transaction is committed, and indexes the old data (finally I know what's been happening).
I don't see how I could solve this if there was a simple post_commit signal - I need to identify the instance that was saved.
All save()'s and delete()'s would have to collect their instances so that at the end post_commit can be emitted for each one, no?
Regarding manual committing: I have lots of models with a FK and some with m2m to the model I want to index (so my signal handler first figures out which ones are affected, then spawns an indexation task each), and this needs to work in admin. Looks like 100 overrides of save() and delete() would be needed, and I better not miss any...
comment:20 by , 13 years ago
Sorry scratch that previous comment of mine - the existing handlers would just need to be wrapped in a defer() like Dave Hughes is doing: https://github.com/davehughes/django-transaction-signals
comment:21 by , 13 years ago
Cc: | added |
---|
comment:22 by , 13 years ago
Cc: | added |
---|
comment:23 by , 12 years ago
Cc: | added |
---|
comment:24 by , 12 years ago
Cc: | added |
---|
If implementing post_commit
and post_rollback
signals, I think one also has to think about savepoints. pre_rollback
signals connected during rolled back savepoints should still be called at commit time, whilst analogously connected post_commit
signals should not, potentially ending up with both post_commit
and post_rollback
signals being called during a transaction.commit()
call, but only post_rollback
signals during a transaction.rollback()
call. Note this behavior is not yet implemented in django-transaction-signals.
comment:25 by , 12 years ago
Hi,
As an alternative design, we can take inspiration from transaction package.
http://www.zodb.org/zodbbook/transactions.html#data-managers
DataManagers are exactly what we need. They are useful to spread the transaction calls to every backend that needs to be tight to the current transaction.
By Backend I mean, it can be django.db.connection, The celery broker, or even a simple transaction safe dictionary object.
comment:26 by , 12 years ago
Cc: | added |
---|
comment:27 by , 12 years ago
Status: | reopened → new |
---|
comment:28 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
With the new transaction management introduced in Django 1.6, this should be much less of a problem in some cases, and untractable in other cases.
If you're using Django's default transaction management: you're in autocommit, problem solved. Make sure you aren't in an atomic block before sending the task to celery.
If you're using ATOMIC_REQUESTS, or you're within an atomic block created by code outside of your control, it becomes very hard to determine if and when a given set of changes will actually be committed to the database. Every time you exit an atomic level, some changes may be rolled back, but not necessarily all of them.
Since the problem is solved in the default case, I'm marking this as fixed. It's wontfix for the non-default case, unless someone comes up with a really nice patch!
comment:29 by , 12 years ago
Calling this problem intractable is a bit of an overstatement. I think trying to implement this in terms of signals doesn't make a whole lot of sense, but a decorator approach would work quite well. For example:
from django.db import transaction def view(request): thing = Thing() thing.save() @transaction.post_commit def queue_task(): ThingTask.delay(thing.id) return HttpResponse('...')
@post_commit would have the following semantics:
- When called in a transaction, it queues the function passed to it into a list of post-commit callbacks on the database connection.
- When called outside of a transaction (in autocommit mode), it executes the function immediately.
After the database connection wrapper commits, it calls each queued post-commit function. After each invocation, it commits/rolls back. If the function raises an exception, it rolls back, logs the exception, and continues executing the remaining post-commit callbacks.
This system would require the Django app to be configured to commit/roll back after every request.
When the database connection is closed, if there are callbacks that haven't been executed, an error would be logged (or perhaps an exception should be raised?)
---
Thoughts? If you're interested, I can provide a patch that implements the above system.
comment:30 by , 12 years ago
The complexity is in handling partial rollbacks performed with savepoints. You need to keep track of which post_commit tasks must be cancelled at each level.
It's doable but it's going to make the implementation of transactions more complicated than what I'm comfortable with. (It's already a bit more complicated than I'd like, since it handles joining an inner transaction with an outer one.)
comment:31 by , 12 years ago
Though, with simple {transaction_start, transaction_commit, transaction_cancel} signals, keeping track of what to call and when would be Celery's problem and not Django's. While it would be nice, it was never my intention that Django would implement this logic.
comment:32 by , 12 years ago
These signals aren't enough. You also need to track savepoint rollbacks, otherwise you could still attempt to run a task on data that wasn't inserted to the database.
comment:33 by , 12 years ago
right, I guess with autocommit an operation may easily consist of multiple transactions. I guess the only way to do this properly is by using manual transaction management.
comment:34 by , 12 years ago
"Manual transaction management" no longer exists in master. (That's why I closed this ticket.)
For your use case, I recommend staying in autocommit mode (ie. avoid ATOMIC_REQUESTS), and not enqueuing tasks within an atomic block.
comment:35 by , 12 years ago
Ok, I see and agree, with the new autocommit behavior this is unlikely to catch users by surprise anymore, and race conditions will be more obvious if you're in an atomic block. Thanks!
comment:36 by , 12 years ago
I don't actually understand why this got closed.
Even though manual transaction management no longer exists in master, I would still find it very useful to be able to register on_commit and on_rollback hooks for the atomic block. You might be having django models that also depend on external resources. In case of a rollback you might want to run a clean-up action. Don't actually care about savepoints... I just want to make sure I don't leave any junk behind :)
Take for example the admin's change_view. If an exception is being raised while changing an object, the only way I can run any clean-up code is by overriding change_view. This doesn't seem very elegant.
comment:37 by , 11 years ago
kux, I won't include in Django partial features that vaguely cover some needs of some users. I either do things correctly or not at all.
Transactions guarantee a very high level of reliability, enforced by the database. I refuse to mix them with a less reliable system that could leave data in an inconsistent state — especially signals, which are the most misused feature of Django.
Your use case is easily implemented by adding cleanup tasks to a request.cleanup_on_errors list, and running them with a middleware in process_exception when ATOMIC_REQUESTS triggers a rollback. Besides, this allows you to register specific tasks rather than having to write and register a global signal handler that can cleanup anything left behind by any part of your code.
comment:38 by , 11 years ago
I explained the wontfix in greater detail:
- on the mailing list: https://groups.google.com/d/msg/django-developers/Y0CReXWzHcI/qNXw06Xb7yQJ
- in this project: https://github.com/aaugustin/django-transaction-signals
comment:39 by , 11 years ago
I've created a new ticket that asks for the ability to run code after the transaction is committed: #21803.
As far as I understand, that was the original use case here.
comment:40 by , 11 years ago
I learned that get_or_create operates in a transaction even when in autocommit mode
Initial patch (with tests+docs)