Opened 5 years ago

Closed 2 years ago

Last modified 18 months ago

#14051 closed New feature (fixed)

Signals for transaction commit/rollback

Reported by: Ask Solem <ask@…> Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: peritus, asksol, brodie, streeter, dtrebbien, anssi.kaariainen@…, glicerinu@…, hv@…, boxed, Kronuz, kmike84@… Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Some users of django-celery run into the problem of publishing references to database state that has not yet been committed.

E.g.:

    def add_user(request):
        user = User.objects.create(...)
        # Import the user's address book contacts asynchronously using the worker pool.
        import_contacts.delay(user.pk)

The proposed solution is to add a way to delay these calls until the transaction is committed:

    from djcelery import on_transaction_commit

    def add_user(request):
        user = User.objects.create(...)
        on_transaction_commit(import_contacts.delay, user.pk)

I can't see any mechanism to hook into commit/rollback, so this doesn't seem easy to accomplish.

Do you think it could be possible to add new signals for transaction commit/rollback?
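The proposed on_transaction_commit() helper could be sketched as a thread-local callback queue that the transaction layer flushes on commit and discards on rollback. This is a minimal illustration, not the attached patch; all names here (run_on_commit, flush_on_commit, discard_on_rollback) are hypothetical.

```python
import threading

_state = threading.local()

def _pending():
    # Lazily create the per-thread callback list.
    if not hasattr(_state, "callbacks"):
        _state.callbacks = []
    return _state.callbacks

def run_on_commit(func, *args, **kwargs):
    """Register func to run after the current transaction commits."""
    _pending().append((func, args, kwargs))

def flush_on_commit():
    """Would be called by the transaction machinery after a successful COMMIT."""
    callbacks, _state.callbacks = _pending(), []
    for func, args, kwargs in callbacks:
        func(*args, **kwargs)

def discard_on_rollback():
    """Would be called after a ROLLBACK: registered callbacks must not run."""
    _state.callbacks = []
```

The view from the description would then call run_on_commit(import_contacts.delay, user.pk) instead of import_contacts.delay(user.pk) directly.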

Attachments (1)

transaction_signals.patch (5.0 KB) - added by peritus 5 years ago.
Initial patch (with tests+docs)


Change History (41)

Changed 5 years ago by peritus

Initial patch (with tests+docs)

comment:1 Changed 5 years ago by peritus

  • Has patch set
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I have created an initial patch with tests and docs. Let me know if there is anything that could be improved!

A patch is attached; the changes are also available in git:

http://github.com/pelme/django/tree/transaction-signals

comment:2 Changed 5 years ago by peritus

  • Cc peritus added

comment:3 Changed 5 years ago by Ask Solem <ask@…>

@peritus,

Patch looks good to me.

Do you think the handlers should be able to interrupt commit/rollback by raising an exception,
or should send_robust be used to ensure the action is completed?

comment:4 Changed 5 years ago by Ask Solem <ask@…>

  • Cc asksol added

comment:5 Changed 5 years ago by stavros

+1 for this, it would be really useful in my use case (sending tasks to celery on a post_save signal).

comment:6 Changed 5 years ago by brodie

  • Triage Stage changed from Unreviewed to Design decision needed

comment:7 Changed 5 years ago by brodie

  • Cc brodie added

Some thoughts:

At first glance, the API seems kind of unwieldy. I think it'd be better to either have a set of on-commit signals for models, or to change post_* to only fire when a transaction is successful, and to include some kind of transaction ID in dispatch_uid.

Regarding the example, how would you make the API proposed in the patch work in that case? How would user.pk be passed to the signal handler?

I brought this ticket up to Russell at the Sydney sprint the other day, and he suggested implementing this functionality through a context manager. That might be something else to consider.
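The context-manager idea could look roughly like the following: calls deferred inside the block only fire if the block exits without an exception (i.e. on the commit path). This is a hypothetical sketch; deferred_until_commit is not a real Django or django-celery API.

```python
from contextlib import contextmanager

@contextmanager
def deferred_until_commit():
    queue = []
    # The caller receives queue.append and registers zero-arg callables.
    yield queue.append
    # This point is reached only if the body raised no exception
    # (the "commit" path); on an exception the loop is skipped.
    for callback in queue:
        callback()
```

Usage would be along the lines of:

    with deferred_until_commit() as defer:
        user = User.objects.create(...)
        defer(lambda: import_contacts.delay(user.pk))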

comment:8 Changed 4 years ago by julien

  • Severity set to Normal
  • Type set to New feature

comment:9 Changed 4 years ago by streeter

  • Cc streeter added
  • Easy pickings unset
  • UI/UX unset

comment:10 Changed 4 years ago by dtrebbien

  • Cc dtrebbien added

comment:11 Changed 4 years ago by hbf

  • Cc hbf added

comment:12 Changed 4 years ago by hbf

  • Cc hbf removed
  • Version changed from 1.2 to 1.3

IIUC, the OP's original problem could be solved by committing the transaction manually, as described for example here: http://docs.celeryproject.org/en/latest/userguide/tasks.html#database-transactions

However, if you need to start a celery task from a signal (like post_save), you cannot use this method anymore, it seems.

There was a discussion about this also here: http://stackoverflow.com/questions/950214/run-code-after-transaction-commit-in-django

comment:13 Changed 4 years ago by ask@…

Committing the transaction is a wildly different solution, as that may not be what the user wants.

Having the act of applying a task commit the transaction would be a very unexpected side-effect,
and usually if there is an error or another scenario where the transaction should be rolled back,
then you don't want to apply the task either.

comment:14 Changed 3 years ago by aaugustin

  • Resolution set to needsinfo
  • Status changed from new to closed

Isn't the request_finished signal sufficient for this use case? By the time the request is finished, the transaction must be either committed or rolled back.

Signals aren't free; I'd prefer to avoid multiplying them if possible.

comment:15 Changed 3 years ago by anonymous

request_finished doesn't tell you whether the transaction was committed or not.

You don't want a task to be applied if the transaction was aborted. Also, the semantics of 'request_finished' are very different from 'transaction_committed'; who knows what can possibly happen in between.

I don't see how signals can pose that much overhead, especially when not connected to anything. If they do, then maybe they should be optimized, or another method of augmenting functionality could be used.

comment:16 Changed 3 years ago by aaugustin

  • Resolution needsinfo deleted
  • Status changed from closed to reopened

The anonymous commenter has a valid point. Re-opening as DDN.

comment:17 Changed 3 years ago by anonymous

+1 for this. It would be useful for any kind of after-the-commit processing that is too large to be done in-band. Even more importantly, 'post save' will become much less of a gotcha, because with a 'post commit' signal people will realize that 'post save' does not occur after a finalized transaction. Just knowing that upfront would have saved me a lot of time!

comment:18 Changed 3 years ago by akaariai

  • Cc anssi.kaariainen@… added

If I understand correctly, what is wanted is not a traditional pre_commit/post_commit signal. Signals are global. What you would want is pre_commit/post_commit _hooks_. The difference is that the pre_commit/post_commit hooks are a per-transaction thing, not anything global. So, you could add callbacks to be called when the current transaction commits, as proposed in the original description.

You could of course implement hooks using pre/post commit signals, but I am not sure if it would be better to skip the signals and only implement the hooks.

FWIW I do like the idea of pre_commit/post_commit hooks. However, there is a small problem: there can be no guarantee that post_commit things are actually done (a crash at an inconvenient time), or that there really is going to be a commit after the pre_commit hooks are done. These are unlikely failure conditions, but important to understand for those cases where you absolutely need the post_commit things done.
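The "per-transaction hooks rather than global signals" distinction can be made concrete: the hooks live as state on the connection and are cleared on every commit or rollback. The class below is an illustrative stand-in, not Django's connection wrapper.

```python
class Connection:
    """Toy connection wrapper carrying per-transaction commit hooks."""

    def __init__(self):
        self.post_commit_hooks = []  # per-transaction, not global

    def on_commit(self, func):
        self.post_commit_hooks.append(func)

    def commit(self):
        # ... a real wrapper would issue COMMIT to the database here ...
        hooks, self.post_commit_hooks = self.post_commit_hooks, []
        for hook in hooks:
            hook()

    def rollback(self):
        # ... issue ROLLBACK here; hooks are dropped, never fired ...
        self.post_commit_hooks = []
```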

comment:19 Changed 3 years ago by danny.adair@…

How would you identify the sender(s), given that a transaction can contain changes to all kinds of models?

I'm currently connecting to post_save for sender=MyModel - when MyModel is changed I want to reindex it in the background using celery. Unfortunately, celery tries to do that before the transaction is committed, and indexes the old data (finally I know what's been happening).

I don't see how I could solve this if there was a simple post_commit signal - I need to identify the instance that was saved.
All save() and delete() calls would have to collect their instances so that at the end post_commit can be emitted for each one, no?

Regarding manual committing: I have lots of models with a FK and some with m2m to the model I want to index (so my signal handler first figures out which ones are affected, then spawns an indexation task each), and this needs to work in admin. Looks like 100 overrides of save() and delete() would be needed, and I better not miss any...

comment:20 Changed 3 years ago by danny.adair@…

Sorry scratch that previous comment of mine - the existing handlers would just need to be wrapped in a defer() like Dave Hughes is doing: https://github.com/davehughes/django-transaction-signals

comment:21 Changed 3 years ago by glicerinu@…

  • Cc glicerinu@… added

comment:22 Changed 3 years ago by guettli

  • Cc hv@… added

comment:23 Changed 3 years ago by boxed

  • Cc boxed added

comment:24 Changed 3 years ago by Kronuz

  • Cc Kronuz added

If implementing post_commit and post_rollback signals, I think one also has to think about savepoints. post_rollback signals connected during rolled-back savepoints should still be called at commit time, while post_commit signals connected in the same way should not. You could thus end up with both post_commit and post_rollback signals being called during a transaction.commit() call, but only post_rollback signals during a transaction.rollback() call. Note this behavior is not yet implemented in django-transaction-signals.

comment:25 Changed 3 years ago by nicolas@…

Hi,

As an alternative design, we can take inspiration from the transaction package:
http://www.zodb.org/zodbbook/transactions.html#data-managers

Data managers are exactly what we need. They are useful for spreading the transaction calls to every backend that needs to be tied to the current transaction.
By backend I mean anything such as django.db.connection, the Celery broker, or even a simple transaction-safe dictionary object.
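For illustration, a broker-queue resource in the zope transaction style might implement the IDataManager two-phase-commit protocol roughly like this. The method names (tpc_begin, commit, tpc_vote, tpc_finish, tpc_abort, abort, sortKey) follow that package's interface; the broker and queue details are hypothetical.

```python
class TaskQueueDataManager:
    """Buffers tasks and hands them to the broker only at tpc_finish."""

    def __init__(self, broker):
        self.broker = broker
        self.pending = []

    def queue(self, task):
        self.pending.append(task)

    # -- IDataManager-style protocol --
    def tpc_begin(self, txn):
        pass

    def commit(self, txn):
        pass

    def tpc_vote(self, txn):
        pass  # nothing to veto: sending tasks cannot fail ahead of time

    def tpc_finish(self, txn):
        # The transaction really committed: publish the buffered tasks.
        for task in self.pending:
            self.broker.send(task)
        self.pending = []

    def tpc_abort(self, txn):
        # Transaction aborted: drop the buffered tasks.
        self.pending = []

    abort = tpc_abort

    def sortKey(self):
        return "taskqueue"
```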

comment:26 Changed 2 years ago by kmike

  • Cc kmike84@… added

comment:27 Changed 2 years ago by aaugustin

  • Status changed from reopened to new

comment:28 Changed 2 years ago by aaugustin

  • Resolution set to fixed
  • Status changed from new to closed

With the new transaction management introduced in Django 1.6, this should be much less of a problem in some cases, and intractable in other cases.

If you're using Django's default transaction management: you're in autocommit, problem solved. Make sure you aren't in an atomic block before sending the task to celery.

If you're using ATOMIC_REQUESTS, or you're within an atomic block created by code outside of your control, it becomes very hard to determine if and when a given set of changes will actually be committed to the database. Every time you exit an atomic level, some changes may be rolled back, but not necessarily all of them.

Since the problem is solved in the default case, I'm marking this as fixed. It's wontfix for the non-default case, unless someone comes up with a really nice patch!
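The "make sure you aren't in an atomic block" guard can be expressed as a check before enqueuing. In real Django 1.6+ code the flag is django.db.connection.in_atomic_block; here `connection` is a stub so the sketch stays self-contained, and send_task_safely is a hypothetical helper.

```python
class _StubConnection:
    # Stand-in for django.db.connection; in_atomic_block is the real
    # attribute name on Django 1.6+ connection wrappers.
    in_atomic_block = False

connection = _StubConnection()

def send_task_safely(enqueue, *args):
    """enqueue stands in for e.g. import_contacts.delay."""
    if connection.in_atomic_block:
        raise RuntimeError(
            "Refusing to enqueue a task inside an atomic block: "
            "the data may not be committed yet.")
    enqueue(*args)
```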

comment:29 Changed 2 years ago by brodie

Calling this problem intractable is a bit of an overstatement. I think trying to implement this in terms of signals doesn't make a whole lot of sense, but a decorator approach would work quite well. For example:

    from django.db import transaction

    def view(request):
        thing = Thing()
        thing.save()

        @transaction.post_commit
        def queue_task():
            ThingTask.delay(thing.id)

        return HttpResponse('...')

@post_commit would have the following semantics:

  • When called in a transaction, it queues the function passed to it into a list of post-commit callbacks on the database connection.
  • When called outside of a transaction (in autocommit mode), it executes the function immediately.

After the database connection wrapper commits, it calls each queued post-commit function. After each invocation, it commits/rolls back. If the function raises an exception, it rolls back, logs the exception, and continues executing the remaining post-commit callbacks.

This system would require the Django app to be configured to commit/roll back after every request.

When the database connection is closed, if there are callbacks that haven't been executed, an error would be logged (or perhaps an exception should be raised?).
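The two semantics above could be sketched as follows. The `transaction` object here is a stand-in for illustration only, not django.db.transaction.

```python
import logging

class _Transaction:
    """Toy transaction manager implementing the proposed @post_commit."""

    def __init__(self):
        self.in_transaction = False
        self._callbacks = []

    def post_commit(self, func):
        if self.in_transaction:
            self._callbacks.append(func)  # defer until commit
        else:
            func()  # autocommit mode: execute immediately
        return func

    def commit(self):
        self.in_transaction = False
        callbacks, self._callbacks = self._callbacks, []
        for cb in callbacks:
            try:
                cb()
            except Exception:
                # Log and keep running the remaining callbacks.
                logging.exception("post_commit callback failed")

transaction = _Transaction()
```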

---

Thoughts? If you're interested, I can provide a patch that implements the above system.

comment:30 Changed 2 years ago by aaugustin

The complexity is in handling partial rollbacks performed with savepoints. You need to keep track of which post_commit tasks must be cancelled at each level.

It's doable but it's going to make the implementation of transactions more complicated than what I'm comfortable with. (It's already a bit more complicated than I'd like, since it handles joining an inner transaction with an outer one.)

comment:31 Changed 2 years ago by asksol

Though, with simple {transaction_start, transaction_commit, transaction_cancel} signals, keeping track of what to call and when would be Celery's problem, not Django's. While it would be nice, it was never my intention that Django would implement this logic.

comment:32 Changed 2 years ago by aaugustin

These signals aren't enough. You also need to track savepoint rollbacks, otherwise you could still attempt to run a task on data that wasn't inserted into the database.

comment:33 Changed 2 years ago by asksol

Right, I guess with autocommit an operation may easily consist of multiple transactions. I guess the only way to do this properly is with manual transaction management.

comment:34 Changed 2 years ago by aaugustin

"Manual transaction management" no longer exists in master. (That's why I closed this ticket.)

For your use case, I recommend staying in autocommit mode (ie. avoid ATOMIC_REQUESTS), and not enqueuing tasks within an atomic block.

comment:35 Changed 2 years ago by asksol

Ok, I see and agree, with the new autocommit behavior this is unlikely to catch users by surprise anymore, and race conditions will be more obvious if you're in an atomic block. Thanks!

comment:36 Changed 2 years ago by kux

I don't actually understand why this got closed.

Even though manual transaction management no longer exists in master, I would still find it very useful to be able to register on_commit and on_rollback hooks for the atomic block. You might have Django models that also depend on external resources, and in case of a rollback you might want to run a clean-up action. I don't actually care about savepoints... I just want to make sure I don't leave any junk behind :)

Take for example the admin's change_view. If an exception is being raised while changing an object, the only way I can run any clean-up code is by overriding change_view. This doesn't seem very elegant.

comment:37 Changed 20 months ago by aaugustin

kux, I won't include in Django partial features that vaguely cover some needs of some users. I either do things correctly or not at all.

Transactions guarantee a very high level of reliability, enforced by the database. I refuse to mix them with a less reliable system that could leave data in an inconsistent state — especially signals, which are the most misused feature of Django.

Your use case is easily implemented by adding cleanup tasks to a request.cleanup_on_errors list, and running them with a middleware in process_exception when ATOMIC_REQUESTS triggers a rollback. Besides, this allows you to register specific tasks rather than having to write and register a global signal handler that can cleanup anything left behind by any part of your code.
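The middleware approach described above could be sketched like this (old-style middleware with process_request/process_exception; the cleanup_on_errors attribute name comes from the comment, everything else is illustrative).

```python
import logging

class CleanupOnErrorMiddleware:
    """Runs per-request cleanup callables when a view raises,
    i.e. when ATOMIC_REQUESTS is about to roll the transaction back."""

    def process_request(self, request):
        # Views append zero-arg cleanup callables to this list.
        request.cleanup_on_errors = []

    def process_exception(self, request, exception):
        for cleanup in getattr(request, "cleanup_on_errors", []):
            try:
                cleanup()
            except Exception:
                logging.exception("cleanup task failed")
        return None  # let normal exception handling continue
```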

comment:38 Changed 18 months ago by aaugustin

comment:39 Changed 18 months ago by aaugustin

I've created a new ticket that asks for the ability to run code after the transaction is committed: #21803.

As far as I understand, that was the original use case here.

comment:40 Changed 18 months ago by anonymous

I learned that get_or_create operates in a transaction even when in autocommit mode.
