Django

Code

Ticket #9964 (assigned)

Opened 1 year ago

Last modified 2 weeks ago

Transaction middleware closes the transaction only when it's marked as dirty

Reported by: ishirav Assigned to: mtredinnick (accepted)
Milestone: 1.3 Component: Database layer (models, ORM)
Version: 1.0-beta-1 Keywords: transactions
Cc: jdunck@gmail.com, hv@tbz-pariv.de, carljm, basti, shai, forest Triage Stage: Accepted
Has patch: 1 Needs documentation: 1
Needs tests: 0 Patch needs improvement: 0

Description

The transaction middleware commits only dirty transactions on success, and rolls back only dirty transactions in case of an error.

This means that any transaction that only SELECTs data gets rolled back (I'm not sure by whom - the database itself, perhaps?). So this is what I see in my PostgreSQL log:

BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SET TIME ZONE E'America/Chicago'
SELECT ...
SELECT ...
SELECT ...
ROLLBACK

The expected behavior is a COMMIT (except in case of an error).

Typically this does not cause any problems, because it affects only non-dirty transactions.

However if you use raw SQL to make database modifications, it will be rolled back unless the dirty flag gets set by some another part of your code. This can result in loss of data, and a very hard to find bug...

The correct behavior would be to always close the transaction, regardless of the dirty flag. Meaning that TransactionMiddleware?.process_response should always call transaction.commit(), and TransactionMiddleware?.process_exception should always call transaction.rollback()

This bug is also the cause of tickets #9763 and #9919

Attachments

bench.py (2.1 kB) - added by shai on 03/08/09 17:51:12.
rudimentary benchmarking code
9964-against-10052.patch (27.7 kB) - added by shai on 03/13/09 23:27:15.
not-yet-ready patch for comments
9964-against-10087.diff (35.3 kB) - added by shai on 03/18/09 08:56:52.
Improved patch, tested (tests included)

Change History

01/05/09 08:10:24 changed by shai

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

I would like to point out two other cases where the current behavior has bad effects:

1) When data retrieval statements have side-effects. This can happen with database back-ends which support procedures (or functions) returning tabular results.

2) When the transaction isolation level is Repeatable-Read or Serializable, and the connections are pooled.

  • As long as no update is made on the specific thread, the transaction isn't closed;
  • so new HTTP requests handled by the same thread would re-use the connection and therefore, the transaction;
  • so the new requests retrieve stale data -- updated to the time when the first request was processed, because of the isolation level.

The code in SVN still contains this bug.

01/05/09 18:08:10 changed by mtredinnick

  • owner changed from nobody to mtredinnick.
  • status changed from new to assigned.

02/26/09 19:29:37 changed by jacob

  • stage changed from Unreviewed to Accepted.
  • milestone set to 1.1.

#9673 and #9919 are duplicates.

03/06/09 09:52:58 changed by jdunck

  • cc set to jdunck@gmail.com.

03/06/09 10:22:04 changed by jdunck

The obvious question is when set_dirty should be called; the description of this ticket suggests that a transaction should be dirty even for a transaction consisting only of selects. The problem with this is that lots of transaction cycles are only selects, and committing a transaction has overhead. I think select-only transactions should rollback by default. If you're using triggers that munge data upon select, set_dirty should be manually called; this can be clearly documented.

There's some overhead and fiddliness with db.connection.cursor.execute looking at the string it receives in order to determine CUD operations. Maybe backends.DatabaseOperations? should have a .is_select(sql) hook?

Otherwise it'd be a decent compromise to just clearly document more that only dirty transactions are committed, and that using cursor.execute does not mark a transaction as dirty.

03/08/09 17:51:12 changed by shai

  • attachment bench.py added.

rudimentary benchmarking code

03/08/09 19:10:05 changed by shai

I suspect jdunck's description is inaccurate, and it is not committing a transaction that has an overhead, but closing the transaction. That is, for select-only transactions, whether you commit or roll back, it costs about the same; the "cheap" option is to leave the transaction be.

I ran some very basic tests on my desktop, to evaluate the costs of closing transactions. The bench.py file I just attached has functions to create connections into SQLite, MySQL, and PostgreSQL databases I had set up; to create a simple table; to fill it up with 100,000 rows; to run some (<100) selects as warm-up; and to run a big bunch of selects -- about 1450 batches of 3, each batch ending with a "transaction ending" operation. I tested for the three databases, against the three operations: no-op, rollback and commit. The results, in milliseconds, were:

Database PostgreSQL MySQL SQLite
Module psycopg2 MySQLdb sqlite3
Leave open 455-576 356-366 309-402
Rollback 569-750 395-414 297-359
Commit 573-717 397-426 298-305

These results support my suspicion, that rollback does not save much (if any) overhead, compared to commit, in select-only transactions.

Assuming this evidence is accepted, then, the only question that remains is whether Django should always close transactions. The arguments I am aware of are:

  • Leaving dangling transactions is "impolite", and in some cases can lead to data loss (lost commits as well as bad commits)
  • Closing every transaction costs. The table above shows that, for select-only transactions, this would cost about 10% performance on MySQL and about 20% on PostgreSQL (see note below for SQLite, and I have no access to Oracle or other common DBMSs at the moment). These numbers are, of course, expected to change wildly with the introduction of multi-threading, different transaction isolation levels, and realistic workloads. However, one should note that the single-threaded case tends to hide the benefits of shorter transactions; I expect the 'commit' option to look better in realistic conditions.

Notes:

  1. After preparing the databases with the functions in the file, I ran all my tests with
        $ python -m timeit -v -n 10 -r 10 -s 'from bench import *' -s 'conn=sqlite()' -s 'warm_up(conn)' 'leave_open(conn)'
    
    Replacing the connection function and the benchmarking function, of course. All the databses use the system default options; the system is 32-bit Debian sid running on a quad-core 2.4GHz cpu, with 2GHz memory, a consumer-grade 250G hard-disk, and 3 KDE sessions up (though only one is active).
  2. The SQLite results seem to indicate that having long transactions increases the performance variance; I found this surprising, and repeated the tests several times with similar results, though this is certainly not rigorous.

03/09/09 10:29:10 changed by guettli

  • cc changed from jdunck@gmail.com to jdunck@gmail.com, hv@tbz-pariv.de.

03/09/09 16:21:25 changed by carljm

  • cc changed from jdunck@gmail.com, hv@tbz-pariv.de to jdunck@gmail.com, hv@tbz-pariv.de, carljm.

03/13/09 23:27:15 changed by shai

  • attachment 9964-against-10052.patch added.

not-yet-ready patch for comments

03/13/09 23:35:41 changed by shai

  • needs_better_patch set to 1.
  • has_patch set to 1.
  • needs_tests set to 1.
  • needs_docs set to 1.

The patch I just attached is not ready for inclusion and not tested, I just want to get some feedback on the ideas:

  • Take the functions affected by 'dirty' out to separate modules, each providing a different version of them
    • fast_select.py reflects current behavior
    • safe.py always closes transactions
  • Selection between modules is done by settings.py
  • The default is fast_select (existing behavior), for backward compatibility
  • The project template is modified to select 'safe', so new projects get that by default
    • But if they need the extra performance, and are willing to pay their dues in code diligence, they can

03/18/09 08:56:52 changed by shai

  • attachment 9964-against-10087.diff added.

Improved patch, tested (tests included)

03/18/09 09:05:16 changed by shai

  • needs_better_patch deleted.
  • needs_tests deleted.

The new patch includes tests, which it passed.

Malcolm has expressed, in the mailing list, preference for a trivial patch; however, the trivial patches considered took it upon Django to select for the users, whether they should take a performance hit (compared to 1.0), or check their code better. The attached patch leaves this choice to the users (as detailed above).

I removed the "need tests" and "need better patch" flags, because it is my opinion now that the tests and patch are good, and viable for looking into. I do not presume to take the role of a committer; I hope this is the right way to do it.

I have to add a word on testing here, but I must run to get some kids from school now.

03/18/09 15:42:27 changed by shai

Two words on testing:

A) The tests introduced are not for the transaction-policy setting per se, but rather for the transaction middleware. It had none, so I figured it could use one while I'm at it.

B) Testing this patch has to be done with two different settings files; one where the Transaction Policy was set to safe, and one where it was left as fast_select. This is because of a minor technicality(1) and a major effect.

The minor technicality doesn't really matter, because of the major effect: When you change something as fundamental as transactional behavior, it's not enough to check your own puny unit-tests; you need to verify that you haven't broken some innocent victim elsewhere. So, I had to run the whole test suite under the two policies anyway; indeed, I had to fix one of the tests(2) for it to pass.

This may look like I'm now proposing that, for all eternity, tests must be thus run twice. Not so; I hope to see, within a couple of versions, that fast_select is recommended against, then the default changed to safe, and fast_select relegated to a status of not-really-supported, much like the TRANSACTIONED_MANAGED knob is today (turning that undocumented knob in your settings file makes the test suite fail completely, as it cannot even syncdb a test database).

(1) The minor technicality is that the way the code is currently written, it modifies the transaction module's global namespace (for speed); this means it does not support changing the Transaction Policy on the fly. In practice, one does not expect this sort of setting to change in mid-operation; in a test setting, it seems to get in the way, but, as mentioned above, it doesn't really matter.

(2) That test I had to fix reveals one weak point of the 'safe' transaction policy -- if all transactions are considered dirty at all times, there's no way to know if a transaction is pending. Thus, we lose a validity check we currently have (where if a transaction is left pending, an exception is thrown). I added a rollback to the end of all transactions, as a minimal protection against such strays.

03/18/09 18:36:02 changed by mtredinnick

Impressive amount of work and analysis here. Thanks.

It'll take a couple of days to review this, although I'll start today. Just because it's a huge change that could Break Stuff(tm) if we get it wrong. But I like the approach you've taken, for the most part.

04/08/09 15:57:28 changed by jdunck

Malcolm, I know there're other things on your plate, but given your estimate here, I thought it might be good to poke this-- did you get a chance to look at it?

05/20/09 03:25:18 changed by basti

  • cc changed from jdunck@gmail.com, hv@tbz-pariv.de, carljm to jdunck@gmail.com, hv@tbz-pariv.de, carljm, basti.

(follow-up: ↓ 16 ) 06/09/09 08:34:46 changed by russellm

  • milestone changed from 1.1 to 1.2.

After discussing with Jacob and Malcolm, we've decided to defer this one from v1.1.

From a purely code perspective, the patches on this ticket aren't ready for trunk. In particular, having a user-space setting to determine transaction behavior is a non-starter.

From a design perspective, there are some subtle tradeoffs that need to be considered here, including weighing the cost/benefit of "always commit", and ensuring that manual transaction control will continue to work for those that need every last nanosecond of performance.

On the flipside - the existing behavior works, as long as you understand what "works" means. This existing behavior could certainly be better, though; in the 1.2 timeframe we can give this the serious discussion it requires.

(in reply to: ↑ 15 ; follow-up: ↓ 17 ) 06/10/09 12:10:49 changed by shai

Hi,

Replying to russellm:

After discussing with Jacob and Malcolm, we've decided to defer this one from v1.1.

I understand how busy you all are; still, I hope you can find the time to respond while the topic is fresh in your heads.

having a user-space setting to determine transaction behavior is a non-starter.

Could you please elaborate on this a little? I find it confusing, as the setting activating the transaction middleware (as all middlewares) is in the same space as the setting proposed in the patch.

From a design perspective, there are some subtle tradeoffs that need to be considered here, including weighing the cost/benefit of "always commit", and ensuring that manual transaction control will continue to work for those that need every last nanosecond of performance.

This I also find confusing, as almost everything that is non-trivial here is intended to keep the current behavior available for people who need it (and in fact, keep it through the upgrade for existing projects). Perhaps I misunderstand, but it seems like you are trying to decide on the always-close-transaction issue, when the patch jumps through hoops in order to defer this very decision -- to users for now, and to the future in general.

On the flipside - the existing behavior works, as long as you understand what "works" means.

In this case, would you consider documentation patches (as proposed in #9919)?

Thanks,

Shai.

(in reply to: ↑ 16 ) 06/13/09 00:49:28 changed by russellm

Replying to shai:

Hi,

having a user-space setting to determine transaction behavior is a non-starter.

Could you please elaborate on this a little? I find it confusing, as the setting activating the transaction middleware (as all middlewares) is in the same space as the setting proposed in the patch.

The current patch proposes to preserve the current behavior, as well as adding a new behavior that implements a new behavior. However, this isn't an area of API where the end user should be required to make a choice at all - it should Just Work (tm). The only users that will care about the different behavior are those optimizing for uber-speed, and for those users we have extensive manual controls over transactions.

From a design perspective, there are some subtle tradeoffs that need to be considered here, including weighing the cost/benefit of "always commit", and ensuring that manual transaction control will continue to work for those that need every last nanosecond of performance.

This I also find confusing, as almost everything that is non-trivial here is intended to keep the current behavior available for people who need it (and in fact, keep it through the upgrade for existing projects). Perhaps I misunderstand, but it seems like you are trying to decide on the always-close-transaction issue, when the patch jumps through hoops in order to defer this very decision -- to users for now, and to the future in general.

Deciding to push the decision to the user isn't a decision at all. More control knobs doesn't necessarily imply better software. Part of the job of a framework is to eliminate some decisions to make sure that things always work as expected. We want to make a decision, and make sure it is the right one. Since this is a bit of an edge case, and there is a reasonable workaround, we're willing to defer that decision until we know we will get it right.

On the flipside - the existing behavior works, as long as you understand what "works" means.

In this case, would you consider documentation patches (as proposed in #9919)?

Agreed, we probably should add some docs for this. The patch on #9919 doesn't really do the job at the moment. To my mind, it makes a lot of unnecessary changes, and only gives a marginal improvement on the one area that the ticket should be addressing. I'll reopen the ticket, but a new patch will be required.

06/14/09 05:13:39 changed by shai

  • cc changed from jdunck@gmail.com, hv@tbz-pariv.de, carljm, basti to jdunck@gmail.com, hv@tbz-pariv.de, carljm, basti, shai.

Replying to russellm:

The current patch proposes to preserve the current behavior, as well as adding a new behavior that implements a new behavior. However, this isn't an area of API where the end user should be required to make a choice at all - it should Just Work (tm). The only users that will care about the different behavior are those optimizing for uber-speed, and for those users we have extensive manual controls over transactions.

I agree, and I've set the defaults accordingly. The only reason I made the choice user-visible was because I could see no other way to make it also backwards-compatible -- to avoid changing the behavior under the feet of existing projects. This is the sort of change that is bound to break some user code; in this case, you essentially argue that the user should fix their code or avoid upgrading, and I was trying to offer a smoother path.

Replying to shai:

This I also find confusing, as almost everything that is non-trivial here is intended to keep the current behavior available for people who need it (and in fact, keep it through the upgrade for existing projects). Perhaps I misunderstand, but it seems like you are trying to decide on the always-close-transaction issue, when the patch jumps through hoops in order to defer this very decision -- to users for now, and to the future in general.

Deciding to push the decision to the user isn't a decision at all. (...) We want to make a decision, and make sure it is the right one.

I mis-represented my intentions. There is a difference between "push the decision to the user" and "make a decision but allow the user to override"; I was trying to do the second. The decision I made was "no change for existing projects, always commit for new ones". Allowing override was, indeed, intended to limit the costs of a "wrong" decision (I am pretty sure I made the right choice in general, but that might not be right for some users).

Further, I believe these arguments will continue to hold after a more thorough revision. We will still want backward-compatibility, and we will still want the decision to be overridable. Granted, we might find better ways to achieve these goals; granted, it makes sense to become more confident of the decision; and granted, it can all wait to 1.2.

In this case, would you consider documentation patches (as proposed in #9919)?

Agreed, we probably should add some docs for this. (...) I'll reopen the ticket, but a new patch will be required.

I'll try to provide a patch in the next few days, if nobody beats me to it :-)

Thanks,

Shai.

02/02/10 19:13:16 changed by forest

  • cc changed from jdunck@gmail.com, hv@tbz-pariv.de, carljm, basti, shai to jdunck@gmail.com, hv@tbz-pariv.de, carljm, basti, shai, forest.

03/03/10 16:44:33 changed by ubernostrum

  • milestone changed from 1.2 to 1.3.

A patch which wasn't ready for 1.1 and hasn't been updated since probably isn't ready for 1.2.

(also, a better solution might just be to document that if you're doing things in raw SQL you should manually inform the transaction middleware by setting the transaction to dirty)

03/04/10 07:04:36 changed by shai

Replying to ubernostrum:

A patch which wasn't ready for 1.1 and hasn't been updated since probably isn't ready for 1.2.

A substantial part of the approach was rejected -- core developers decided that the negative value of an added line in the default settings.py was larger than the positive value of fixing the problematic behavior in a backwards-compatible way.

(also, a better solution might just be to document that if you're doing things in raw SQL you should manually inform the transaction middleware by setting the transaction to dirty)

This has already been done, but it only "solves" part of the problem. The part where, by default, the middleware and decorator leave the transaction pending at the end of the request is not fixed. This can still cause data loss and corruption, depending on things like transaction isolation level and support of nested transactions in the database backend. Further, those would be "phantom bugs" -- hard to reproduce, because they depend on the allocation of threads to a stream of requests.

The correct behavior, clearly, is to close all transactions, whether by commit or rollback. Current behavior is a bug which needs to be fixed.

Since transactions are deep, and changing their behavior might break user applications, I suggested (perhaps not as clearly as I should have) a three-step path towards correct behavior:

  1. Make new projects behave correctly, keep behavior for old ones, tell people they need to adapt (by adding settings.py var)
  2. Make all projects behave correctly, unless explicitly overridden (removing var from default settings.py, but still supporting both values)
  3. Cease support for the above override

In his replies above, Russel effectively says we should either jump straight to the end of the path or keep living with the bug. My reply to that went unanswered. If you'd consider a patch which just always closes transactions, just say so, and I'll provide one.

Thanks, Shai.


Add/Change #9964 (Transaction middleware closes the transaction only when it's marked as dirty)




Change Properties
Action