Django

Code

Ticket #3460 (closed: fixed)

Opened 2 years ago

Last modified 2 months ago

Allow configuration of postgresql_psycopg2 isolation level

Reported by: Jack Moffitt <metajack@gmail.com> Assigned to: mtredinnick
Milestone: 1.1 beta Component: Database layer (models, ORM)
Version: SVN Keywords:
Cc: sam@robots.org.uk, jarek.zgoda@gmail.com, richard.davies@elastichosts.com, chenyuejie@gmail.com, guillermo.gutierrez@uca.es Triage Stage: Accepted
Has patch: 1 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 0

Description

Postgresql only has two transaction isolation levels: read committed and serializable. READ COMMITTED is the default and the one django expects to use. psycopg2 however defines three isolation levels: autocommit, read committed, and serializable. The difference between autocommit and read committed is that read committed in psycopg2 puts all statements inside a BEGIN/END block (or BEGIN/ROLLBACK or BEGIN/COMMIT). Inside the BEGIN it also executes SET TRANSACTION ISOLATION LEVEL READ COMMITTED (which is redundant as this isolation level is the default).

Django currently uses psycopg2's read committed isolation level, and this leads to every query (almost) being surrounded by spurious BEGIN/END. In a heavily loaded system, this consumes a massive amount of the total database time.

Django should be using the autocommit isolation level, which does not have this overhead. Anywhere explicit transaction blocks are needed they can be used. The attached patch switches the backend to the autocommit isolation level. I have tested this patch on our production system and it makes a noticeable speed improvement.

NOTE: Django's ORM does an existence test followed by and UPDATE or INSERT when save() is called. The read committed isolation level does not protect against this failing even within a transaction block, and therefor this change should have no side effects on Django code.

Attachments

autocommit.diff (0.8 kB) - added by Jack Moffitt <metajack@gmail.com> on 02/08/07 14:34:35.
switch psycopg2's default isolation level to autocommit
isotest.py (0.9 kB) - added by Jack Moffitt <metajack@gmail.com> on 02/08/07 16:05:04.
test script
combined.diff (1.5 kB) - added by Jack Moffitt <metajack@gmail.com> on 02/08/07 16:05:37.
updated patch; also fixes #3459
psycopg2-commit-mode-r5790.diff (1.5 kB) - added by John Shaffer <jshaffer2112@gmail.com> on 08/03/07 16:03:29.
Updated patch for [5790]. The patch causes problems with fixtures and causes some of Django's tests to fail.
3460-autocommit.diff (2.8 kB) - added by Timothée Peignier <tim@tryphon.org> on 02/29/08 09:35:27.
Change isolation level without breaking fixtures and tests
autocommit-7624.patch (4.2 kB) - added by cgrady on 06/12/08 01:08:23.
added similar fix to postgresql backend for psycopg1
3460-r8961-autocommit.diff (5.6 kB) - added by Richard Davies <richard.davies@elastichosts.com> on 09/21/08 15:58:23.
Updated for Django 1.0, some problems remain (see my notes on ticket)
django-txpatch (2.3 kB) - added by nicferrier on 01/05/09 10:42:44.
patches dango's postgres backend for pg native autocommit
insertwithreturning.patch (2.0 kB) - added by iamseb on 01/09/09 07:05:54.
Replace SELECT CURRVAL with INSERT ... RETURNING for psycopg2 backend.
3640_r9734-autocommit.diff (6.7 kB) - added by Richard Davies <richard.davies@elastichosts.com> on 01/14/09 04:37:06.
Configurable native autocommit
3640_r9736-autocommit-safe.diff (8.0 kB) - added by iamseb on 01/14/09 10:30:45.
3460_r10645-autocommit.diff (0.7 kB) - added by kmishler@arinc.com on 04/28/09 15:22:45.

Change History

02/08/07 14:34:35 changed by Jack Moffitt <metajack@gmail.com>

  • attachment autocommit.diff added.

switch psycopg2's default isolation level to autocommit

02/08/07 16:04:45 changed by Jack Moffitt <metajack@gmail.com>

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

Note that this patch may have side effects if the database's default isolation level is set to serializable. The attached patch corrects this problem and makes sure that we're always in read committed unless otherwise overridden for the entire session. See http://www.postgresql.org/docs/8.1/interactive/sql-set-transaction.html for documentation.

I'm also attaching a test script to show that this works.

Note that the self._commit() call is uncessary in autocommit isolation mode, but is needed for the others.

02/08/07 16:05:04 changed by Jack Moffitt <metajack@gmail.com>

  • attachment isotest.py added.

test script

02/08/07 16:05:37 changed by Jack Moffitt <metajack@gmail.com>

  • attachment combined.diff added.

updated patch; also fixes #3459

02/08/07 16:06:10 changed by Jack Moffitt <metajack@gmail.com>

Also, note that #3459 needs almost the same setup, so that's why I've attached the combined patch.

02/08/07 17:41:12 changed by Marc Fargas <telenieko@telenieko.com>

  • stage changed from Unreviewed to Design decision needed.

For the record: Discussion about that has been held: here

03/07/07 18:23:22 changed by Philippe May

This patch also resolves an important bug for me: i have an application with models imported from python packages, not directly from the models.py file. Without the patch, postgres raises an error (ERROR: the relation «app_rel» doesn't exist) and the transaction is still pending, which breaks the normal execution when creating the databases (with syncdb). Hope that's clear enough... I'm not a postgres guru, so i haven't figured out the details. Contact me if interested.

Guess it's not worth filling a new ticket for my bug when a patch is ready.

08/01/07 05:18:22 changed by Nis Jørgensen <nis@superlativ.dk>

I have run into this problem as well, and fixed it by doing the exact code change you did.

This caused problems when I started to use fixtures (forward references) in my tests - I believe there are currently problems with the transaction management for those which are exposed by the fix.

I am sorry that I cannot be more specific at the moment, just thought I'd log this as something to investigate before applying the patch.

08/01/07 10:26:01 changed by mir@noris.de

#5043 marked as duplicate of this ticket.

08/02/07 18:51:22 changed by anonymous

  • cc set to sam@robots.org.uk.

08/03/07 16:03:29 changed by John Shaffer <jshaffer2112@gmail.com>

  • attachment psycopg2-commit-mode-r5790.diff added.

Updated patch for [5790]. The patch causes problems with fixtures and causes some of Django's tests to fail.

11/14/07 15:29:01 changed by jarek.zgoda@gmail.com

  • cc changed from sam@robots.org.uk to sam@robots.org.uk, jarek.zgoda@gmail.com.

11/19/07 12:34:41 changed by kmishler@arinc.com

We applied this patch for our application. It fixed a problem we were having where if an insert, update or delete threw an exception, any subsequent database operation would fail until django was restarted. We fixed the problem with fixtures by adding a line to core/management/commands/loaddata.py:

transaction.connection.connection.set_isolation_level(1)

It fixed the django unit tests for our application.

11/20/07 04:01:58 changed by mir

  • needs_better_patch set to 1.

The problem is that the patch turns every single command in a transaction for itself (this happens in autocommit without an explicit COMMIT). The new behaviour is incompatible with the old one and e.g. breaks forward references in the serialization tests, as noted by John Shaffer with the most recent patch.

I checked "patch needs improvement" since it breaks tests.

12/01/07 08:45:25 changed by jacob

  • status changed from new to closed.
  • resolution set to duplicate.

This is a situation where different aplications call for different settings, so hardcoding something isn't a great idea. I think, then, this is a situation that could be solved by #6064.

12/01/07 10:47:40 changed by Jack Moffitt <metajack@gmail.com>

I see no use case for an application that needs behavior different than this. In this patch, you always know exactly when and how you are in a transaction, and you can choose whether your code executes in one or not. The old way does not allow this flexibility and always puts you in a transaction, which for 90% of the most common queries wastes a lot of database cycles. Any site running stock django with postgresql will be crippled by this when it starts to get any load.

This is not something that is solved by #6064, or at least I think doing it that way would be more of a hack. Solving it by setting the default isolation level consistently means that everyone will understand exactly how the database is getting used. It won't be broken for most people until they figure out there is some special initializer that magically fixes things by changing how all the isolation levels work. I suppose you could make the default initializer do this, but note that this is a lot more technical and low level that making sure the time zone is correct or setting how postgres returns the date, etc. There's a reason that this functionality is provided _at the wrapper layer_ and the other is provided in the SQL language.

We have at least 3 reports here of people who need this functionality, and so far, no reports or explanation on how this functionality would be bad. Please reconsider this patch.

12/01/07 10:53:41 changed by Jack Moffitt <metajack@gmail.com>

  • status changed from closed to reopened.
  • resolution deleted.

I'm reopening because reading #6064 it is clear this is not a duplicate. That functionality is about initial SQL, and this functionality can not be changed by SQL alone. We are changing the Python wrapper's functionality here, not the database's.

There is nothing changing on the postgresql side. The default python DB-API functionality is fine for easy of use, but is not what you want in a system like Django, as it is extremely wasteful of transactions in the common case. Note that this is specified by the DB-API spec, and will probably affect every database with transactions. Each wrapper is going to have its own way to turn this off I imagine.

(follow-up: ↓ 16 ) 12/01/07 15:03:17 changed by jacob

  • status changed from reopened to closed.
  • resolution set to duplicate.

Jack, it can be changed by inital SQL -- "SET TRANSACTION ISOLATION LEVEL <foo>". If that's wrong, #6064 just needs to allow custom *python* (as well as SQL) to run upon database creation. We need to fix the general problem, not individual specific ones.

12/01/07 15:26:56 changed by Jack Moffitt <metajack@gmail.com>

There is no autocommit isolation level in SQL. If you intend #6064 to have custom python initializers per backend, then it will do what we want, but this is still quite a different thing than the tasks that custom sql initializers will be doing in general. So #6064 needs python + sane defaults (like setting autocommit by default + some way that the first time you override this to change the timezone or date style you don't completely change how the whole transaction system works by accident.

02/29/08 09:35:27 changed by Timothée Peignier <tim@tryphon.org>

  • attachment 3460-autocommit.diff added.

Change isolation level without breaking fixtures and tests

(in reply to: ↑ 14 ) 06/11/08 07:42:11 changed by Jack Moffitt <metajack@gmail.com>

  • status changed from closed to reopened.
  • resolution deleted.

Replying to jacob:

Jack, it can be changed by inital SQL -- "SET TRANSACTION ISOLATION LEVEL <foo>". If that's wrong, #6064 just needs to allow custom *python* (as well as SQL) to run upon database creation. We need to fix the general problem, not individual specific ones.

So the default Django will be really inefficient, unless the user knows the magic incantation of custom python for the database adapter they are using? This seems a poor solution. There is nothing in #6064 that recommends allowing custom python, nor is there anything in #6064 that would use such a setting for fixing the enormous problems with this poor use of DB-API.

People are still adding patches to this bug, so it is clearly still an issue for others.

I cannot see a reason why you keep closing this bug. Does it break something? The patch seems to work quite well for others. Incremental improvement now is much better than waiting another year for a perfect solution.

06/12/08 01:08:23 changed by cgrady

  • attachment autocommit-7624.patch added.

added similar fix to postgresql backend for psycopg1

07/02/08 11:21:34 changed by cpinto

I'm not sure #6064 is the patch for this. It works with connections while this patch works with cursors which are not the same thing. For example, with the connection_created signal how would you set a different timezone on a new cursor?

Since this appears to be a specific issue with pgsql and #6064 cannot be used to solve this particular issue (not without refactoring other stuff, starting with the signal itself that would either needed to be changed to cursor_created and dispatched on cursor creation or adding a new signal that served this purpose), why is this even marked as duplicate?

08/22/08 11:49:34 changed by cgrady

  • owner changed from nobody to cgrady.
  • status changed from reopened to new.

09/06/08 13:09:23 changed by Richard Davies <richard.davies@elastichosts.com>

  • cc changed from sam@robots.org.uk, jarek.zgoda@gmail.com to sam@robots.org.uk, jarek.zgoda@gmail.com, richard.davies@elastichosts.com.

09/21/08 15:58:23 changed by Richard Davies <richard.davies@elastichosts.com>

  • attachment 3460-r8961-autocommit.diff added.

Updated for Django 1.0, some problems remain (see my notes on ticket)

09/21/08 16:17:12 changed by Richard Davies <richard.davies@elastichosts.com>

I've just updated Collin Grady's patch to apply against Django 1.0 (r8961) and also current trunk.

Mostly minor reformatting work, but one significant change - [8314] introduced savepoint support for Postgresql, which we need to turn off when we are outside a transaction.

Issues: Although the patch applies again, it leaves a Django which does not support any transaction behavior other than autocommit (leading to various test suite failures). The problem is that django/db/transaction.py also needs to be rewritten to actually create transactions (i.e. issue SQL BEGIN/COMMIT or BEGIN/ROLLBACK) on top of the connection when the user explicitly requests them. My proposed solution is for discussion at http://groups.google.com/group/django-developers/msg/4e5171056b3f6558

11/11/08 20:27:53 changed by anonymous

  • cc changed from sam@robots.org.uk, jarek.zgoda@gmail.com, richard.davies@elastichosts.com to sam@robots.org.uk, jarek.zgoda@gmail.com, richard.davies@elastichosts.com, chenyuejie@gmail.com.

12/01/08 13:10:55 changed by Richard Davies <richard.davies@elastichosts.com>

  • owner changed from cgrady to Richard Davies <richard.davies@elastichosts.com>.

I've signed up to implement this for v1.1

12/07/08 20:50:51 changed by anonymous

The name/email address of the owner of this issue is pushing out the column width in the bug list. Which is of course expected behavior for TABLEs in HTML, but is causing a horizontal scrollbar now at previously "normal" browser sizes.

01/05/09 10:42:44 changed by nicferrier

  • attachment django-txpatch added.

patches dango's postgres backend for pg native autocommit

01/05/09 10:45:07 changed by nicferrier

WooMe?.com is a very high concurrency site (we have 7k transactions/sec at peak time and never less than 2k/sec). The patch I've added above sorts us out.

Using unpatched Django was becomming an issue involving > 10k locks/sec at peak times.

01/09/09 07:04:33 changed by iamseb

Following up from nicferrier's patch, there's a corresponding issue when using transaction pooling which must be resolved.

INSERT statements followed by SELECT CURRVAL for creating new objects may split the queries over two transactions, and therefore two sessions. The second session will not have the sequence value available. In this case, a far safer route is to use the RETURNING clause in INSERT statements to return the new value for the pk.

The attached insertwithreturning.patch adds a new feature to BaseDatabaseFeatures?, sets it to true for the psycopg2 backend, and conditionally creates an appropriate RETURNING clause for INSERT statements.

This feature is supported for PostgreSQL 8+ with psycopg2 (tested), and Oracle (untested, currently not enabled).

With these two patches, correct transaction isolation support for postgres is fully implemented, and scaling issues with lock concurrency and hanging transactions are mitigated. (We've seen >10k locks reduced to <100.)

01/09/09 07:05:54 changed by iamseb

  • attachment insertwithreturning.patch added.

Replace SELECT CURRVAL with INSERT ... RETURNING for psycopg2 backend.

01/09/09 07:15:21 changed by mtredinnick

  • summary changed from postgresql_psycopg2 backend uses wrong isolation level to Allow configuration of postgresql_psycopg2 isolation level.
  • stage changed from Design decision needed to Accepted.

This ticket seems to be devolving into multiple issues, which is bad. The title of the ticket is a bit incorrect at the moment (and I'm about to change that).

Right now, Django isn't using the "incorrect" isolation level. It's using one of the valid options. There's a proposal, which has been accepted, to introduce a configurable option to allow a different isolation level to be used, as per the original poster's request. That is what this ticket is about. However, the current level is what will remain as the default; changing that introduces undetectable and potentially unsafe side-effects into existing code.

If there are issues with counters being in separate transactions and things like that, they should really be in a separate ticket, since they must also be fixed with the current code. It looks to me like the last two patches (the penultimate patch needs a better explanation of the problem it is trying to solve, too. It's a little hard to reverse engineer and I'd like to have more confidence that my interpretation matches the patch authors') really both belong in a separate ticket, since they aren't dealing with the serialisation level at all. So if iamseb or nicferrier could please fix that, including a description of the problem you're trying to fix, it would be appreciated.

(Changed description to reflect the targeted solution)

01/09/09 08:22:29 changed by iamseb

As stated in the original ticket, django does not use an appropriate isolation level by default. This is a serious bug for anyone deploying django with postgres, a bug that has cost us considerable time and effort (and therefore, money). The patches submitted are intended to offer a simple, working fix that solves this issue, and hopefully to promote further discussion so that we can get a complete solution in django.

"Django should be using the autocommit isolation level, which does not have this overhead. Anywhere explicit transaction blocks are needed they can be used."

The penultimate ticket resolves this issue by ensuring that the correct isolation level is set by the psycopg2 backend for *all* transactions. We believe that this is the correct, expected behaviour. The final patch resolves a side-effect of this change by changing the lookup of newly inserted primary keys to a pattern that is held within a single transaction.

Sorry if this wasn't clearer.

01/09/09 19:38:16 changed by mtredinnick

  • needs_docs set to 1.

You didn't read my comment. Anything that changes the default isolation level isn't going to be accepted. That will lead to very hard to diagnose bugs in some existing code that is relying on the current isolation level. However, a patch that allows configuration of the isolation level to use is certainly a good idea and will be accepted. But the default remains as it is. I have no problems understanding that the current level causes problems for some use-cases. Which is fine. But they aren't the only use-cases and considering existing (two separate items) are also very important. This has been discussed ad-infinitum on the django-dev mailing list.

Since it sounds like the current patch is another case of changing the default, it needs improvement. Add a setting for the isolation level. The default value is the current level. Add some documentation for how to change it. Then we've got something that can be committed. Get it done reasonably quickly (within about a week) and this can make it into Django 1.1.

01/10/09 08:08:24 changed by nicferrier

The specific transaction isolation used is inconsequential, It's how the database connection behaves which is important. Django specifies that database connections are, by default, in autocommit mode. It is that behaviour which Django is seeking to provide.

Despite you changing the name of the ticket, we believe the original point still stands: Django does not implement autocommit correctly for Postgresql. We have emprical data to back that up. The patches from myself and iamseb above make Django implement autocommit mode properly WITHOUT any possible transaction related bug. That's why the use of CURRVAL was eliminated.

I'll happily enter into a conversation with people and show them our performance data before and after our patches.

Having said that, I'd be happy to add a switch to the patches so they can be turned on and off. Note that this is not the same as providing a configurable transaction level, as you suggest. Doing that would have much less value than our fix.

You might call our patches postgresql-native-autocommit.

If you are willing to accept patches like that then let us know and we'll rewrite them.

01/14/09 04:37:06 changed by Richard Davies <richard.davies@elastichosts.com>

  • attachment 3640_r9734-autocommit.diff added.

Configurable native autocommit

01/14/09 04:37:54 changed by Richard Davies <richard.davies@elastichosts.com>

  • needs_better_patch deleted.
  • needs_docs deleted.

I've updated nicferrier's patch to make it configurable - hopefully this can now go in for 1.1 by tomorrow.

The revised patch uses the same basic mechanism that nicferrier suggested. It is typically inactive, and is switched on with DATABASE_OPTIONS={'native_autocommit':True}, which I have documented. With this patch, SVN Django passes the full test suite using sqlite3, postgresql_psycopg2 (plain default) and postgresql_psycopg2 (native_autocommit turned on).

I fixed two bugs to get the full test suite to pass (both mentioned in various patches earlier in this ticket): savepoints are not supported in isolation level 0, and I've moved the call to the new connection._enter_transaction_management() inside transaction.enter_transaction_management() so that it also applies to fixtures loaddata.py, etc.

01/14/09 09:12:36 changed by nicferrier

It's vital that the returning patch be in there as well. Without that it is entirely possible to get inconsistent data by naive (non-transaction protected) use of the ORM.

WITH the patch it is NOT possible to get inconsistent data simply be being naive.

01/14/09 10:30:45 changed by iamseb

  • attachment 3640_r9736-autocommit-safe.diff added.

01/14/09 10:35:52 changed by iamseb

I've added an updated patch that implements a safe version of the transaction isolation patch for database connections that implement transaction pooling, whilst sticking with the implementation approach in Richard's combined patch. Transaction isolation can return incorrect data when retrieving PKs after insert operations without this addition.

01/14/09 10:48:39 changed by Richard Davies <richard.davies@elastichosts.com>

Thanks - looks good to me. I imagine that you want "self.features.can_return_id_from_insert = True", not False, on new line 70 of backends/postgresql_psycopg2/base.py? I'd assume this is a postgres feature which is always available regardless of isolation level?

01/14/09 11:20:34 changed by iamseb

I've made it behave the same way as the native_autocommit setting for now, due to previous disquiet about changing the default behaviour. If there's general agreement that this is the correct method for retrieving inserted pk values then I'd be very happy for it to become the default.

01/23/09 21:45:15 changed by mtredinnick

  • owner changed from Richard Davies <richard.davies@elastichosts.com> to mtredinnick.

02/28/09 10:32:28 changed by Richard Davies <richard.davies@elastichosts.com>

  • milestone set to 1.1 beta.

I'm tagging this with the 1.1beta milestone, given that it is already on the Version1.1Features list

03/11/09 02:06:51 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to fixed.

(In [10029]) Fixed #3460 -- Added an ability to enable true autocommit for psycopg2 backend.

Ensure to read the documentation before blindly enabling this: requires some code audits first, but might well be worth it for busy sites.

Thanks to nicferrier, iamseb and Richard Davies for help with this patch.

03/11/09 08:06:26 changed by terrex

  • cc changed from sam@robots.org.uk, jarek.zgoda@gmail.com, richard.davies@elastichosts.com, chenyuejie@gmail.com to sam@robots.org.uk, jarek.zgoda@gmail.com, richard.davies@elastichosts.com, chenyuejie@gmail.com, guillermo.gutierrez@uca.es.

[10029] Introduces the use of "RETURNING" clause for "insert into" statements. This is not available on my postgres 8.1.3. There is a way to disable this feature for older postgres?

-- psycopg2.ProgrammingError: error de sintaxis en o cerca de «RETURNING» en el caracter 113

03/11/09 08:32:22 changed by mtredinnick

Whoops. I assumed it would be there in all the releases. Sorry... I was lazy.

Commenting on a closed ticket isn't particularly helpful in identifying a bug, however. I've opened #10467 to fix that problem. Will look at it tomorrow (it's after midnight here now and I"m not going to tackle that now).

04/28/09 15:22:45 changed by kmishler@arinc.com

  • attachment 3460_r10645-autocommit.diff added.

04/28/09 15:23:20 changed by kmishler@arinc.com

Using the latest django code from subversion (revision 10645), when I have it autocommit set to True in my settings.py, it looks like the isolation_level of the connection is never being set to 0. 3460_r10645-autocommit.diff patch seems to fix the problem, but I don't know if it's the right way to fix it.

04/28/09 15:34:12 changed by Alex

If you believe there is a bug please file a new ticket.

04/29/09 08:41:11 changed by kmishler@arinc.com

Created ticket #10958


Add/Change #3460 (Allow configuration of postgresql_psycopg2 isolation level)




Change Properties
Action