#3460 closed (fixed)
Allow configuration of postgresql_psycopg2 isolation level
Reported by: | Owned by: | Malcolm Tredinnick | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Keywords: | ||
Cc: | sam@…, jarek.zgoda@…, richard.davies@…, chenyuejie@…, guillermo.gutierrez@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Postgresql only has two transaction isolation levels: read committed and serializable. READ COMMITTED is the default and the one django expects to use. psycopg2 however defines three isolation levels: autocommit, read committed, and serializable. The difference between autocommit and read committed is that read committed in psycopg2 puts all statements inside a BEGIN/END block (or BEGIN/ROLLBACK or BEGIN/COMMIT). Inside the BEGIN it also executes SET TRANSACTION ISOLATION LEVEL READ COMMITTED (which is redundant as this isolation level is the default).
Django currently uses psycopg2's read committed isolation level, and this leads to every query (almost) being surrounded by spurious BEGIN/END. In a heavily loaded system, this consumes a massive amount of the total database time.
Django should be using the autocommit isolation level, which does not have this overhead. Anywhere explicit transaction blocks are needed they can be used. The attached patch switches the backend to the autocommit isolation level. I have tested this patch on our production system and it makes a noticeable speed improvement.
NOTE: Django's ORM does an existence test followed by and UPDATE or INSERT when save() is called. The read committed isolation level does not protect against this failing even within a transaction block, and therefor this change should have no side effects on Django code.
Attachments (12)
Change History (55)
by , 18 years ago
Attachment: | autocommit.diff added |
---|
comment:1 by , 18 years ago
Note that this patch may have side effects if the database's default isolation level is set to serializable. The attached patch corrects this problem and makes sure that we're always in read committed unless otherwise overridden for the entire session. See http://www.postgresql.org/docs/8.1/interactive/sql-set-transaction.html for documentation.
I'm also attaching a test script to show that this works.
Note that the self._commit() call is uncessary in autocommit isolation mode, but is needed for the others.
comment:2 by , 18 years ago
Also, note that #3459 needs almost the same setup, so that's why I've attached the combined patch.
comment:3 by , 18 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
For the record:
Discussion about that has been held: here
comment:4 by , 18 years ago
This patch also resolves an important bug for me: i have an application with models imported from python packages, not directly from the models.py file. Without the patch, postgres raises an error (ERROR: the relation «app_rel» doesn't exist) and the transaction is still pending, which breaks the normal execution when creating the databases (with syncdb). Hope that's clear enough...
I'm not a postgres guru, so i haven't figured out the details. Contact me if interested.
Guess it's not worth filling a new ticket for my bug when a patch is ready.
comment:5 by , 17 years ago
I have run into this problem as well, and fixed it by doing the exact code change you did.
This caused problems when I started to use fixtures (forward references) in my tests - I believe there are currently problems with the transaction management for those which are exposed by the fix.
I am sorry that I cannot be more specific at the moment, just thought I'd log this as something to investigate before applying the patch.
comment:7 by , 17 years ago
Cc: | added |
---|
by , 17 years ago
Attachment: | psycopg2-commit-mode-r5790.diff added |
---|
Updated patch for [5790]. The patch causes problems with fixtures and causes some of Django's tests to fail.
comment:8 by , 17 years ago
Cc: | added |
---|
comment:9 by , 17 years ago
We applied this patch for our application. It fixed a problem we were having where if an insert, update or delete threw an exception, any subsequent database operation would fail until django was restarted. We fixed the problem with fixtures by adding a line to core/management/commands/loaddata.py:
transaction.connection.connection.set_isolation_level(1)
It fixed the django unit tests for our application.
comment:10 by , 17 years ago
Patch needs improvement: | set |
---|
The problem is that the patch turns every single command in a transaction for itself (this happens in autocommit without an explicit COMMIT). The new behaviour is incompatible with the old one and e.g. breaks forward references in the serialization tests, as noted by John Shaffer with the most recent patch.
I checked "patch needs improvement" since it breaks tests.
comment:11 by , 17 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
This is a situation where different aplications call for different settings, so hardcoding something isn't a great idea. I think, then, this is a situation that could be solved by #6064.
comment:12 by , 17 years ago
I see no use case for an application that needs behavior different than this. In this patch, you always know exactly when and how you are in a transaction, and you can choose whether your code executes in one or not. The old way does not allow this flexibility and always puts you in a transaction, which for 90% of the most common queries wastes a lot of database cycles. Any site running stock django with postgresql will be crippled by this when it starts to get any load.
This is not something that is solved by #6064, or at least I think doing it that way would be more of a hack. Solving it by setting the default isolation level consistently means that everyone will understand exactly how the database is getting used. It won't be broken for most people until they figure out there is some special initializer that magically fixes things by changing how all the isolation levels work. I suppose you could make the default initializer do this, but note that this is a lot more technical and low level that making sure the time zone is correct or setting how postgres returns the date, etc. There's a reason that this functionality is provided _at the wrapper layer_ and the other is provided in the SQL language.
We have at least 3 reports here of people who need this functionality, and so far, no reports or explanation on how this functionality would be bad. Please reconsider this patch.
comment:13 by , 17 years ago
Resolution: | duplicate |
---|---|
Status: | closed → reopened |
I'm reopening because reading #6064 it is clear this is not a duplicate. That functionality is about initial SQL, and this functionality can not be changed by SQL alone. We are changing the Python wrapper's functionality here, not the database's.
There is nothing changing on the postgresql side. The default python DB-API functionality is fine for easy of use, but is not what you want in a system like Django, as it is extremely wasteful of transactions in the common case. Note that this is specified by the DB-API spec, and will probably affect every database with transactions. Each wrapper is going to have its own way to turn this off I imagine.
follow-up: 16 comment:14 by , 17 years ago
Resolution: | → duplicate |
---|---|
Status: | reopened → closed |
Jack, it can be changed by inital SQL -- "SET TRANSACTION ISOLATION LEVEL <foo>". If that's wrong, #6064 just needs to allow custom *python* (as well as SQL) to run upon database creation. We need to fix the general problem, not individual specific ones.
comment:15 by , 17 years ago
There is no autocommit isolation level in SQL. If you intend #6064 to have custom python initializers per backend, then it will do what we want, but this is still quite a different thing than the tasks that custom sql initializers will be doing in general. So #6064 needs python + sane defaults (like setting autocommit by default + some way that the first time you override this to change the timezone or date style you don't completely change how the whole transaction system works by accident.
by , 17 years ago
Attachment: | 3460-autocommit.diff added |
---|
Change isolation level without breaking fixtures and tests
comment:16 by , 17 years ago
Resolution: | duplicate |
---|---|
Status: | closed → reopened |
Replying to jacob:
Jack, it can be changed by inital SQL -- "SET TRANSACTION ISOLATION LEVEL <foo>". If that's wrong, #6064 just needs to allow custom *python* (as well as SQL) to run upon database creation. We need to fix the general problem, not individual specific ones.
So the default Django will be really inefficient, unless the user knows the magic incantation of custom python for the database adapter they are using? This seems a poor solution. There is nothing in #6064 that recommends allowing custom python, nor is there anything in #6064 that would use such a setting for fixing the enormous problems with this poor use of DB-API.
People are still adding patches to this bug, so it is clearly still an issue for others.
I cannot see a reason why you keep closing this bug. Does it break something? The patch seems to work quite well for others. Incremental improvement now is much better than waiting another year for a perfect solution.
by , 17 years ago
Attachment: | autocommit-7624.patch added |
---|
added similar fix to postgresql backend for psycopg1
comment:17 by , 17 years ago
I'm not sure #6064 is the patch for this. It works with connections while this patch works with cursors which are not the same thing. For example, with the connection_created signal how would you set a different timezone on a new cursor?
Since this appears to be a specific issue with pgsql and #6064 cannot be used to solve this particular issue (not without refactoring other stuff, starting with the signal itself that would either needed to be changed to cursor_created and dispatched on cursor creation or adding a new signal that served this purpose), why is this even marked as duplicate?
comment:18 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | reopened → new |
comment:19 by , 16 years ago
Cc: | added |
---|
by , 16 years ago
Attachment: | 3460-r8961-autocommit.diff added |
---|
Updated for Django 1.0, some problems remain (see my notes on ticket)
comment:20 by , 16 years ago
I've just updated Collin Grady's patch to apply against Django 1.0 (r8961) and also current trunk.
Mostly minor reformatting work, but one significant change - [8314] introduced savepoint support for Postgresql, which we need to turn off when we are outside a transaction.
Issues: Although the patch applies again, it leaves a Django which does not support any transaction behavior other than autocommit (leading to various test suite failures). The problem is that django/db/transaction.py also needs to be rewritten to actually create transactions (i.e. issue SQL BEGIN/COMMIT or BEGIN/ROLLBACK) on top of the connection when the user explicitly requests them. My proposed solution is for discussion at http://groups.google.com/group/django-developers/msg/4e5171056b3f6558
comment:21 by , 16 years ago
Cc: | added |
---|
comment:23 by , 16 years ago
The name/email address of the owner of this issue is pushing out the column width in the bug list. Which is of course expected behavior for TABLEs in HTML, but is causing a horizontal scrollbar now at previously "normal" browser sizes.
by , 16 years ago
Attachment: | django-txpatch added |
---|
patches dango's postgres backend for pg native autocommit
comment:24 by , 16 years ago
WooMe.com is a very high concurrency site (we have 7k transactions/sec at peak time and never less than 2k/sec). The patch I've added above sorts us out.
Using unpatched Django was becomming an issue involving > 10k locks/sec at peak times.
comment:25 by , 16 years ago
Following up from nicferrier's patch, there's a corresponding issue when using transaction pooling which must be resolved.
INSERT statements followed by SELECT CURRVAL for creating new objects may split the queries over two transactions, and therefore two sessions. The second session will not have the sequence value available. In this case, a far safer route is to use the RETURNING clause in INSERT statements to return the new value for the pk.
The attached insertwithreturning.patch adds a new feature to BaseDatabaseFeatures, sets it to true for the psycopg2 backend, and conditionally creates an appropriate RETURNING clause for INSERT statements.
This feature is supported for PostgreSQL 8+ with psycopg2 (tested), and Oracle (untested, currently not enabled).
With these two patches, correct transaction isolation support for postgres is fully implemented, and scaling issues with lock concurrency and hanging transactions are mitigated. (We've seen >10k locks reduced to <100.)
by , 16 years ago
Attachment: | insertwithreturning.patch added |
---|
Replace SELECT CURRVAL with INSERT ... RETURNING for psycopg2 backend.
comment:26 by , 16 years ago
Summary: | postgresql_psycopg2 backend uses wrong isolation level → Allow configuration of postgresql_psycopg2 isolation level |
---|---|
Triage Stage: | Design decision needed → Accepted |
This ticket seems to be devolving into multiple issues, which is bad. The title of the ticket is a bit incorrect at the moment (and I'm about to change that).
Right now, Django isn't using the "incorrect" isolation level. It's using one of the valid options. There's a proposal, which has been accepted, to introduce a configurable option to allow a different isolation level to be used, as per the original poster's request. That is what this ticket is about. However, the current level is what will remain as the default; changing that introduces undetectable and potentially unsafe side-effects into existing code.
If there are issues with counters being in separate transactions and things like that, they should really be in a separate ticket, since they must also be fixed with the current code. It looks to me like the last two patches (the penultimate patch needs a better explanation of the problem it is trying to solve, too. It's a little hard to reverse engineer and I'd like to have more confidence that my interpretation matches the patch authors') really both belong in a separate ticket, since they aren't dealing with the serialisation level at all. So if iamseb or nicferrier could please fix that, including a description of the problem you're trying to fix, it would be appreciated.
(Changed description to reflect the targeted solution)
comment:27 by , 16 years ago
As stated in the original ticket, django does not use an appropriate isolation level by default. This is a serious bug for anyone deploying django with postgres, a bug that has cost us considerable time and effort (and therefore, money). The patches submitted are intended to offer a simple, working fix that solves this issue, and hopefully to promote further discussion so that we can get a complete solution in django.
"Django should be using the autocommit isolation level, which does not have this overhead. Anywhere explicit transaction blocks are needed they can be used."
The penultimate ticket resolves this issue by ensuring that the correct isolation level is set by the psycopg2 backend for *all* transactions. We believe that this is the correct, expected behaviour. The final patch resolves a side-effect of this change by changing the lookup of newly inserted primary keys to a pattern that is held within a single transaction.
Sorry if this wasn't clearer.
comment:28 by , 16 years ago
Needs documentation: | set |
---|
You didn't read my comment. Anything that changes the default isolation level isn't going to be accepted. That will lead to very hard to diagnose bugs in some existing code that is relying on the current isolation level. However, a patch that allows configuration of the isolation level to use is certainly a good idea and will be accepted. But the default remains as it is. I have no problems understanding that the current level causes problems for some use-cases. Which is fine. But they aren't the only use-cases and considering existing (two separate items) are also very important. This has been discussed ad-infinitum on the django-dev mailing list.
Since it sounds like the current patch is another case of changing the default, it needs improvement. Add a setting for the isolation level. The default value is the current level. Add some documentation for how to change it. Then we've got something that can be committed. Get it done reasonably quickly (within about a week) and this can make it into Django 1.1.
comment:29 by , 16 years ago
The specific transaction isolation used is inconsequential, It's how the database connection behaves which is important. Django specifies that database connections are, by default, in autocommit mode. It is that behaviour which Django is seeking to provide.
Despite you changing the name of the ticket, we believe the original point still stands: Django does not implement autocommit correctly for Postgresql. We have emprical data to back that up. The patches from myself and iamseb above make Django implement autocommit mode properly WITHOUT any possible transaction related bug. That's why the use of CURRVAL was eliminated.
I'll happily enter into a conversation with people and show them our performance data before and after our patches.
Having said that, I'd be happy to add a switch to the patches so they can be turned on and off. Note that this is not the same as providing a configurable transaction level, as you suggest. Doing that would have much less value than our fix.
You might call our patches postgresql-native-autocommit.
If you are willing to accept patches like that then let us know and we'll rewrite them.
comment:30 by , 16 years ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | unset |
I've updated nicferrier's patch to make it configurable - hopefully this can now go in for 1.1 by tomorrow.
The revised patch uses the same basic mechanism that nicferrier suggested. It is typically inactive, and is switched on with DATABASE_OPTIONS={'native_autocommit':True}, which I have documented. With this patch, SVN Django passes the full test suite using sqlite3, postgresql_psycopg2 (plain default) and postgresql_psycopg2 (native_autocommit turned on).
I fixed two bugs to get the full test suite to pass (both mentioned in various patches earlier in this ticket): savepoints are not supported in isolation level 0, and I've moved the call to the new connection._enter_transaction_management() inside transaction.enter_transaction_management() so that it also applies to fixtures loaddata.py, etc.
comment:31 by , 16 years ago
It's vital that the returning patch be in there as well. Without that it is entirely possible to get inconsistent data by naive (non-transaction protected) use of the ORM.
WITH the patch it is NOT possible to get inconsistent data simply be being naive.
by , 16 years ago
Attachment: | 3640_r9736-autocommit-safe.diff added |
---|
comment:32 by , 16 years ago
I've added an updated patch that implements a safe version of the transaction isolation patch for database connections that implement transaction pooling, whilst sticking with the implementation approach in Richard's combined patch. Transaction isolation can return incorrect data when retrieving PKs after insert operations without this addition.
comment:33 by , 16 years ago
Thanks - looks good to me. I imagine that you want "self.features.can_return_id_from_insert = True", not False, on new line 70 of backends/postgresql_psycopg2/base.py? I'd assume this is a postgres feature which is always available regardless of isolation level?
comment:34 by , 16 years ago
I've made it behave the same way as the native_autocommit setting for now, due to previous disquiet about changing the default behaviour. If there's general agreement that this is the correct method for retrieving inserted pk values then I'd be very happy for it to become the default.
comment:35 by , 16 years ago
Owner: | changed from | to
---|
comment:36 by , 16 years ago
milestone: | → 1.1 beta |
---|
I'm tagging this with the 1.1beta milestone, given that it is already on the Version1.1Features list
comment:37 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [10029]) Fixed #3460 -- Added an ability to enable true autocommit for psycopg2 backend.
Ensure to read the documentation before blindly enabling this: requires some
code audits first, but might well be worth it for busy sites.
Thanks to nicferrier, iamseb and Richard Davies for help with this patch.
comment:38 by , 16 years ago
Cc: | added |
---|
[10029] Introduces the use of "RETURNING" clause for "insert into" statements. This is not available on my postgres 8.1.3. There is a way to disable this feature for older postgres?
--
psycopg2.ProgrammingError: error de sintaxis en o cerca de «RETURNING» en el caracter 113
comment:39 by , 16 years ago
Whoops. I assumed it would be there in all the releases. Sorry... I was lazy.
Commenting on a closed ticket isn't particularly helpful in identifying a bug, however. I've opened #10467 to fix that problem. Will look at it tomorrow (it's after midnight here now and I"m not going to tackle that now).
by , 16 years ago
Attachment: | 3460_r10645-autocommit.diff added |
---|
comment:40 by , 16 years ago
Using the latest django code from subversion (revision 10645), when I have it autocommit set to True in my settings.py, it looks like the isolation_level of the connection is never being set to 0. 3460_r10645-autocommit.diff patch seems to fix the problem, but I don't know if it's the right way to fix it.
switch psycopg2's default isolation level to autocommit