Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#17266 closed Cleanup/optimization (fixed)

psycopg2 backend should use get_parameter_status to check if SET TIME ZONE is needed

Reported by: akaariai Owned by: aaugustin
Component: Database layer (models, ORM) Version:
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In ticket #17062 Aymeric Augustin suggested the idea for using get_parameter_status to check if the database default time zone is already correct. If it is, we can skip the SET TIME ZONE and associated isolation level handling for every new connection. Ticket #17062 has more details, but I will copy-paste the most relevant parts here. If this is not done, for every new connection we need to do these queries:

26639: LOG:  connection authorized: user=akaj database=testdb1
26639: LOG:  statement: SHOW default_transaction_isolation
26639: LOG:  statement: SET default_transaction_isolation TO DEFAULT
26639: LOG:  statement: SET TIME ZONE 'America/Chicago'
26639: LOG:  statement: SET default_transaction_isolation TO 'read uncommitted'
26639: LOG:  statement: BEGIN
26639: LOG:  statement: first real query...

If we use get_parameter_status, we can get it down to this:

27163: LOG:  connection authorized: user=akaj database=testdb1
27163: LOG:  statement: SHOW default_transaction_isolation
27163: LOG:  statement: BEGIN
27163: LOG:  statement: first real query

The get_parameter_status was added in psycopg2 version 2.0.12 http://initd.org/psycopg/docs/connection.html#connection.get_parameter_status. I can't find a mention if we need to support versions prior to this. If we need to support versions prior to 2.0.12, we can just add a check for the psycopg2 version in use and not use get_parameter_status if the version doesn't support it.

I will wait until ticket #17062 gets fixed before working on this.

Attachments (2)

17266.diff (3.3 KB) - added by akaariai 3 years ago.
17266.2.diff (3.6 KB) - added by aaugustin 3 years ago.

Download all attachments as: .zip

Change History (13)

comment:1 Changed 3 years ago by aaugustin

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 3 years ago by akaariai

  • Has patch set
  • Type changed from Uncategorized to Cleanup/optimization

Provided is a patch that does use get_parameter_status to sniff the connection's default time zone. I did a little testing of this. Queries generated for a new connection with patch (using psycopg2 version 2.4.2):

21335: LOG:  connection authorized: user=akaj database=testdb1
21335: LOG:  statement: SHOW default_transaction_isolation
21335: LOG:  statement: BEGIN
21335: LOG:  statement: first real query

Without patch:

21355: LOG:  connection authorized: user=akaj database=testdb1
21355: LOG:  statement: SHOW default_transaction_isolation
21355: LOG:  statement: SET default_transaction_isolation TO DEFAULT
21355: LOG:  statement: SET TIME ZONE 'America/Chicago'
21355: LOG:  statement: SET default_transaction_isolation TO 'read committed'
21355: LOG:  statement: BEGIN
21355: LOG:  statement: first real query...

The precondition for this is that the connections default time zone is actually the wanted timezone (settings TIME_ZONE if not USE_TZ, else UTC). Attached patch contains some documentation mentioning that it is a good idea to have the database or user defaults as mentioned above. Also, I think you will need psycopg2 version 2.4.2 or later (release notes) or you will get more queries than in the examples posted above. The patch should work with all versions of psycopg2, though.

The speed difference for making single query and then disconnecting is about 10% on a localhost database. It would be likely more if the database was on another machine, as there is no TCP/IP overhead for localhost connections. The test was:

start = datetime.now()
for i in range(0, 1):
    list(SomeModel.objects.all()) # no rows in the DB, so this is very lightweight query.
    connection.close()
print datetime.now() - start

So, there is some gain from this.

I have guarded against psycopg2 versions not having get_parameter_status in the patch (versions prior to 2.0.12).

I don't see any easy way to test if you actually get less queries - psycopg2 will not report all the internal queries made. Manually it is easy to test: just set log_statement to 'all' and log_connections to on in postgresql.conf (or for the db user) and see PostgreSQL's log to check what queries are made.

I haven't build the documentation, dunno if it really works. The reviewer/committer might want to check the language used in the documentation, my English isn't too good. Also, I haven't tested this on any other psycopg version than 2.4.2.

Changed 3 years ago by akaariai

comment:3 Changed 3 years ago by akaariai

I have tested this on psycopg 2.0.1, which does not work at all with Django (unrelated to this ticket - expects port as an integer), psycopg2 2.0.7 - works, psycopg2 2.0.11 - works, psycopg 2.3.2 works and finally the current psycopg 2.4.2 - works. Results in above post are with 2.4.2. In no case did the patch result in worse behavior. If get_parameter_status is not available, the patch doesn't change anything for new connections. If it is available, and time zone is correct for the connection, you can get rid of 3 non-necessary queries.

My opinion is that the patch is ready for committer. A summary for reviewer / committer:

  • Please review the documentation changes with extra care. Writing documentation is not my strongest area...
  • The patch will get rid of all non-necessary queries for new connections when settings for the connection are correct (in postgresql.conf or for the user). The documentation mentions the necessary changes to connection settings.
  • The patch should work with all supported psycopg2 versions.
  • There are no tests, as there is no easy way to test what queries psycopg2 has actually executed. See above post for how to test this manually.

comment:4 Changed 3 years ago by aaugustin

  • Owner changed from nobody to aaugustin

comment:5 Changed 3 years ago by aaugustin

  • Patch needs improvement set

In the documentation, you suggest setting client_encoding, timezone, and default_transaction_isolation. However, in the code, I only see changes for timezone.

Did you intend to make similar changes for the two other parameters? Or did I misunderstand something?

comment:6 Changed 3 years ago by akaariai

The documentation talks about postgresql.conf. That is, setting those parameters is done in postgresql.conf, or for the user you use for taking the connection. That is, those are hints for things you should do to your PostgreSQL setup. If you do change those, you will get fewer queries (including a removed extra commit for timezone) in connection setup. The parameters are set up for each new connection, but late versions (at least 2.4.2) of psycopg are smart enough to skip the SET queries for client_encoding and transaction_isolation if they are already correct. And the included get_parameter_status change allows for skipping the SET timezone.

I don't know if it is a good idea to include this hint in the documentation, but I figured that it is better to mention it than keep it as hidden knowledge. Also, the documentation probably needs to be more clear. If you missed this, then so will others. Me need better writing skills :)

comment:7 Changed 3 years ago by aaugustin

Thanks, I was just missing this piece of information:

late versions (at least 2.4.2) of psycopg are smart enough to skip the SET queries for client_encoding and transaction_isolation if they are already correct

I'm going to rework the patch slightly and commit it.

Changed 3 years ago by aaugustin

comment:8 Changed 3 years ago by aaugustin

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

Attached patch is probably good to go.

Since we don't have control over PostgreSQL's configuration in the test suite, there are no tests. I tested manually and reproduced Anssi's results.

comment:9 Changed 3 years ago by akaariai

Code looks good, and the documentation is much better than my version. Too bad there isn't any way to test the generated queries. Would it make sense to add a reference to this ticket in the code as a guard against future regressions? On the other hand, that part of the code begins to look like Trac ticket listing soon :)

comment:10 Changed 3 years ago by aaugustin

  • Resolution set to fixed
  • Status changed from new to closed

In [17194]:

Fixed #17266 -- Skipped the "SET TIME ZONE" query for PostgreSQL when it isn't necessary.

comment:11 Changed 3 years ago by aaugustin

Thanks Anssi — I forgot to credit you in the commit message :(

Note: See TracTickets for help on using tickets.
Back to Top