postgresql_psycopg2 backend registers psycopg2's UNICODE extension, which can cause interference if Django is not the only component using psycopg2
|Reported by:||Owned by:||nobody|
|Component:||Database layer (models, ORM)||Version:||master|
|Cc:||sam@…||Triage Stage:||Design decision needed|
|Has patch:||no||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
This one led me on a wild goose chase. I am trying to use SQLAlchemy alongside Django.
Here's the problem: The default encoding for a psycopg2 connection is "SQL_ASCII". And, by default, psycopg2 accepts and passes back non-Unicode strings (i.e., Python str objects, not unicode objects). SQLAlchemy works okay using this setup, as it does conversion between unicode objects and utf-8-encoded str objects as data passes to and from the database.
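The conversion SQLAlchemy is described as doing can be sketched roughly as follows. This is a minimal illustration, not SQLAlchemy's actual code: the helper names to_db and from_db are hypothetical, and it is written for Python 3, where bytes plays the role of Python 2's str and str plays the role of unicode.

```python
# Sketch only: to_db/from_db are hypothetical helpers, not SQLAlchemy APIs.
# Python 3 naming: bytes ~ Python 2 str, str ~ Python 2 unicode.

def to_db(value, encoding="utf-8"):
    """Encode text to bytes before handing it to the driver."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value  # non-text values pass through untouched


def from_db(value, encoding="utf-8"):
    """Decode bytes coming back from the driver into text."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # non-bytes values pass through untouched


stored = to_db("café")          # what would be sent to the database
round_tripped = from_db(stored)  # what the application would see again
```

Because the application layer does the encoding and decoding itself, this approach does not depend on what the driver has been configured to return.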
Django, however, seems to rely on psycopg2 to do the conversions; so, it registers psycopg2's "UNICODE" extension:
This is done in django/db/backends/postgresql_psycopg2/base.py, upon loading that module. When this option is set, psycopg2 tries to convert all results to unicode objects. If the default encoding, "SQL_ASCII", is in use, this will cause a UnicodeDecodeError to be raised upon attempting to pull any non-ASCII text out of the database.
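The failure mode can be reproduced without a database. The sketch below assumes that decoding under "SQL_ASCII" behaves like Python's ascii codec, which matches the error described here; decode_result is a hypothetical stand-in for what the driver does, not a psycopg2 function.

```python
# Assumption: SQL_ASCII decoding behaves like Python's "ascii" codec.
# decode_result is a hypothetical helper, not part of psycopg2.
CODECS = {"SQL_ASCII": "ascii", "UTF8": "utf-8"}


def decode_result(raw_bytes, client_encoding):
    """Decode raw column bytes using the codec for the client encoding."""
    return raw_bytes.decode(CODECS[client_encoding])


raw = "café".encode("utf-8")  # bytes written by a client that encodes as UTF-8

try:
    decode_result(raw, "SQL_ASCII")
    failed = False
except UnicodeDecodeError:
    failed = True  # the non-ASCII bytes cannot be decoded as ASCII
```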
However, this is fine for Django's own needs, because it also sets the client encoding for its psycopg2 database connection:
This change, however, only affects the given connection object, which is local to Django. Unfortunately, SQLAlchemy does not set the client encoding for its connections.
So, by registering psycopg2's UNICODE extension, Django places a restriction on all psycopg2 connections that wish to deal with Unicode: all of the connections must set_client_encoding to UTF8 (or perhaps another Unicode encoding). This doesn't sound like a big deal, but:
- it would take some serious hack-arounds to make sure SQLAlchemy's psycopg2 connections all use the right encoding (i.e., call
- this can lead to problems that are very difficult to track down.
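The mismatch in scope (a process-wide registration versus a per-connection encoding) can be illustrated with a toy model. Nothing below is psycopg2 code: ToyConnection, _global_typecasters and unicode_cast are hypothetical names that only mimic the behavior described in this ticket.

```python
# Toy model only; none of these names exist in psycopg2.
_global_typecasters = []  # process-wide, like a module-level registration


class ToyConnection:
    def __init__(self):
        self.client_encoding = "ascii"  # stands in for the SQL_ASCII default

    def set_client_encoding(self, encoding):
        self.client_encoding = encoding  # affects this connection only

    def fetch(self, raw):
        value = raw
        for cast in _global_typecasters:  # every connection sees these
            value = cast(value, self)
        return value


def unicode_cast(value, conn):
    """Decode bytes using the connection's own client encoding."""
    return value.decode(conn.client_encoding)


# Component A (plays Django): registers globally, fixes up its own connection.
_global_typecasters.append(unicode_cast)
django_conn = ToyConnection()
django_conn.set_client_encoding("utf-8")

# Component B (plays SQLAlchemy): never told about the registration.
other_conn = ToyConnection()
```

In this model, django_conn.fetch(b"caf\xc3\xa9") decodes cleanly, while the same call on other_conn raises UnicodeDecodeError until that connection's encoding is also changed.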
This "bug" led to some especially odd behavior in my case. Early on in my test script, there were no problems inserting non-ASCII text into the database or selecting it back. It took me a long time to realize that the errors only started flying after certain parts of Django had been loaded. It took a whole lot of trial and error (commenting out bits of Django, loading various modules, etc.) to get to the bottom of things.
The only foolproof fix that I can think of is to program Django to behave as SQLAlchemy does: it should manually convert to/from