postgresql_psycopg2 backend registers psycopg2's UNICODE extension, which can cause interference if Django is not the only component using psycopg2
|Reported by:||Chris Wagner <cw264701@…>||Owned by:||nobody|
|Component:||Database layer (models, ORM)||Version:||master|
|Cc:||sam@…||Triage Stage:||Design decision needed|
|Has patch:||no||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
This one led me on a wild goose chase. I am trying to use SQLAlchemy alongside Django.
Here's the problem: The default encoding for a psycopg2 connection is "SQL_ASCII". And, by default, psycopg2 accepts and passes back non-Unicode strings (i.e., Python str objects, not unicode objects). SQLAlchemy works okay using this setup, as it does conversion between unicode objects and utf-8-encoded str objects as data passes to and from the database.
Django, however, seems to rely on psycopg2 to do the conversions; so, it registers psycopg2's "UNICODE" extension:
This is done in django/db/backends/postgresql_psycopg2/base.py, upon loading that module. When this option is set, psycopg2 tries to convert all results to unicode objects. If the default encoding, "SQL_ASCII", is in use, this will cause UnicodeDecodeError's to be raised upon attempting to pull out some non-ASCII text from the database...
However, this is okay for Django's personal needs, because it also sets the client encoding for its psycopg2 database connection:
This change, however, only affects the given connection object, which is local to Django. Unfortunately, SQLAlchemy does not set the client encoding for its connections.
So, by registering psycopg2's UNICODE extension, Django places a restriction on all psycopg2 connections that wish to deal with Unicode: all of the connections must set_client_encoding to UTF8 (or perhaps another Unicode encoding). This doesn't sound like a big deal, but:
- it would take some serious hack-arounds to make sure SQLAlchemy's psycopg2 connections all use the right encoding (i.e., call connection.set_client_encoding('utf8')), and
- this can lead to very difficult to trackdown problems.
This "bug" led to some especially odd behavior, in my case. I was finding that, early on in my test script, there were no problems inserting and selecting non-ASCII text into/from the database. It took me a long time to realize that, it was only after certain parts of Django had been loaded that errors would start flying. It took a whole lot of trial-and-error (commenting out bits of Django, loading various modules, etc.) to get to the bottom of things.
The only foolproof way that I can think of, for fixing this, is to program to Django to behave as SQLAlchemy does: it should manually convert to/from unicode objects.
Change History (13)
comment:1 Changed 9 years ago by mtredinnick
- Needs documentation unset
- Needs tests unset
- Patch needs improvement unset
comment:4 Changed 9 years ago by mtredinnick
- Triage Stage changed from Unreviewed to Design decision needed
comment:10 Changed 8 years ago by mtredinnick
- Resolution set to wontfix
- Status changed from new to closed
comment:12 Changed 6 years ago by mhart
- Resolution wontfix deleted
- Status changed from closed to reopened