Code

Opened 7 years ago

Closed 4 years ago

#5171 closed (wontfix)

postgresql_psycopg2 backend registers psycopg2's UNICODE extension, which can cause interference if Django is not the only component using psycopg2

Reported by: Chris Wagner <cw264701@…> Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Keywords:
Cc: sam@… Triage Stage: Design decision needed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

This one led me on a wild goose chase. I am trying to use SQLAlchemy alongside Django.

Here's the problem: The default encoding for a psycopg2 connection is "SQL_ASCII". And, by default, psycopg2 accepts and passes back non-Unicode strings (i.e., Python str objects, not unicode objects). SQLAlchemy works okay using this setup, as it does conversion between unicode objects and utf-8-encoded str objects as data passes to and from the database.

Django, however, seems to rely on psycopg2 to do the conversions; so, it registers psycopg2's "UNICODE" extension:

psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)

This is done in django/db/backends/postgresql_psycopg2/base.py, upon loading that module. When this option is set, psycopg2 tries to convert all results to unicode objects. If the default encoding, "SQL_ASCII", is in use, this will cause UnicodeDecodeError's to be raised upon attempting to pull out some non-ASCII text from the database...

However, this is okay for Django's personal needs, because it also sets the client encoding for its psycopg2 database connection:

self.connection.set_client_encoding('UTF8')

This change, however, only affects the given connection object, which is local to Django. Unfortunately, SQLAlchemy does not set the client encoding for its connections.

So, by registering psycopg2's UNICODE extension, Django places a restriction on all psycopg2 connections that wish to deal with Unicode: all of the connections must set_client_encoding to UTF8 (or perhaps another Unicode encoding). This doesn't sound like a big deal, but:

  • it would take some serious hack-arounds to make sure SQLAlchemy's psycopg2 connections all use the right encoding (i.e., call connection.set_client_encoding('utf8')), and
  • this can lead to very difficult to trackdown problems.

This "bug" led to some especially odd behavior, in my case. I was finding that, early on in my test script, there were no problems inserting and selecting non-ASCII text into/from the database. It took me a long time to realize that, it was only after certain parts of Django had been loaded that errors would start flying. It took a whole lot of trial-and-error (commenting out bits of Django, loading various modules, etc.) to get to the bottom of things.

The only foolproof way that I can think of, for fixing this, is to program to Django to behave as SQLAlchemy does: it should manually convert to/from unicode objects.

Attachments (0)

Change History (13)

comment:1 Changed 7 years ago by mtredinnick

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I'm not really enthused about making a change like this. The conversion is designed to be done by the database interface, which is what we are doing. Moving it all into Django feels unnecessary and will be slower in a lot of cases, too.

We might need to do this for handling binary data, but I'm hoping we can avoid doing so. Doing it just for SQLAlchemy isn't a good reason at the moment. SQLAlchemy should allow client encoding settings and trying to pass non-ASCII UTF-8 data through a setting that is set to SQL_ASCII feels very dangerous.

Going to think about it some more, but my immediate reaction is a strong -1. It would be fixing the problem in the wrong location. The workaround is for you to create your own database backend (copy the postgresql_psycopg2 one) and make the changes locally.

comment:2 Changed 7 years ago by ubernostrum

I'm inclined to agree with Malcolm, though I agree that it sucks that this would be hard to fix in SQLAlchemy. But the mechanism to specify the encoding for the connection is there for a reason, and anything which relies on a particular encoding needs to use that mechanism to ensure it's getting what it expects.

comment:3 Changed 7 years ago by Chris Wagner <cw264701@…>

I'm not particularly concerned about having to workaround this problem, myself. My concern is in leaving others vulnerable to this same, ridiculously-difficult-to-track-down bug.

It's not just about being compatible with SQLAlchemy; it's about leaving the runtime environment as clean as possible. Django shouldn't be making global changes that will affect other, unrelated components. SQLAlchemy seems to play nice, by avoiding such global changes.

I would actually blame psycopg2, but I doubt I'd get a positive response from its developer(s). It would probably be a much more dramatic change to fix psycopg2. I suspect Django already does manual encoding/decoding for some of the other backends, already...

comment:4 Changed 7 years ago by mtredinnick

  • Triage Stage changed from Unreviewed to Design decision needed

This is either a psycopg2 bug (because it doesn't do encoding transformations on a per-connection basis) or an SQLAlchemy bug (because it imposes expectations on the connection encoding without controlling it itself or bothering to discover what it might be) or both. Django takes explicit responsibility for the connection encoding. We work out what it is, we adapt. Moving all the manual encoding/decoding into Django is slower and more maintenance and we're using the database interface correctly, so it's a really ugly solution to somebody else's problem.

Still, we might have to work around other broken software because they're, well, broken. I'm still thinking about this.

comment:5 Changed 7 years ago by Chris Wagner <cw264701@…>

I'm sorry, but if it's between SQLAlchemy and Django, then Django is the broken component. Like I said, I think it comes down to a weakness (i.e., bug) in psycopg2, but if we set aside that fact, then Django is to blame, because Django is the component that is doing the assuming.

You see, by setting the psycopg2 "UNICODE mode" (i.e., calling psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)), Django is changing the environment. As far as I know, SQLAlchemy is not changing the environment. Therefore, SQLAlchemy will not be dirtying the waters for other components (other libraries, extensions, applications, etc.).

BTW, SQLAlchemy actually does check the result that is returned from a database backend. I've looked at the code, and I've seen that it actually has a check similar to this, when retrieving a result value from the database:

  if not isinstance(result_value, unicode):
    return result_value.decode(dialect.encoding)

The UnicodeDecodeError is actually raised upon a call to psycopg2's cursor.fetchall().

I think I may have caused a little confusion in my original post. I think you may be under the impression that SQLAlchemy should be setting the connection encoding. Perhaps it could, but SQLAlchemy is being the more conservative player, as it is not making changes that affect things outside its scope (AFAIK). I would agree, however, that SQLAlchemy might be able to make itself a bit more resilient to psycopg2's settings.

Whether or not this bug gets fixed is not incredibly important to me. Like I said before, I'm more concerned about others stumbling across the same issue, which is very difficult to diagnose. I am trying to migrate away from Django (or at least the database-enabled portions). I'd just like to do my part in the free software community.

comment:6 Changed 7 years ago by ubernostrum

The argument you're making, essentially, is "this third-party library has a useful, document, publicly-expoed feature, but nobody's allowed to use it because SQLAlchemy has a bug". That is not a Django problem. That is either SQLAlchemy's problem (they should be able to compatibly use psycopg, which has a publicly-documented interface) or psycopg's problem (they should not allow features used by one application to affect another). I encourage you to lobby both of those projects to help out with this, but Django should not refuse to use a useful, documented, publicly-exposed feature of psycopg for the reasons you've given here.

comment:7 Changed 7 years ago by Chris Wagner <cw264701@…>

I still don't agree with you that this is more a problem with SQLAlchemy than Django; worst case, it's equally the fault of SQLAlchemy and Django. But, I don't think we're going to get anywhere arguing that point, so I've posted a report in SQLAlchemy's bug tracker, referencing this report: http://www.sqlalchemy.org/trac/ticket/783 . Let's see what they say..

FYI, this is no longer a problem for me, personally, as I've cut Django off from the database completely, but, like I said, it'd be good to save others the grief...

comment:8 Changed 7 years ago by jacob

FWIW, connection.set_client_encoding('utf8') is syntaxtic sugar for executing the SQL statement SET client_encoding = 'utf8'. So you should be able to just run that SQL directly against SQLAlchemy (if it's preventing you from running that particular statement, then that really is a bug).

I'm not sure weather Django ought to change anything here... I'll have to think about the ramifications a bit more.

comment:9 Changed 7 years ago by anonymous

  • Cc sam@… added

comment:10 Changed 6 years ago by mtredinnick

  • Resolution set to wontfix
  • Status changed from new to closed

The SQLAlchemy ticket referenced in comment:7 has a workaround for SQLAlchemy and some strong agreement that this is a psycopg2 bug. Closing as wontfix for the reasons given in previous comments.

(Jacob, reopen if you have any really good ideas on a workaround, but I don't see this as our problem at the moment; good citizenship works both and SQLAlchemy have a workaround for the problem on their side.)

comment:11 Changed 4 years ago by mhart

My apologies for raising a zombie of a bug, this issue has recently bitten us and I was hoping it could be revisited since psycopg2's register_type function can register a type globally (as has been done earlier) or at the connection or cursor level by passing a second argument for scope. Perhaps this is a change to psycopg2 since this bug was closed.

http://initd.org/psycopg/docs/extensions.html#database-types-casting-functions

"""psycopg2.extensions.register_type(obj[, scope])

Register a type caster created using new_type().

If scope is specified, it should be a connection or a cursor: the type caster will be effective only limited to the specified object. Otherwise it will be globally registered."""

I believe this would appease all parties concerned.

comment:12 Changed 4 years ago by mhart

  • Resolution wontfix deleted
  • Status changed from closed to reopened

comment:13 Changed 4 years ago by mhart

  • Resolution set to wontfix
  • Status changed from reopened to closed

My apologies, until now I had not seen bug:13161 which proposes this exact change. Reverting to earlier state.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.