Opened 9 years ago

Closed 9 years ago

#307 closed defect (invalid)

Use unicode strings u"bla-bla" in SQL-queries for compatibility with national languages

Reported by: mordaha@…
Owned by: adrian
Component: Metasystem
Version:
Severity: trivial
Keywords: unicode strings in sql queries
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

Use unicode strings in SQL queries for compatibility with national languages. (When you pass an SQL query as a Python unicode string, the database backend (MySQLdb) automatically converts it from the Python encoding to the MySQL connection encoding.)

I found it in meta/fields.py (it may be in some other places too):

def get_db_prep_lookup(self, lookup_type, value):
    ...skip...

    elif lookup_type in ('contains', 'icontains'):
        return ["%%%s%%" % prep_for_like_query(value)]
        # the line above should be:
        # return [u"%%%s%%" % prep_for_like_query(value)]  # using unicode
    elif lookup_type == 'iexact':

Without that u, queries like field__contains=unicode_string_with_national_characters return nothing.

Attachments (0)

Change History (3)

comment:1 Changed 9 years ago by hugo <gb@…>

Hey, say hello to a can of worms :-)

The problem isn't really solved by just passing in unicode strings - what happens depends heavily on the backend, the server settings and the DBAPI implementation used. And you can't just do u"" string interpolation - strings within Django are always bytestrings encoded in utf-8, so to get the unicode version of the data you would have to use prep_for_like_query(value).decode('utf-8').
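To illustrate the decode-before-interpolating point, here is a minimal sketch rather than the actual Django code. It is written for current Python, where str is unicode text and bytes plays the role of the bytestrings discussed here. The prep_for_like_query below is a hypothetical stand-in that only escapes LIKE wildcards, and like_pattern is a name made up for this example.

```python
def prep_for_like_query(value):
    # Hypothetical stand-in for the real helper: escape the backslash and
    # the LIKE wildcards so they are matched literally.
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def like_pattern(value, encoding="utf-8"):
    # Assumption: bytestrings reaching this point are utf-8 encoded,
    # as described in the comment above.
    if isinstance(value, bytes):
        value = value.decode(encoding)
    # Build the pattern only after the value is unicode text, so text and
    # encoded bytes are never mixed in one interpolation.
    return "%" + prep_for_like_query(value) + "%"
```

Both like_pattern("привет") and like_pattern("привет".encode("utf-8")) produce the same text pattern, which is the property the lookup needs before the value is handed to a driver.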

BTW: MySQL never sees any unicode directly, it only sees utf-8 encoded strings - so if we pass u"" strings to the MySQL driver, the driver code re-encodes those as utf-8 and passes that along to your database. And hopefully your database is running with the utf-8 charset, because otherwise it might break on any char that's not in your home encoding.
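The "might break on any char that's not in your home encoding" part can be shown without a database at all. This is only a sketch of the encoding step, not of what any particular driver does internally, and can_encode is a helper name invented for the example.

```python
def can_encode(text, charset):
    # Return True if every character of `text` is representable in `charset`.
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


polish = "Zażółć"

# utf-8 can represent any unicode text.
utf8_ok = can_encode(polish, "utf-8")

# latin-1 has no code point for "ż", so the same text cannot be encoded.
latin1_ok = can_encode(polish, "latin-1")
```

Text that happens to fit the narrower charset (for example "café" in latin-1) goes through fine, which is why this kind of problem tends to show up late and only for some users.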

PostgreSQL has something similar: with SET client_encoding we could tell the database that all our client data is encoded in utf-8, and the database should then convert it into the native database encoding. With sqlite it's different: it always stores utf-8 strings and returns u"" strings through the Python DBAPI implementation. Except when it doesn't - for example if you hook up converters/transformations, because those receive and send utf-8 encoded bytestrings, not unicode strings.
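The sqlite behaviour can be checked with the sqlite3 module from the standard library. This sketch assumes current Python; the table, the sample name and the fetch_names helper are made up for the example. Setting text_factory to bytes is used here to expose the utf-8 bytestrings that converters would otherwise be handed.

```python
import sqlite3


def fetch_names(text_factory=str):
    # In-memory database, so nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    # text_factory controls what TEXT columns come back as:
    # str gives unicode text, bytes gives the raw utf-8 encoded data.
    conn.text_factory = text_factory
    conn.execute("CREATE TABLE person (name TEXT)")
    conn.execute("INSERT INTO person (name) VALUES (?)", ("Жанна",))
    rows = conn.execute(
        "SELECT name FROM person WHERE name LIKE ?", ("%анн%",)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

With the default factory the "contains" style query returns the name as text; with fetch_names(bytes) the same row comes back as utf-8 encoded bytes.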

Maybe the right way would be to go for utf-8 client encoding in the database drivers and to make sure that we always pass them utf-8 strings (or unicode strings if the driver accepts that). But then we would have to require users to set up their databases with utf-8 encoding, because otherwise they will sooner or later get unicode encoding/decoding errors in the database connection.

comment:2 Changed 9 years ago by adrian

  • priority changed from high to normal
  • Severity changed from critical to normal

comment:3 Changed 9 years ago by anonymous

  • priority changed from normal to lowest
  • Resolution set to invalid
  • Severity changed from normal to trivial
  • Status changed from new to closed

OK, I will always use .encode('utf8')
