Currently, the sqlite backend uses
def utf8rowFactory(cursor, row):
    def utf8(s):
        if type(s) == unicode: return s.encode("utf-8")
        return s
    return [utf8(r) for r in row]
for row_factory. The problem here is that it rebuilds every record and inspects every field, regardless of whether the utf8 conversion is required. Setting
Database.text_factory = lambda s:s.decode("utf-8")
limits the conversion to just TEXT objects.
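To illustrate the point that text_factory is only consulted for TEXT values, here is a minimal sketch. It assumes the standard library sqlite3 module on Python 3 (where the factory receives bytes) rather than the pysqlite2 setup the backend uses; the table, the column names and the helper demo_text_factory_calls are made up for the example.

```python
import sqlite3

def demo_text_factory_calls():
    """Record every value that sqlite3 passes to text_factory for one mixed-type row."""
    calls = []

    def counting_text_factory(raw):
        # raw is the UTF-8 encoded byte string of a TEXT column value
        calls.append(raw)
        return raw.decode("utf-8")

    conn = sqlite3.connect(":memory:")
    conn.text_factory = counting_text_factory
    # Hypothetical table: two non-text columns and one TEXT column
    conn.execute("CREATE TABLE t (id INTEGER, score REAL, name TEXT)")
    conn.execute("INSERT INTO t VALUES (?, ?, ?)", (1, 2.5, "caf\u00e9"))
    row = conn.execute("SELECT id, score, name FROM t").fetchone()
    conn.close()
    return row, calls
```

Calling demo_text_factory_calls() should show a single recorded call for the name column, while the INTEGER and REAL values never reach the factory. A row_factory, by contrast, is handed the whole row every time.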
This is a bit faster. That said, I'm wondering why the conversion is forced at all: sqlite stores data in utf8, so if
Database.text_factory = str
were set, the whole decoding/encoding would be bypassed, and the native encoding (utf8) would be passed back.
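A rough sketch of the bypass, again assuming the Python 3 standard library sqlite3 module, where bytes plays the role that str plays in the Python 2 code above. The helper fetch_raw_text and the table are hypothetical and only there to show what comes back.

```python
import sqlite3

def fetch_raw_text(value):
    """Store a text value and read it back with decoding switched off."""
    conn = sqlite3.connect(":memory:")
    # bytes here is the Python 3 counterpart of text_factory = str on Python 2:
    # TEXT values are returned as the raw UTF-8 byte string, with no decode step.
    conn.text_factory = bytes
    conn.execute("CREATE TABLE t (name TEXT)")
    conn.execute("INSERT INTO t VALUES (?)", (value,))
    (raw,) = conn.execute("SELECT name FROM t").fetchone()
    conn.close()
    return raw
```

With this setting, fetch_raw_text("caf\u00e9") hands back the UTF-8 encoded bytes of the string rather than a decoded text object, which is the behaviour being proposed.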
In terms of performance, the gains from using Database.text_factory = lambda s:s.decode("utf-8") are dependent upon the column types: the greater the number of non-text fields, the greater the gain.
The real gain comes from turning off the encode/decode and using str directly (the underlying utf8): it gives the same gain from avoiding the extra inspection, and it also avoids all the conversion work.
The only downside to either change that I can see is that raw sql queries would return str instead of sqlite's unicode. I'm not really sure this is an actual issue, however (I don't see any other such limitation in the backends).
A patch is attached for the encode/decode variant; unless there are good reasons not to, I would just bypass the encoding/decoding entirely.