Opened 18 years ago

Closed 17 years ago

Last modified 13 years ago

#3454 closed (fixed)

sqlite backend is using row_factory when it should be using text_factory

Reported by: (removed) Owned by: Adrian Holovaty
Component: Database layer (models, ORM) Version: dev
Severity: Keywords: unicode-branch
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

currently, sqlite has

def utf8rowFactory(cursor, row):
  def utf8(s):
    if type(s) == unicode: return s.encode("utf-8")
    return s
  return [utf8(r) for r in row]

for row_factory; problem here is that it's rebuilding each record regardless of whether or not the utf8 conversion is required. doing

Database.text_factory = lambda s:s.decode("utf-8")

limits the conversion to just TEXT objects.

This is a bit faster; that said, I'm wondering why the forced conversion- sqlite stores data in utf8, if

Database.text_factory = str

ware set, the whole decoding/encoding would be bypassed, and the native encoding (utf8) would be passed back.

In terms of performance, using Database.text_factory = lambda s:s.decode("utf-8") gains are dependant upon the column types; greater # of non-text fields, greater the gain.

Real gain is via turning off the encode/decode and using str directly (underlying utf8); same gain in terms of avoiding extra inspection, but avoids all the extra work.

Only downside to either change I can see is that raw sql queries would return str instead of sqlites unicode. Not really sure if this is an actual issue however (don't see any other such limitation in the backends).

Patch is attached for the encode/decode variant; unless there are good reasons, would just bypass the encoding/decoding entirely.

Attachments (1)

patch (1.1 KB ) - added by (removed) 18 years ago.
use text_factory instead of row_factory for unicode conversion

Download all attachments as: .zip

Change History (7)

by (removed), 18 years ago

Attachment: patch added

use text_factory instead of row_factory for unicode conversion

comment:1 by Simon G. <dev@…>, 18 years ago

Triage Stage: UnreviewedDesign decision needed

comment:2 by Malcolm Tredinnick, 18 years ago

Patch needs improvement: set
Triage Stage: Design decision neededAccepted

I don't think the use of decode() in this patch is correct. s.decode('utf-8') is for converting a UTF-8 encoded string into a unicode object, but I suspect you want to be converting unicode objects into UTF-8 for storage purposes (that was what we were doing previously).

We cannot bypass the encoding step entirely, because there is no guarantee at all that internal strings will be UTF-8 encoded streams of bytes (or immediately convertible to such by Python). We are going to move to unicode everywhere outside of the internal/external interfaces (which will be conversion points), but that hasn't happened yet.

Other than that, the motivation behind the patch looks like a good find. Thanks.

comment:3 by (removed), 18 years ago

Yeah, that ought to be s.encode('utf-8').

Would suggest sticking a note in the sqlite backend code that when unicode is default, to just disable these conversions since sqlite already spits back unicode.

comment:4 by Malcolm Tredinnick, 18 years ago

Keywords: unicode-branch added

This ticket has become a non-issue in the unicode branch (no converter is needed at all). Will close it when that branch is merged into trunk.

comment:5 by Malcolm Tredinnick, 17 years ago

Resolution: fixed
Status: newclosed

(In [5609]) Merged Unicode branch into trunk (r4952:5608). This should be fully
backwards compatible for all practical purposes.

Fixed #2391, #2489, #2996, #3322, #3344, #3370, #3406, #3432, #3454, #3492, #3582, #3690, #3878, #3891, #3937, #4039, #4141, #4227, #4286, #4291, #4300, #4452, #4702

comment:12 by Martin v. Löwis, 13 years ago

In [16948]:

Dummy merge.
Merged revisions 5609-5612,5614-5626,5629-5632,5636,5638-5646,5649-5654,5658-5660,5662-5700 via svnmerge from
https://code.djangoproject.com/svn/django/trunk

........

r5609 | mtredinnick | 2007-07-04 14:11:04 +0200 (Mi, 04 Jul 2007) | 5 lines


Merged Unicode branch into trunk (r4952:5608). This should be fully
backwards compatible for all practical purposes.


Fixed #2391, #2489, #2996, #3322, #3344, #3370, #3406, #3432, #3454, #3492, #3582, #3690, #3878, #3891, #3937, #4039, #4141, #4227, #4286, #4291, #4300, #4452, #4702

........

r5610 | mtredinnick | 2007-07-04 14:25:43 +0200 (Mi, 04 Jul 2007) | 3 lines


Fixed Javascript syntax from [5608] that was causing a problem in Opera. Fixed
#4365.

........

r5611 | mtredinnick | 2007-07-04 14:31:19 +0200 (Mi, 04 Jul 2007) | 3 lines


Fixed #4766 -- Added Russian support to Javascript slug creation. Thanks,
boobsd@….

........

r5612 | mtredinnick | 2007-07-04 14:48:12 +0200 (Mi, 04 Jul 2007) | 2 lines


Fixed some ReST errors.

........

r5614 | mtredinnick | 2007-07-05 03:25:05 +0200 (Do, 05 Jul 2007) | 3 lines


Form encoding should be changed only via HttpRequest, not on GET and POST
directly.

........

r5615 | mtredinnick | 2007-07-05 05:25:11 +0200 (Do, 05 Jul 2007) | 2 lines


Fixed #4717 -- Updated Catalan translation. Thanks, marc.garcia@….

........

r5616 | mtredinnick | 2007-07-05 05:29:18 +0200 (Do, 05 Jul 2007) | 3 lines


Fixed #4753 -- Updated Spanish translation. Also move translators' names out of
PO file and into AUTHORS.

........

r5617 | mtredinnick | 2007-07-05 12:27:22 +0200 (Do, 05 Jul 2007) | 3 lines


Added a test that shows the problem in #4470. This fails only for the mysql_old
backend. Refs #4470.

........

r5618 | mtredinnick | 2007-07-05 13:08:40 +0200 (Do, 05 Jul 2007) | 3 lines


Added CACHE_MIDDLEWARE_SECONDS to global settings and documentation (it's
used by the cache middleware). Refs #1015.

........

r5619 | mtredinnick | 2007-07-05 13:10:27 +0200 (Do, 05 Jul 2007) | 5 lines


Fixed #1015 -- Fixed decorator_from_middleware to return a real decorator even
when arguments are given. This looks a bit ugly, but it's fully backwards
compatible and all the extra work is done at import time, so it shouldn't have
any real performance impact.

........

r5620 | russellm | 2007-07-05 14:54:42 +0200 (Do, 05 Jul 2007) | 2 lines


Fixed minor typo in assertion message.

........

r5621 | gwilson | 2007-07-06 06:04:42 +0200 (Fr, 06 Jul 2007) | 2 lines


Fixed #4779 -- Fixed a couple typos in the test_client_regress tests that surfaced when typo was corrected in [5620]. Thanks ferringb@….

........

r5622 | mtredinnick | 2007-07-06 08:53:27 +0200 (Fr, 06 Jul 2007) | 2 lines


Fixed #4781 -- Typo fix. Pointed out by Simon Litchfield.

........

r5623 | mtredinnick | 2007-07-06 10:04:04 +0200 (Fr, 06 Jul 2007) | 4 lines


Fixed #4770 -- Fixed some Unicode conversion problems in the mysql_old backend
with old MySQLdb versions. Tested against 1.2.0, 1.2.1 and 1.2.1p2 with only
expected failures.

........

r5624 | mtredinnick | 2007-07-06 10:35:25 +0200 (Fr, 06 Jul 2007) | 3 lines


Fixed #4782 -- Updated Slovenian translation. Thanks, Gasper Koren. Also moved
contributor names into AUTHORS file.

........

r5625 | mtredinnick | 2007-07-06 12:21:14 +0200 (Fr, 06 Jul 2007) | 4 lines


Fixed #4776 -- Fixed a problem with handling of upload_to attributes. The new
solution still works with non-ASCII filenames. Based on a patch from
mike.j.thompson@….

........

r5626 | russellm | 2007-07-07 04:16:23 +0200 (Sa, 07 Jul 2007) | 2 lines


Added some uncredited authors that worked on the Oracle branch.

........

r5629 | mtredinnick | 2007-07-07 19:15:54 +0200 (Sa, 07 Jul 2007) | 8 lines


Changed HttpRequest.path to be a Unicode object. It has already been
URL-decoded by the time we see it anyway, so keeping it as a UTF-8 bytestring
was causing unnecessary problems.


Also added handling for non-ASCII URL fragments in feed creation (the portion
that was outside the control of the Feed class was messed up).

........

r5630 | mtredinnick | 2007-07-07 20:24:27 +0200 (Sa, 07 Jul 2007) | 4 lines


Fixed #4772 -- Fixed reverse URL creation to work with non-ASCII arguments.
Also included a test for non-ASCII strings in URL patterns, although that
already worked correctly.

........

r5631 | mtredinnick | 2007-07-07 20:39:23 +0200 (Sa, 07 Jul 2007) | 3 lines


Corrected misleading comment from [5619]. Not sure what I was smoking at the
time.

........

r5632 | mtredinnick | 2007-07-08 02:39:32 +0200 (So, 08 Jul 2007) | 5 lines



Fixed reverse URL lookup using functions when the original URL pattern was a
string. This is now just as fragile as it was prior to [5609], but works in a
few cases that people were relying on, apparently.

........

r5636 | mtredinnick | 2007-07-08 13:22:53 +0200 (So, 08 Jul 2007) | 4 lines


Fixed #4798-- Made sure that function keyword arguments are strings (for the
keywords themselves) when using Unicode URL patterns.

........

r5638 | gwilson | 2007-07-10 04:34:42 +0200 (Di, 10 Jul 2007) | 2 lines


Fixed #4817 -- Removed leading forward slashes from some urlconf examples in the documentation.

........

r5639 | gwilson | 2007-07-10 04:45:11 +0200 (Di, 10 Jul 2007) | 2 lines


Fixed #4814 -- Fixed some whitespace issues in tutorial01, thanks John Shaffer.

........

r5640 | gwilson | 2007-07-10 05:26:26 +0200 (Di, 10 Jul 2007) | 2 lines


Fixed #4812 -- Fixed an octal escape in regular expression that is used in the isValidEmail validator, thanks batchman@….

........

r5641 | mtredinnick | 2007-07-10 14:02:06 +0200 (Di, 10 Jul 2007) | 3 lines


Fixed #4823 -- Fixed a Python 2.3 incompatibility from [5636] (it was even
demonstrated by existing tests, so I really screwed this up).

........

r5642 | mtredinnick | 2007-07-10 14:03:36 +0200 (Di, 10 Jul 2007) | 3 lines


Fixed #4804 -- Fixed a problem when validating choice lists with non-ASCII
data. Thanks, django@….

........

r5643 | mtredinnick | 2007-07-10 14:33:55 +0200 (Di, 10 Jul 2007) | 4 lines


Fixed #3760 -- Added the ability to manually set feed- and item-level id
elements in Atom feeds. This is fully backwards compatible. Based on a patch
from spark343@….

........

r5644 | mtredinnick | 2007-07-11 08:55:12 +0200 (Mi, 11 Jul 2007) | 3 lines


Fixed #4815 -- Fixed decoding of request parameters when the input encoding is
not UTF-8. Thanks, Jordan Dimov.

........

r5645 | mtredinnick | 2007-07-11 09:00:27 +0200 (Mi, 11 Jul 2007) | 3 lines


Fixed #4802 -- Updated French translation. Combined contribution from
baptiste.goupil@… and rocherl@….

........

r5646 | mtredinnick | 2007-07-11 09:12:50 +0200 (Mi, 11 Jul 2007) | 2 lines


Fixed #4753 -- Small update to Spanish translation from Mario Gonzalez.

........

r5649 | jacob | 2007-07-12 02:33:44 +0200 (Do, 12 Jul 2007) | 1 line


Fixed #4615: corrected reverse URL resolution examples in tutorial 4. Thanks for the patch, simeonf.

........

r5650 | adrian | 2007-07-12 06:43:29 +0200 (Do, 12 Jul 2007) | 1 line


Added 'New in Django development version' note to docs/syndication_feeds.txt changes from [5643]

........

r5651 | adrian | 2007-07-12 06:44:45 +0200 (Do, 12 Jul 2007) | 1 line


Edited changes to docs/tutorial04.txt from [5649]

........

r5652 | adrian | 2007-07-12 07:23:47 +0200 (Do, 12 Jul 2007) | 1 line


Added helpful error message to SiteManager.get_current() if the user hasn't set SITE_ID

........

r5653 | adrian | 2007-07-12 07:28:04 +0200 (Do, 12 Jul 2007) | 1 line


Added RequestSite class to sites framework

........

r5654 | adrian | 2007-07-12 07:29:32 +0200 (Do, 12 Jul 2007) | 1 line


Improved syndication feed framework to use RequestSite if the sites framework is not installed -- i.e., the sites framework is no longer required to use the syndication feed framework. This is backwards incompatible if anybody has subclassed Feed and overridden init(), because the second parameter is now expected to be an HttpRequest object instead of request.path

........

r5658 | russellm | 2007-07-12 09:45:35 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4459 -- Added 'raw' argument to save method, to override any pre-save processing, and modified serializers to use a raw-save. This enables serialization of DateFields with auto_now/auto_now_add. Also modified serializers to invoke save() directly on the model baseclass, to avoid any (potentially order-dependent, data modifying) behavior in a custom save() method.

........

r5659 | russellm | 2007-07-12 13:24:16 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #3770 -- Remove null=True tag from OneToOne serialization test. OneToOne fields can't have a value of null.

........

r5660 | russellm | 2007-07-12 13:27:38 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #3768 -- Disabled NullBooleanField PK serialization test. We can't and don't test null PK values.

........

r5662 | russellm | 2007-07-12 14:33:24 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4837 -- Updated Debian packaging details. Thanks for the suggestion, Yasushi Masuda <whosaysni@…>.

........

r5663 | russellm | 2007-07-12 14:44:05 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4808 -- Added Chilean regions in localflavor. Thanks, Marijn Vriens <marijn@…>.

........

r5664 | russellm | 2007-07-12 14:48:27 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4745 -- Updated docs to point out that 0 is not a valid SITE_ID when running the tests. Thanks for the suggestion, Lars Stavholm <stava@…>.

........

r5665 | russellm | 2007-07-12 14:50:02 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4763 -- Minor typo in cache documentations. Thanks, dan@….

........

r5666 | russellm | 2007-07-12 14:55:28 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4627 -- Added details on MacPorts packaging of Django. Thanks, Paul Bissex.

........

r5667 | russellm | 2007-07-12 15:23:11 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4640 -- Fixed import to stringfilter in docs. Proposed solution to move stringfilter into django.template.init introduces a circular import problem.

........

r5668 | russellm | 2007-07-12 15:32:00 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4722 -- Clarified discussion about PYTHONPATH in modpython docs. Thanks for the suggestion, Collin Grady <cgrady@…>.

........

r5669 | russellm | 2007-07-12 15:37:59 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4755 -- Modified newforms MultipleChoiceField to use list comprehension, rather than iteration.

........

r5670 | russellm | 2007-07-12 15:41:27 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4764 -- Added reference to Locale middleware in middleware docs. Thanks, dan@….

........

r5671 | russellm | 2007-07-12 15:55:19 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4768 -- Converted timesince and dateformat to use explicit floor division (pre-emptive avoidance of Python 3000 compatibility problem), and removed a redundant millisecond check. Thanks, John Shaffer <jshaffer2112@…>.

........

r5672 | russellm | 2007-07-12 16:00:13 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4775 -- Added some missing Hungarian accents to the urlify.js LATIN_MAP. Thanks, Pistahh <szekeres@…>.

........

r5673 | russellm | 2007-07-12 16:05:16 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4502 -- Clarified reference to view in tutorial. Thanks for the suggestion, Carl Karsten <carl@…>.

........

r5674 | russellm | 2007-07-12 16:11:41 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4522 -- Clarified the allowed filter arguments on the time and date filters. Thanks for the suggestion, admackin@….

........

r5675 | russellm | 2007-07-12 16:21:51 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4525 -- Fixed mistaken documentation on arguments to runfcgi. Thanks, Johan Bergstrom <bugs@…>.

........

r5676 | russellm | 2007-07-12 16:41:32 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4538 -- Split the installation instructions to differentiate between installing a distribution package and installing an official release. Thanks to Carl Karsten for the idea, and Paul Bissex for the patch.

........

r5677 | russellm | 2007-07-12 17:26:37 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4526 -- Modified the test Client login method to fail when a user is inactive. Thanks, marcin@….

........

r5678 | russellm | 2007-07-13 07:03:33 +0200 (Fr, 13 Jul 2007) | 2 lines


Fixed #3505 -- Added handling for the error raised when the user forgets the comma in a single element tuple when defining AUTHENTICATION_BACKENDS. Thanks for the help identifying this problem, Mario Gonzalez <gonzalemario@…>.

........

r5679 | mtredinnick | 2007-07-13 10:52:07 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #2591 -- Fixed a problem with inspectdb with psycopg2 (only). Patch from
Gary Wilson.

........

r5680 | mtredinnick | 2007-07-13 11:09:59 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4807 -- Fixed a couple of corner cases in decimal form input validation.
Based on a suggestion from Chriss Moffit.

........

r5681 | mtredinnick | 2007-07-13 11:14:51 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4839 -- Added repr methods to URL classes that show the pattern they
contain. Thanks, Thomas Güttler.

........

r5682 | mtredinnick | 2007-07-13 12:56:30 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4842 -- Added slightly more robust error reporting. Thanks, Thomas
Güttler.

........

r5683 | mtredinnick | 2007-07-13 13:05:01 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4846 -- Fixed some Python 2.3 encoding problems in the admin interface.
Based on a patch from daybreaker12@….

........

r5684 | mtredinnick | 2007-07-13 14:03:20 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4861 -- Removed some duplicated logic from the newforms RegexField by
making it a subclass of CharField. Thanks, Collin Grady.

........

r5685 | mtredinnick | 2007-07-13 15:15:35 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4865 -- Replaced a stray generator comprehension with a list
comprehension so that we don't break Python 2.3.

........

r5686 | mtredinnick | 2007-07-13 16:13:35 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4469 -- Added slightly more informative error messages to max- and
min-length newform validation. Based on a patch from A. Murat Eren.

........

r5687 | mtredinnick | 2007-07-13 16:14:47 +0200 (Fr, 13 Jul 2007) | 2 lines


Added author credit for [5686]. Refs #4469.

........

r5688 | mtredinnick | 2007-07-13 16:33:46 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4484 -- Fixed APPEND_SLASH handling to handle an empty path value.
Thanks, VesselinK.

........

r5689 | mtredinnick | 2007-07-13 16:40:39 +0200 (Fr, 13 Jul 2007) | 2 lines


Fixed #4556 -- Stylistic changes to [5500]. Thanks, glin@….

........

r5690 | gwilson | 2007-07-13 22:36:01 +0200 (Fr, 13 Jul 2007) | 2 lines


Refs #2591 -- Removed int conversion and try/except since the value in the single-item list is already an int. I overlooked this in my original patch, which was applied in [5679].

........

r5691 | adrian | 2007-07-13 23:20:07 +0200 (Fr, 13 Jul 2007) | 1 line


Documented the 'commit' argument to save() methods on forms created via form_for_model() or form_for_instance()

........

r5692 | mtredinnick | 2007-07-14 07:27:22 +0200 (Sa, 14 Jul 2007) | 3 lines


Fixed #4869 -- Added a note that syncdb does not alter existing tables. Thanks,
James Bennett.

........

r5693 | mtredinnick | 2007-07-14 14:48:24 +0200 (Sa, 14 Jul 2007) | 3 lines


Fixed #4863 -- Removed comment references to a no-longer present link. Pointed
out by Thomas Güttler.

........

r5694 | mtredinnick | 2007-07-14 15:14:28 +0200 (Sa, 14 Jul 2007) | 2 lines


Fixed #4862 -- Fixed invalid Javascript creation in popup windows in admin.

........

r5695 | mtredinnick | 2007-07-14 15:39:41 +0200 (Sa, 14 Jul 2007) | 2 lines


Fixed a problem with translatable strings from [5686].

........

r5696 | mtredinnick | 2007-07-14 16:47:14 +0200 (Sa, 14 Jul 2007) | 3 lines


Fixed #4731 -- Changed management.setup_environ() so that it no longer assumes
the settings module is called "settings". Patch from SmileyChris.

........

r5697 | mtredinnick | 2007-07-14 16:50:35 +0200 (Sa, 14 Jul 2007) | 3 lines


Fixed #4870 -- Removed unneeded import and fixed a docstring in an example.
Thanks, Collin Grady.

........

r5698 | adrian | 2007-07-14 18:58:54 +0200 (Sa, 14 Jul 2007) | 1 line


Edited docs/db-api.txt changes from [5658]

........

r5699 | adrian | 2007-07-14 19:04:30 +0200 (Sa, 14 Jul 2007) | 1 line


Negligible capitalization fix in test/client.py docstring

........

r5700 | russellm | 2007-07-15 06:41:59 +0200 (So, 15 Jul 2007) | 2 lines


Clarified the documentation on the steps that happen during a save, and how raw save affects those steps.

........

Note: See TracTickets for help on using tickets.
Back to Top