Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#2635 closed enhancement (fixed)

[patch] Updates for MySQL backend

Reported by: Andy Dustman <farcepest@…> Owned by: mtredinnick
Component: Database layer (models, ORM) Version: master
Severity: normal Keywords:
Cc: farcepest@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

The patch (to be attached) is fairly heavily commented, and I intend to post a reference to this bug on django-users and encourage people to test it.

Changes

  • requires MySQLdb-1.2.1 and newer.
  • eliminates MysqlDebugWrapper in favor of the standard util.CursorDebugWrapper and using warning filtering to raise warnings as exceptions.
  • sets character set to UTF8 in the the connection initiation
  • returns non-binary character fields as unicode
  • sets the server SQL mode
    • ANSI makes it more SQL standards-compliant, including use of double-quotes
    • TRADITIONAL mode turns warnings about bad values into errors

I've tested it with my own application and it seems to work fine. One gotcha I ran into that I actually get updates from an
external database, and the query string for that database needs strings and not unicodes, and I since I was now getting
unicode back from MySQL, I was trying to pass unicode to it, but that was easily fixed. Everything else in Django seems
to work fine for me with unicode.

Concerns

  • unicode values still might be breaking things; in this case, use_unicode=False should be set (charset implies use_unicode=True) as a workaround, but the plan is to make Django unicode-safe anyway, as I understand it.
  • warning filtering may be redundant with TRADITIONAL mode, but not harmful.
  • sql_mode is being set for ANSI,TRADITIONAL. The last options is new to MySQL-5.0, and this may cause errors for MySQL-4.1, or it might be ignored; I have only had time to test on MySQL-5.0. ANSI works going back to 4.1. ANSI_QUOTES is recognized by MySQL-4.0. If need be I can add some version tests.

The real question is, what is the earliest version of MySQL you want to support? I think either 4.1 or 5.0 would be a reasonable
cutoff, as the current stable version is 5.0, and 5.1 is in late beta, and 4.1 adds the good character set support.
Originally when I did this, I copied the mysql backend and called it mysql5, but you may not want to have multiple
backends (though there is a precedence for that). Ultimately I think it needs to be left to the end-users to decide
what the minimum version should be, since the Django developers prefer PostgreSQL.

Attachments (4)

django-mysql5.patch (4.7 KB) - added by Andy Dustman <farcepest@…> 9 years ago.
Patch for improved MySQL support
django-mysql.patch (5.8 KB) - added by Andy Dustman <farcepest@…> 8 years ago.
Updated MySQL support
mysql.txt (5.4 KB) - added by Andy Dustman <farcepest@…> 8 years ago.
Some notes on using MySQL with Django. (disregard the previous HTML file, this is the original)
django-mysql.2.patch (6.0 KB) - added by Andy Dustman <farcepest@…> 8 years ago.
Found a small bug in the last patch dealing with Django's overloading of DATABASE_HOST

Download all attachments as: .zip

Change History (26)

Changed 9 years ago by Andy Dustman <farcepest@…>

Patch for improved MySQL support

comment:1 Changed 9 years ago by Andy Dustman <farcepest@…>

Discuss this bug on Google Groups

comment:2 Changed 9 years ago by mtredinnick

  • Owner changed from adrian to mtredinnick

A couple of immediate concerns I have here:

  • the unconditional use of utf-8 to communicate with the database would seem to imply that the database must be in utf-8 collation mode. Is that wise? How about legacy databases, are they going to any worse off than they are at the moment?
  • I'm not super-comfortable with requiring MySQLdb-1.2.1, since that was only released in April this year, so it's less than 6 months old and won't be widely used yet.

In general, I think the patch does some useful things. Using the ability to be somewhat more standards compliant and better unicode support in MySQL 5.0 is nice. However, we cannot ignore 4.x releases, since, as mentioned in the thread, hosting companies still supply it and I personally know of a number of corporate production environments where 4.x is still the primary system in use. Need to think about whether we can keep a "one size fits all (with version checks)" mysql backend or if we want to bite the bullet and move to a mysql4 and mysql split set of backends.

I know I'm going to regret this, but I'll keep an eye on this patch and help work towards integrating it slowly.

What would make things easier would be if the patch could be split into a few pieces along the following lines (or at least knowing which splits aren't possible).

  1. Changes that work with MySQL 4.0 (I realise you don't have a test system for this. A good guess would still help, though).
  2. Changes that require MySQL 4.1
  3. Changes that require MySQL 5.x
  4. Changes that require MySQL-1.2.1 (is this all of it, or just some features?)

I think we'll need unicode strings off for now, until more of the internals are converted. But having it available at the change of more or less a single line like you have now is great. That will certainly help in the not too distant future.

comment:3 Changed 9 years ago by Andy Dustman <farcepest@…>

The current trunk version already sets the connection character set to UTF-8. See http://code.djangoproject.com/browser/django/trunk/django/db/backends/mysql/base.py#L93 This does *not* imply that the database itself is in UTF-8. If a different character set is in use for the database/table/column, the server transcodes from the connection character set into the correct one. The difference between the way the trunk version does it and the patched version is, the trunk version sets it with an SQL statement, and the patched version uses an API call. Additionally there are some internal things that have to happen for the correct conversion to take place, so doing it with SQL doesn't necessarily work the way you'd expect.

I'll try splitting up the patch sometime next week. My original intention was to use this as a different backend (mysql5) so that we'd have backwards-compatibility for those who need it.

Also, I've been developing an app with this patch and I haven't had to do anything special with Django yet even with all text-like columns coming back in Unicode. That said, there are no bad problems with the trunk version.

comment:4 Changed 9 years ago by dummy@…

I tried your patch for django and mysql5 today. It has the same problem as it has with 'SET NAMES utf8'.

If I change two lines of your code, my problem is solved in the same way as I did it with the patch above:
'use_unicode': False,
'charset': 'latin1',

I would suggested configuring the DATABASE_ENCODING for mysql backend.

comment:5 Changed 9 years ago by dummy@…

Sorry, posted to the wrong thread.

comment:6 Changed 9 years ago by lakin@…

I tried commenting on the thread that was suggested, but google groups wouldn't let me. The above patch does not work with MySQL 4.1 for me. Gives an SQL error, which as far as I can tell is due to double quoting. If you need/want the actual error I can provide it. (Just email me)

I think that we need to keep support for 4.1 because many shared hosting environments still use it. See my comments on #2810

comment:7 Changed 9 years ago by Michael Radziej <mir@…>

Hi Andy,

thanks for contributing this ticket, unfortunately it has been waiting a bit long in the queue. I'm now trying to triage it, but it looks a bit "aborted". Is this still current? Do you have any comments about mysql 4.1, 4.0 and 3.23, as the preceding comment reports a problem with 4.1?

Michael

comment:8 Changed 8 years ago by Michael Radziej <mir@…>

  • Triage Stage changed from Unreviewed to Design decision needed

Hmmm ... I guess a core dev needs to look through it if they want it or not.

comment:9 Changed 8 years ago by Andy Dustman

  • Patch needs improvement set

I need to review it some more, too. I would like to get support in for read_default_file. I am at PyCon now and staying for Sprints and expect to be working.on this.

Changed 8 years ago by Andy Dustman <farcepest@…>

Updated MySQL support

comment:10 Changed 8 years ago by Andy Dustman <farcepest@…>

  • Patch needs improvement unset

What's new with this patch:

  • Adds support for DATABASE_OPTIONS setting.
  • Doesn't try to change the character set.
  • Doesn't try to set the sql_mode.
  • Doesn't try to return data as unicode

With the addition of DATABASE_OPTIONS support, you are able to set charset, sql_mode, and use_unicode in the settings file. You can also use read_default_file to set connection options.

I've tested this with MySQL-5.0 but there's nothing here that should break on earlier versions. It passes the Django unit tests, but beware: Settings in DATABASE_OPTIONS override those in DATABASE_*, so the tests will use the database specified by DATABASE_OPTIONS, and not the the test database it creates (django_test_db). This is probably true for any backend that uses DATABASE_OPTIONS.

If you do set charset, you may want to set use_unicode=False. use_unicode is set to True if you set charset. Currently the unit tests for models do not pass if you have use_unicode=True, but they all look like comparison tests.

I am also working on some MySQL notes for Django.

You could probably remove the test for MySQLdb-1.2.1 or newer.

Changed 8 years ago by Andy Dustman <farcepest@…>

Some notes on using MySQL with Django. (disregard the previous HTML file, this is the original)

Changed 8 years ago by Andy Dustman <farcepest@…>

Found a small bug in the last patch dealing with Django's overloading of DATABASE_HOST

comment:11 Changed 8 years ago by mtredinnick

  • Triage Stage changed from Design decision needed to Ready for checkin

Great work, sir. This last patch (and the mysql.txt file) looks like something we should use. I've read the code once and it looks right from a "fits in with Django" perspective. I'm deferring to your experience on the MySQL front. I want to test it out on an older machine that I'll have access to tomorrow just to see that we haven't hosed bakcwards compatibility. Given the features this adds and the long break that will exist between 0.96 and 1.0, if it doesn't break backwards compat in any obvious way, I think it should go in before 0.96, since it will help with a lot of peoples' issues.

In the short-term we should fix the testing framework to look inside DATABASE_OPTIONS and force it to point to django_test_db, but that's probably not a showstopper for these changes, since it's possible to work around it in any case.

Thanks a lot for persisting with this and coming up with a non-intrusive solution.

comment:12 Changed 8 years ago by mtredinnick

Commit pending on resolution of this thread on django-dev.

comment:13 Changed 8 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [4724]) Fixed #2635 -- Added improved MySQL backend support from Andy Dustman. Also
added database.txt documentation; currently only describing Django-related
features of MySQL versions.

As of this commit, there are four known test failures due to (a) no transaction
support with MyISAM storage engine and (b) MySQLdb automatically creating
decimal types for Django's "float" field.

comment:14 follow-up: Changed 8 years ago by derelm

  • Resolution fixed deleted
  • Status changed from closed to reopened

i just upgraded to a revision > 4724 -- that broke my text encodings coming from mysql. is this a backwards incompatible change (which i failed to notice) or should everything be working as before?

in the latter case something is definately wrong :)

comment:15 Changed 8 years ago by derelm

I guess i forgot to mention that my MySQL Version is 4.1.11 (current Debian Sarge).

Also i had a quick look at http://code.djangoproject.com/wiki/BackwardsIncompatibleChanges but i did not find a notice that would suggest that i have to change something.

So this either needs Documentation or the patch needs improvement.

comment:16 Changed 8 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from reopened to closed

It should be backwards compatible, so please provide some more information about what setup you have (database encoding, any special settings.py options, etc) and precisely what is going wrong ("it broke" could mean anything).

Please also open this issue as a new ticket: if everybody having some problem with the MySQL upgrade reopens this one, it will end up containing a multitude of different issues, rather than just the upgrade patch.

comment:17 in reply to: ↑ 14 ; follow-up: Changed 8 years ago by Andy Dustman <farcepest@…>

Replying to derelm:

i just upgraded to a revision > 4724 -- that broke my text encodings coming from mysql. is this a backwards incompatible change (which i failed to notice) or should everything be working as before?

in the latter case something is definately wrong :)

Try putting in your settings:

DATABASE_OPTIONS = dict(charset="utf8")

and see if that helps.

comment:18 follow-up: Changed 8 years ago by derelm

that didn't help.

but i noticed that the version check doesn't work as expected (at least not with Debian Sarges MySQLdb version 1.2.1g2). see #3747

and seeing that that version isn't supported at all, my complaint about the umlauts not working is invalid.

so i will need to stick with r4723.

comment:19 Changed 8 years ago by anonymous

There shouldn't be a huge difference at the "character encoding level" between 1.2.1g2 and 1.2.1-final, so if you wanted to post some details to the django-users list, we can probably try to work out what's happening here.

comment:20 in reply to: ↑ 17 Changed 8 years ago by anonymous

Replying to Andy Dustman <farcepest@gmail.com>:

I haev the same problem

Try putting in your settings:

DATABASE_OPTIONS = dict(charset="utf8")

and see if that helps.

and this helps. But I am very angry, that such a change in config is needed. Also django uses UTF-8 from beginning and the UTF-8 should be default for every install.

comment:21 in reply to: ↑ 18 Changed 8 years ago by beegee

Replying to derelm:

that didn't help.

but i noticed that the version check doesn't work as expected (at least not with Debian Sarges MySQLdb version 1.2.1g2). see #3747

and seeing that that version isn't supported at all, my complaint about the umlauts not working is invalid.

so i will need to stick with r4723.

Revision 4724 is giving me headaches for several hours now. My dev environment has no problem at all with rev 4724. But, in my production environment utf8 characters are not handled correctly. It leads to decode errors and utf8 character displayed as , instead of a nice character. My dev enviroment has >>> print MySQLdb.version_info = (1, 2, 1, 'final', 2) and my production enviroment has >>> print MySQLdb.version_info = (1, 2, 1, 'gamma', 3) (Debian). I will try upgrading MySQLdb in my production environment.

comment:22 Changed 8 years ago by Michael Radziej <mir@…>

Please report any further observations regarding character encoding and mysql in #3754.

Note: See TracTickets for help on using tickets.
Back to Top