Django

Ticket #952 (closed: wontfix)

Opened 3 years ago

Last modified 1 year ago

[patch] Allow for database client encoding configuration from project settings

Reported by: me@julik.nl
Assigned to: adrian
Milestone: normal
Component: Database wrapper
Version: SVN
Keywords:
Cc: Maniac@SoftwareManiacs.Org., jm.bugtracking@gmail.com
Triage Stage: Design decision needed
Has patch: 1
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 1

Description

This allows the user to define the client encoding of his database and have it used automatically throughout Django. Includes Postgres and MySQL implementations.

Attachments

encoding.diff (2.6 kB) - added by me@julik.nl on 11/28/05 07:35:32.
Necessary changes
encoding.2.diff (3.1 kB) - added by me@julik.nl on 11/28/05 07:38:50.
necessary changes
database_client_charset.diff (5.8 kB) - added by me@julik.nl on 11/30/05 20:02:11.
Automagic added
corrected.diff (5.8 kB) - added by me@julik.nl on 11/30/05 22:14:56.
Why I never get it right the first time I wonder

Change History

11/28/05 07:35:32 changed by me@julik.nl

  • attachment encoding.diff added.

Necessary changes

11/28/05 07:38:50 changed by me@julik.nl

  • attachment encoding.2.diff added.

necessary changes

11/28/05 14:20:30 changed by me@julik.nl

The second diff fixes a typo in the postgres driver. This supersedes and closes #814.

11/28/05 14:33:12 changed by hugo

Please keep in mind that DEFAULT_CHARSET can be changed, so it would be better not to introduce a new setting DATABASE_ENCODING, but just to use the DEFAULT_ENCODING and put a hook into the backends that does the database-specific work with it (the postgresql backend can then change 'utf-8' to 'unicode').

11/28/05 15:47:17 changed by me@julik.nl

This would require storing a table of encoding names for every driver. Besides, this configuration setting is useful. For example, you might be using charset A for your database (maybe for legacy reasons) and charset B for the middleware and the frontend.

I thought that it would be both simpler and more flexible to allow the developer to set up his database settings separately (most ASCII-speaking developers won't need it anyway). But reusing DEFAULT_CHARSET is neat.

11/28/05 15:58:39 changed by me@julik.nl

Besides, creating the translation tables per driver would require parsing http://www.iana.org/assignments/character-sets for every possible character set name and alias, finding out all of the mappings for these names within PostgreSQL and MySQL, and storing these tables in the Django sources. These tables would probably exceed the line count of the drivers themselves. Is this really necessary?

11/28/05 16:04:45 changed by hugo

You can't use different encodings for Django and your database backend, because Django never sends Unicode strings but always sends bytestrings encoded in DEFAULT_ENCODING, so the database driver can't know what to do with them besides passing them along. DEFAULT_ENCODING will therefore always have to match DATABASE_ENCODING, so it's a much better idea to tie them together directly.
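
To illustrate the point about mismatched encodings, here is a minimal Python sketch. It is not Django code: the sample string and the two charsets are assumptions chosen only to show that bytes written in one encoding and read back in another produce wrong characters rather than an error.

```python
# Illustrative only: the sample text and charsets are arbitrary choices.
text = "Привет"

# Bytes the application would hand to the database driver.
sent = text.encode("utf-8")

# If the other side assumes a different charset, decoding still "works",
# but the characters are wrong. errors="replace" avoids an exception on
# byte values that the assumed charset does not define.
misread = sent.decode("cp1251", errors="replace")

# The text survives only when both sides agree on the encoding.
roundtrip = sent.decode("utf-8")

print(misread == text)    # the misread text differs from the original
print(roundtrip == text)  # the matching charset restores the original
```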

11/28/05 16:18:40 changed by me@julik.nl

Point taken, I agree. How many encodings should be included in the dicts per driver if we decide to handle it that way? The IANA list is just insanely long, especially including all the possible aliases.

11/28/05 16:31:41 changed by Maniac <Maniac@SoftwareManiacs.Org>

Hugo, this is exactly what this patch is intended to do (as far as I understand). If you set the client encoding for a database, this is the encoding the database expects on the client side regardless of its internal encoding. If you have a legacy database in, say, windows-1251 (Cyrillic) and want to use your Django app the modern, proper way with utf-8, then you're in trouble, since no one tells your database to convert windows-1251 to utf-8 upon SELECT.

The separate setting is required because you can't use DEFAULT_CHARSET directly for SET ENCODING in the database: databases do the lame thing of using non-standard names for encodings. So DATABASE_ENCODING is merely intended as a 'translation' of DEFAULT_CHARSET for your database driver. Doing it automatically would require a huge mapping table of the form

  (('utf-8','unicode'),('windows-1251','cp1251') ... )

for each backend, as Julik pointed out. I, for example, wouldn't start implementing a new backend if I knew it would require matching several dozen strings manually. By contrast, each Django user would work with maybe one or two legacy encodings.

Julik, maybe document this in the comment on the setting, since it's not entirely obvious.
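
As a rough sketch of what "setting the client encoding" could look like per backend: the helper name `client_encoding_sql` and the backend keys are hypothetical and are not taken from the attached patches. The two statements themselves (`SET client_encoding TO ...` for PostgreSQL and `SET NAMES ...` for MySQL) are the usual way of declaring the client-side charset on a connection.

```python
import re

# Hypothetical helper, not part of Django or of the attached patches.
# Each backend has its own statement for declaring the client-side encoding.
_STATEMENTS = {
    "postgresql": "SET client_encoding TO '%s'",
    "mysql": "SET NAMES '%s'",
}


def client_encoding_sql(backend, db_charset):
    """Return the SQL telling the server which encoding the client uses.

    db_charset must already be the backend's own name for the charset.
    """
    # Reject anything that does not look like a plain charset name,
    # because the value is interpolated into the statement.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", db_charset):
        raise ValueError("suspicious charset name: %r" % db_charset)
    return _STATEMENTS[backend] % db_charset


print(client_encoding_sql("postgresql", "UNICODE"))
print(client_encoding_sql("mysql", "cp1251"))
```

The charset passed in is the database's own spelling, which is why the thread treats the name translation as a separate problem.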

11/30/05 10:01:43 changed by me@julik.nl

Ok, we can do it the other way. Let's have a dict of, say, the 6 most common encodings per driver (unicode, JIS, cp-1251 and maybe some more). If the user is using some DEFAULT_CHARSET which is not covered by the dict and does NOT specify DATABASE_ENCODING, an exception is raised along the lines of "You are using some charset but you also need to tell your database to support it, and Django doesn't know how to do it for you". Otherwise, if the DEFAULT_CHARSET is mapped by the dict, then we just use it straight away without bothering with DATABASE_ENCODING.

Will that be reasonable?
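
The lookup proposed above could be sketched roughly as follows. This is an assumption-laden illustration, not the code from the attached diffs: the function name, the exception class and the table contents are hypothetical, and the two table entries simply reuse the example pairs quoted earlier in the thread.

```python
class ImproperlyConfigured(Exception):
    """Stand-in configuration error, defined here to keep the sketch self-contained."""


# Hypothetical per-driver table; the entries reuse the example pairs
# from the thread and are not a verified list of backend charset names.
KNOWN_CHARSETS = {
    "utf-8": "unicode",
    "windows-1251": "cp1251",
}


def resolve_db_charset(default_charset, database_encoding=""):
    """Pick the charset name to send to the database.

    An explicit DATABASE_ENCODING wins; otherwise DEFAULT_CHARSET is
    translated through the small table, and an unknown charset is an error.
    """
    if database_encoding:
        return database_encoding
    try:
        return KNOWN_CHARSETS[default_charset.lower()]
    except KeyError:
        raise ImproperlyConfigured(
            "DEFAULT_CHARSET %r has no known database equivalent; "
            "set DATABASE_ENCODING explicitly." % default_charset
        )


print(resolve_db_charset("utf-8"))           # covered by the table
print(resolve_db_charset("koi8-r", "koi8"))  # explicit setting is used as-is
```

Calling `resolve_db_charset("koi8-r")` without the second argument raises the configuration error, which matches the "bail out rather than guess" behaviour described in the comment.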

11/30/05 17:52:25 changed by hugo

sounds reasonable to me - in most cases users won't need to change that setting, as they will just stick with one of the standard encodings. And for unusual situations you can specify what it should do.

11/30/05 18:46:47 changed by me@julik.nl

Ok, I'm cooking the new patch. I think DATABASE_ENCODING would be better renamed DATABASE_CLIENT_CHARSET (since we already have a default charset) and still kept in settings.py but left empty.

11/30/05 20:02:11 changed by me@julik.nl

  • attachment database_client_charset.diff added.

Automagic added

11/30/05 20:05:24 changed by me@julik.nl

  • cc set to me@julik.nl.
  • keywords changed from encodings database to encodings database i18n.

Ok, this is the patch. I constrained the dicts to the absolute minimum. There is also a unicode flag that you can set in the driver, as well as a charset flag. I don't really know how to deal with these properly at the moment; if Hugo can lend a hand, that would be great. But it works for me now.

I also renamed the variable and added bailing if this can't be set up (because it's really wrong when it's not being taken care of explicitly).

11/30/05 22:11:43 changed by me@julik.nl

Hm, wait a second. Let's see. Is it DEFAULT_ENCODING as the sys.defaultencoding or the DEFAULT_CHARSET defined in Django?

11/30/05 22:14:56 changed by me@julik.nl

  • attachment corrected.diff added.

Why I never get it right the first time I wonder

04/04/06 08:46:36 changed by anonymous

  • cc changed from me%40julik.nl to me@julik.nl.
  • keywords changed from encodings%2Bdatabase%2Bi18n to encodings database i18n.
  • summary changed from %5Bpatch%5D+Allow+for+database+client+encoding+configuration+from+project+settings to [patch] Allow for database client encoding configuration from project settings.

05/24/06 09:24:01 changed by anonymous

  • owner changed from anonymous to adrian.
  • component changed from Translations to Database wrapper.

repair spam damage at least a bit

05/26/06 15:34:31 changed by Greg

  • severity changed from minor to major.
  • component changed from Database wrapper to Documentation.
  • version changed from SVN to new-admin.
  • milestone changed from Version 0.91 to Version 1.0.
  • owner changed from adrian to jacob.
  • type set to defect.

05/27/06 14:49:57 changed by candisil

  • type deleted.

05/27/06 21:11:17 changed by Old Lady

  • type set to Submit changes.

05/28/06 12:57:23 changed by Philip

  • component changed from Documentation to Contrib apps.
  • priority changed from low to lowest.
  • version deleted.
  • milestone changed from Version 1.0 to Version 0.93.
  • owner changed from jacob to adrian.
  • type changed from Submit changes to enhancement.

05/28/06 14:21:17 changed by Soma Pill

  • version set to Contrib apps.

05/28/06 14:59:05 changed by valbienn

  • type changed from enhancement to defect.

05/29/06 14:20:42 changed by Philip

  • severity changed from major to blocker.
  • component changed from Contrib apps to Cache system.
  • priority changed from lowest to high.
  • milestone changed from Version 0.93 to Version 1.0.
  • owner changed from adrian to jacob.
  • type changed from defect to enhancement.

05/29/06 15:31:01 changed by Mike

  • priority changed from high to low.
  • version changed from Contrib apps to new-admin.
  • type changed from enhancement to defect.
  • severity changed from blocker to major.
  • milestone changed from Version 1.0 to Version 0.92.

05/30/06 22:57:42 changed by winstrol

  • type deleted.

05/31/06 06:34:35 changed by Vicodin

  • type set to Submit changes.

05/31/06 14:08:32 changed by Licex

  • type changed from Submit changes to defect.

05/31/06 14:21:06 changed by Licex

  • type deleted.

05/31/06 14:27:34 changed by Licex

  • type set to defect.

06/03/06 09:27:23 changed by allegra 180mg

  • cc changed from me@julik.nl to empty.com.
  • keywords changed from None to empty.com.
  • summary changed from [patch] Allow for database client encoding configuration from project settings to empty.com.

06/03/06 17:21:42 changed by adrian

  • severity changed from major to normal.
  • cc deleted.
  • component changed from Cache system to Database wrapper.
  • summary changed from empty.com to [patch] Allow for database client encoding configuration from project settings.
  • priority changed from low to normal.
  • owner changed from jacob to adrian.
  • version changed from new-admin to SVN.
  • milestone deleted.
  • keywords deleted.
  • type changed from defect to enhancement.

06/05/06 16:19:20 changed by

  • cc set to empty.com.
  • keywords set to empty.com.
  • summary changed from [patch] Allow for database client encoding configuration from project settings to empty.com.

06/05/06 16:27:39 changed by adrian

  • cc deleted.
  • keywords deleted.
  • summary changed from empty.com to [patch] Allow for database client encoding configuration from project settings.

06/05/06 22:11:54 changed by Levitra Cheap

  • milestone set to normal.

06/06/06 23:19:02 changed by vanity

  • type deleted.

06/07/06 22:07:58 changed by Cupcakes Kid

  • type set to defect.

06/11/06 02:33:23 changed by Adipx

  • type deleted.

06/12/06 07:19:43 changed by zetacap

  • type set to defect.

06/12/06 14:26:40 changed by tranny movies

  • type deleted.

06/13/06 21:29:33 changed by offshore online gambling

  • type set to defect.

06/16/06 12:08:22 changed by Hyper Hhyroid

  • type deleted.

06/16/06 17:05:33 changed by Maxoderm

  • type set to defect.

06/19/06 04:44:17 changed by Low Thyroid

  • type deleted.

06/29/06 08:12:12 changed by martin

  • type set to defect.

07/27/06 17:25:44 changed by jacob

  • status changed from new to closed.
  • resolution set to wontfix.

This is a tricky one.

Whatever we choose is going to change when we deal with the whole sticky unicodification aspect... so for now, I'm going to punt and mark this WONTFIX with the understanding that we'll fix it in a comprehensive manner in the future. For the record, the answer now is to just make sure your database's default encoding *does* match, which hopefully isn't too much of a nasty hack.

08/01/06 06:16:12 changed by roullette

  • type deleted.

08/01/06 16:35:26 changed by hoodia gordonii

  • type set to defect.

08/01/06 23:29:34 changed by lavalife

  • type deleted.

08/02/06 14:47:53 changed by Tramadol

  • type set to Submit changes.

08/02/06 15:43:43 changed by auto refiance loans

  • type deleted.

08/20/06 13:32:44 changed by anonymous

  • type set to defect.

08/22/06 11:38:05 changed by Ivan Sagalaev <Maniac@SoftwareManiacs.Org>

  • cc set to Maniac@SoftwareManiacs.Org.

Jacon, this ticket was mentioned in a thread on django-developers but I suppose those messages are easy to miss so I repost the relevant bits here.

The ongoing unicodification will actually require support for legacy-encoded DBs, and this ticket looks like the best place to do this. So maybe instead of wontfixing it, it's better to rework the patches for the new vision (which means just dropping the encoding into DEFAULT_CHARSET and leaving data from the DB in unicode)?

08/22/06 12:27:40 changed by Ivan Sagalaev <Maniac@SoftwareManiacs.Org>

Oh... Jacob, sorry for misspelling your name :-(

12/15/06 16:18:07 changed by adrian

  • status changed from closed to reopened.
  • resolution deleted.

Reopening. #1528, #2514, #1987, #2810 and #2896 were duplicates.

12/15/06 16:19:56 changed by adrian

See also the UnicodeInDjango wiki page.

12/15/06 16:33:47 changed by adrian

#3115 is related for PostgreSQL.

01/17/07 23:28:44 changed by SmileyChris

  • needs_better_patch set to 1.
  • stage changed from Unreviewed to Accepted.

01/26/07 04:21:22 changed by Michael Radziej <mir@noris.de>

  • stage changed from Accepted to Design decision needed.

We have a bit of chaos here ... Tickets #3370, #1356 and probably #952 all are about this problem, all are accepted, and #3370 and #1356 have very similar patches. I ask everybody to continue discussion in django-developers ("unicode issues in multiple tickets"), and I ask the authors of these three tickets to work together to find out how to proceed.

As long as it's not clear which path to take, I mark all bugs as "design decision needed." (I assume that the other reviewers were not aware of the competing tickets.)

http://groups.google.com/group/django-developers/browse_thread/thread/4b71be8257d42faf

01/30/07 16:02:04 changed by anonymous

#2584 marked as duplicate.

01/30/07 16:02:43 changed by mir@noris.de

sorry, it was me.

03/03/07 16:53:23 changed by anonymous

Please ACCEPT this commit, it would help. I also had this error; I modified the structure of the tables and set UTF8 as suggested there, and I don't have it anymore. I am using MySQL 5.0.24. Maybe this bug is not your fault, but it wouldn't cost anything to add the encoding declaration in the source... and that would solve some problems. But not all the problems that exist with accents...

(about ticket #2896)

04/22/07 08:08:13 changed by anonymous

  • cc changed from Maniac@SoftwareManiacs.Org to Maniac@SoftwareManiacs.Org., jm.bugtracking@gmail.com.

05/14/07 07:58:45 changed by mtredinnick

  • status changed from reopened to closed.
  • resolution set to wontfix.

In light of the changes in the unicode branch, this setting is no longer needed. All our supported database backends that can handle variable server encodings take care of that information transparently. We just hand them unicode strings or UTF-8 bytestrings and it should work transparently.

So this is wontfix and the root problem will go away once unicode branch merges into trunk.

