Opened 9 years ago

Closed 8 years ago

#952 closed defect (wontfix)

[patch] Allow for database client encoding configuration from project settings

Reported by: me@…
Owned by: adrian
Component: Database layer (models, ORM)
Version: master
Severity: normal
Keywords:
Cc: Maniac@…, jm.bugtracking@…
Triage Stage: Design decision needed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: yes
Easy pickings:
UI/UX:

Description

This allows users to define the client encoding of their database in the project settings and have Django use it automatically throughout.
Includes Postgres and MySQL implementations.

Attachments (4)

encoding.diff (2.6 KB) - added by me@… 9 years ago.
Necessary changes
encoding.2.diff (3.1 KB) - added by me@… 9 years ago.
necessary changes
database_client_charset.diff (5.8 KB) - added by me@… 9 years ago.
Automagic added
corrected.diff (5.8 KB) - added by me@… 9 years ago.
Why I never get it right the first time I wonder


Change History (65)

Changed 9 years ago by me@…

Necessary changes

Changed 9 years ago by me@…

necessary changes

comment:1 Changed 9 years ago by me@…

The second diff fixes a typo in the PostgreSQL driver. This supersedes and closes #814.

comment:2 Changed 9 years ago by hugo

Please keep in mind that DEFAULT_CHARSET can be changed, so it would be better not to introduce a new DATABASE_ENCODING setting, but simply to use DEFAULT_ENCODING and put a hook into the backends that does the database-specific work with it (the PostgreSQL backend can then translate 'utf-8' to 'unicode').
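A minimal sketch of the per-backend hook hugo describes. The class names and the `client_charset()` method are hypothetical, not Django's backend API; the only translation included is the 'utf-8' to 'unicode' example from the comment.

```python
class BaseBackend:
    """Hypothetical base backend: by default, pass the Django charset name through."""

    def client_charset(self, default_charset: str) -> str:
        return default_charset


class PostgreSQLBackend(BaseBackend):
    """Hypothetical PostgreSQL backend that renames charsets PostgreSQL spells differently."""

    # Only the translation mentioned in the discussion; a real table would be longer.
    NAMES = {"utf-8": "unicode"}

    def client_charset(self, default_charset: str) -> str:
        # Unknown names fall through unchanged, as in the base class.
        return self.NAMES.get(default_charset.lower(), default_charset)


DEFAULT_CHARSET = "utf-8"  # stands in for settings.DEFAULT_CHARSET

for backend in (BaseBackend(), PostgreSQLBackend()):
    print(type(backend).__name__, backend.client_charset(DEFAULT_CHARSET))
```

The design keeps a single user-facing setting and pushes the naming quirks into each backend, which is the trade-off debated in the following comments.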

comment:3 Changed 9 years ago by me@…

This would require storing a table of encoding names for every driver. Besides, this configuration setting is useful in its own right: you might, for example, use charset A for your database (perhaps for legacy reasons) and charset B for the middleware and the frontend.

I thought that it would be both simpler and more flexible to allow the developer to set up his database settings separately (most ASCII-speaking developers won't need it anyway). But reusing DEFAULT_CHARSET is neat.

comment:4 Changed 9 years ago by me@…

Besides, creating per-driver translation tables would require parsing http://www.iana.org/assignments/character-sets for every possible character set name and alias, finding the corresponding names for all of them in PostgreSQL and MySQL, and storing these tables in the Django sources. The tables would probably exceed the line count of the drivers themselves. Is this really necessary?

comment:5 Changed 9 years ago by hugo

You can't use different encodings for Django and your database backend: Django never sends Unicode strings but always sends bytestrings encoded in DEFAULT_ENCODING, so the database driver can do nothing with them except pass them along. DEFAULT_ENCODING will therefore always have to match DATABASE_ENCODING, so it's a much better idea to tie them together directly.
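To illustrate hugo's point, here is a small sketch of what happens when the bytes Django sends are interpreted under a different encoding than the one they were produced with. It uses only Python's built-in codecs; "utf-8" plays the role of DEFAULT_CHARSET and "cp1251" the role of a mismatched database client encoding.

```python
# Pre-unicode Django hands the driver bytes already encoded in DEFAULT_CHARSET.
text = "Привет"  # Cyrillic sample text
sent = text.encode("utf-8")  # what Django would send with DEFAULT_CHARSET = "utf-8"

# A database that believes its client speaks windows-1251 reinterprets those bytes.
misread = sent.decode("cp1251")

# When both sides agree on the encoding, the round trip is lossless.
roundtrip = sent.decode("utf-8")

print(misread)    # mojibake, not the original text
print(roundtrip)  # Привет
```

Nothing fails loudly here: every UTF-8 byte happens to be a valid cp1251 character, so the data is silently corrupted, which is why the two settings have to agree.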

comment:6 Changed 9 years ago by me@…

Point taken, I agree. How many encodings should be included in each driver's dict if we decide to handle it that way? The IANA list is insanely long, especially once you include all the possible aliases.

comment:7 Changed 9 years ago by Maniac <Maniac@…>

Hugo, as far as I understand, this is exactly what the patch is intended to do. The client encoding you set for the database is the encoding the database uses when talking to the client, regardless of its internal encoding. If you have a legacy database in, say, windows-1251 (Cyrillic) and want your Django app to use UTF-8 the modern, proper way, you're in trouble, because nothing tells your database to convert windows-1251 to UTF-8 on SELECT.
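A toy model of the conversion Maniac describes, assuming a server that stores text in one encoding and converts it to the session's client encoding before returning rows (PostgreSQL's `client_encoding` parameter works along these lines). The function `select_as_client` is hypothetical and only simulates the server side in Python.

```python
def select_as_client(stored: bytes, server_encoding: str, client_encoding: str) -> bytes:
    """Hypothetical model of a server converting stored text to the client encoding."""
    return stored.decode(server_encoding).encode(client_encoding)


# A legacy row stored in windows-1251.
row = "Москва".encode("cp1251")

# Without a client-encoding setting, the raw cp1251 bytes reach the UTF-8 app...
raw = row

# ...whereas with the client encoding set to UTF-8, the server converts on SELECT.
converted = select_as_client(row, "cp1251", "utf-8")

print(converted.decode("utf-8"))  # Москва
```

Decoding `raw` as UTF-8 would raise `UnicodeDecodeError`, which is the "you're in trouble" case above.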

The separate setting is required because you can't use DEFAULT_CHARSET directly for SET ENCODING in the database: databases insist on using their own non-standard names for encodings. So DATABASE_ENCODING is merely intended as a 'translation' of DEFAULT_CHARSET for your database driver. Doing it automatically would require a huge mapping table of the form

  (('utf-8','unicode'),('windows-1251','cp1251') ... )

for each backend, as Julik pointed out. I, for one, wouldn't start implementing a new backend if I knew it required matching several dozen strings by hand. Each Django user, on the other hand, would work with maybe one or two legacy encodings.

Julik, maybe document this in the comment for the setting, since it's not entirely obvious.

comment:8 Changed 9 years ago by me@…

Ok, we can do it the other way.
Let's have a dict of, say, the six most common encodings per driver (Unicode, JIS, cp1251 and maybe a few more). If the user sets a DEFAULT_CHARSET that the dict does not cover and does NOT specify DATABASE_ENCODING, an exception is raised along the lines of "You are using some charset but you also need to tell your database to support it, and Django doesn't know how to do it for you". Otherwise, if DEFAULT_CHARSET is in the dict, we use the mapped name straight away, without any need to bother with DATABASE_ENCODING.

Will that be reasonable?
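A sketch of the resolution rule proposed in comment:8. The table, the exception class and `resolve_client_charset` are hypothetical names; the PostgreSQL spellings (UNICODE, WIN1251, EUC_JP) are used only as sample entries. Letting an explicit DATABASE_ENCODING win even when DEFAULT_CHARSET is in the table is an assumption, since the comment does not say which takes precedence.

```python
# Hypothetical per-driver table, kept small as comment:8 suggests.
POSTGRESQL_CHARSETS = {
    "utf-8": "UNICODE",
    "windows-1251": "WIN1251",
    "euc-jp": "EUC_JP",
}


class DatabaseCharsetError(Exception):
    """Raised when the database client charset cannot be determined."""


def resolve_client_charset(default_charset, database_encoding, charset_map):
    """Pick the charset name to send to the database.

    1. An explicit DATABASE_ENCODING is used as-is (assumed to take priority).
    2. Otherwise DEFAULT_CHARSET is translated through the driver's table.
    3. Otherwise refuse to guess and raise an error.
    """
    if database_encoding:
        return database_encoding
    try:
        return charset_map[default_charset.lower()]
    except KeyError:
        raise DatabaseCharsetError(
            f"DEFAULT_CHARSET {default_charset!r} is not known to this backend; "
            "set DATABASE_ENCODING to the name your database uses for it."
        ) from None
```

Failing loudly in step 3 matches the later comment about bailing out when the encoding is not taken care of explicitly.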

comment:9 Changed 9 years ago by hugo

Sounds reasonable to me: in most cases users won't need to change that setting, as they will just stick with one of the standard encodings, and for unusual situations they can specify what it should do.

comment:10 Changed 9 years ago by me@…

OK, I'm cooking up the new patch. I think DATABASE_ENCODING would be better renamed to DATABASE_CLIENT_CHARSET (since we already have DEFAULT_CHARSET), still kept in settings.py but left empty.
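A sketch of how the renamed setting might look in a project's settings.py. DATABASE_CLIENT_CHARSET is the setting proposed in this ticket and never shipped in Django; the "KOI8R" value in the comment is an illustrative PostgreSQL spelling, not taken from the patch.

```python
# settings.py (sketch)
DEFAULT_CHARSET = "utf-8"

# Proposed setting, left empty by default: the backend then derives the client
# charset from DEFAULT_CHARSET. Fill it in only for charsets the backend does
# not know, using the database's own name (e.g. "KOI8R" for PostgreSQL).
DATABASE_CLIENT_CHARSET = ""
```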

Changed 9 years ago by me@…

Automagic added

comment:11 Changed 9 years ago by me@…

  • Cc me@… added
  • Keywords i18n added

OK, this is the patch. I kept the dicts to the absolute minimum. The driver also has a unicode flag as well as a charset flag that you can set; I don't really know how to deal with these properly at the moment, so it would be great if Hugo could lend a hand. But it works for me now.

I also renamed the setting and made it bail out if the encoding can't be set up (because it's really wrong when this isn't taken care of explicitly).

comment:12 Changed 9 years ago by me@…

Hm, wait a second. Is DEFAULT_ENCODING the interpreter's default encoding (sys.getdefaultencoding()) or the DEFAULT_CHARSET defined in Django?

Changed 9 years ago by me@…

Why I never get it right the first time I wonder

comment:13 Changed 9 years ago by anonymous

  • Cc me@… added; me%40julik.nl removed
  • Keywords encodings database i18n added; encodings%2Bdatabase%2Bi18n removed
  • Summary changed from %5Bpatch%5D+Allow+for+database+client+encoding+configuration+from+project+settings to [patch] Allow for database client encoding configuration from project settings

comment:14 Changed 9 years ago by anonymous

  • Component changed from Translations to Database wrapper
  • Owner changed from anonymous to adrian

Repaired the spam damage, at least a bit.

comment:15 Changed 9 years ago by Greg

  • Component changed from Database wrapper to Documentation
  • milestone changed from Version 0.91 to Version 1.0
  • Owner changed from adrian to jacob
  • Severity changed from minor to major
  • Type set to defect
  • Version changed from SVN to new-admin

comment:16 Changed 9 years ago by candisil

  • Type defect deleted

comment:17 Changed 9 years ago by Old Lady

  • Type set to Submit changes

comment:18 Changed 9 years ago by Philip

  • Component changed from Documentation to Contrib apps
  • milestone changed from Version 1.0 to Version 0.93
  • Owner changed from jacob to adrian
  • priority changed from low to lowest
  • Type changed from Submit changes to enhancement
  • Version new-admin deleted

comment:19 Changed 9 years ago by Soma Pill

  • Version set to Contrib apps

comment:20 Changed 9 years ago by valbienn

  • Type changed from enhancement to defect

comment:21 Changed 9 years ago by Philip

  • Component changed from Contrib apps to Cache system
  • milestone changed from Version 0.93 to Version 1.0
  • Owner changed from adrian to jacob
  • priority changed from lowest to high
  • Severity changed from major to blocker
  • Type changed from defect to enhancement

comment:22 Changed 9 years ago by Mike

  • milestone changed from Version 1.0 to Version 0.92
  • priority changed from high to low
  • Severity changed from blocker to major
  • Type changed from enhancement to defect
  • Version changed from Contrib apps to new-admin


comment:23 Changed 9 years ago by winstrol

  • Type defect deleted

comment:24 Changed 9 years ago by Vicodin

  • Type set to Submit changes

comment:25 Changed 9 years ago by Licex

  • Type changed from Submit changes to defect

comment:26 Changed 9 years ago by Licex

  • Type defect deleted

comment:27 Changed 9 years ago by Licex

  • Type set to defect

comment:28 Changed 9 years ago by allegra 180mg

  • Cc empty.com added; me@… removed
  • Keywords empty.com added; None removed
  • Summary changed from [patch] Allow for database client encoding configuration from project settings to empty.com

comment:29 Changed 9 years ago by adrian

  • Cc empty.com removed
  • Component changed from Cache system to Database wrapper
  • Keywords empty.com removed
  • milestone Version 0.92 deleted
  • Owner changed from jacob to adrian
  • priority changed from low to normal
  • Severity changed from major to normal
  • Summary changed from empty.com to [patch] Allow for database client encoding configuration from project settings
  • Type changed from defect to enhancement
  • Version changed from new-admin to SVN

comment:30 Changed 9 years ago by anonymous

  • Cc empty.com added
  • Keywords empty.com added
  • Summary changed from [patch] Allow for database client encoding configuration from project settings to empty.com

comment:31 Changed 9 years ago by adrian

  • Cc empty.com removed
  • Keywords empty.com removed
  • Summary changed from empty.com to [patch] Allow for database client encoding configuration from project settings

comment:32 Changed 9 years ago by Levitra Cheap

  • milestone set to normal

comment:33 Changed 9 years ago by vanity

  • Type enhancement deleted

comment:34 Changed 9 years ago by Cupcakes Kid

  • Type set to defect

comment:35 Changed 9 years ago by Adipx

  • Type defect deleted

comment:36 Changed 9 years ago by zetacap

  • Type set to defect

comment:37 Changed 9 years ago by tranny movies

  • Type defect deleted

comment:38 Changed 9 years ago by offshore online gambling

  • Type set to defect

comment:39 Changed 9 years ago by Hyper Hhyroid

  • Type defect deleted

comment:40 Changed 9 years ago by Maxoderm

  • Type set to defect

comment:41 Changed 9 years ago by Low Thyroid

  • Type defect deleted

comment:42 Changed 9 years ago by martin

  • Type set to defect

comment:43 Changed 9 years ago by jacob

  • Resolution set to wontfix
  • Status changed from new to closed

This is a tricky one.

Whatever we choose is going to change when we deal with the whole sticky unicodification issue, so for now I'm going to punt and mark this WONTFIX, with the understanding that we'll fix it comprehensively in the future. For the record, the answer for now is simply to make sure your database's default encoding *does* match DEFAULT_CHARSET, which hopefully isn't too nasty a hack.
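A small sketch of the check Jacob's workaround implies: comparing DEFAULT_CHARSET with the encoding the database reports (for example via PostgreSQL's `SHOW server_encoding`). `same_encoding` is a hypothetical helper built on Python's `codecs.lookup`, which normalises aliases such as "UTF8" and "utf-8" to one codec name; database-specific spellings that Python does not recognise are reported as a mismatch rather than guessed at.

```python
import codecs


def same_encoding(django_charset: str, db_encoding: str) -> bool:
    """Return True if both names resolve to the same Python codec."""
    try:
        return codecs.lookup(django_charset).name == codecs.lookup(db_encoding).name
    except LookupError:
        # Unknown name on either side: treat it as a mismatch.
        return False


print(same_encoding("utf-8", "UTF8"))    # True
print(same_encoding("utf-8", "latin1"))  # False
```

Because names such as PostgreSQL's may not be in Python's codec registry, a small translation step could still be needed in practice.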

comment:44 Changed 9 years ago by roullette

  • Type defect deleted

comment:45 Changed 9 years ago by hoodia gordonii

  • Type set to defect

comment:46 Changed 9 years ago by lavalife

  • Type defect deleted

comment:47 Changed 9 years ago by Tramadol

  • Type set to Submit changes

comment:48 Changed 9 years ago by auto refiance loans

  • Type Submit changes deleted

comment:49 Changed 9 years ago by anonymous

  • Type set to defect

comment:50 Changed 9 years ago by Ivan Sagalaev <Maniac@…>

  • Cc Maniac@… added

Jacon, this ticket was mentioned in a thread on django-developers, but I suppose those messages are easy to miss, so I'm reposting the relevant bits here.

The ongoing unicodification will actually require support for databases in legacy encodings, and this ticket looks like the best place to do it. So instead of wontfixing it, maybe it's better to rework the patches for the new vision (which means no longer encoding into DEFAULT_CHARSET and leaving data from the DB as Unicode)?

comment:51 Changed 9 years ago by Ivan Sagalaev <Maniac@…>

Oh... Jacob, sorry for misspelling your name :-(

comment:52 Changed 8 years ago by adrian

  • Resolution wontfix deleted
  • Status changed from closed to reopened

Reopening. #1528, #2514, #1987, #2810 and #2896 were duplicates.

comment:53 Changed 8 years ago by adrian

See also the UnicodeInDjango wiki page.

comment:54 Changed 8 years ago by adrian

#3115 is related for PostgreSQL.

comment:55 Changed 8 years ago by SmileyChris

  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted

comment:56 Changed 8 years ago by Michael Radziej <mir@…>

  • Triage Stage changed from Accepted to Design decision needed

We have a bit of chaos here: tickets #3370, #1356 and probably #952 are all about this problem, all are accepted, and #3370 and #1356 have very similar patches. I ask everybody to continue the discussion on django-developers ("unicode issues in multiple tickets"), and I ask the authors of these three tickets to work together to decide how to proceed.

Until it's clear which path to take, I'm marking all of them as "design decision needed." (I assume the other reviewers were not aware of the competing tickets.)

http://groups.google.com/group/django-developers/browse_thread/thread/4b71be8257d42faf

comment:57 Changed 8 years ago by anonymous

#2584 marked as duplicate.

comment:58 Changed 8 years ago by mir@…

Sorry, it was me.

comment:59 Changed 8 years ago by anonymous

Please ACCEPT this commit; it would help. I also had this error: I modified the structure of the tables and set them to UTF8 as suggested there, and I no longer get it.
I am using MySQL 5.0.24. Maybe this bug is not your fault, but it wouldn't cost anything to add the encoding declaration in the source, and that would solve some of the problems with accents, though not all of them.

(about ticket #2896)

comment:60 Changed 8 years ago by anonymous

  • Cc Maniac@… jm.bugtracking@… added; Maniac@… removed

comment:61 Changed 8 years ago by mtredinnick

  • Resolution set to wontfix
  • Status changed from reopened to closed

In light of the changes in the unicode branch, this setting is no longer needed. All of our supported database backends that can handle variable server encodings take care of that information transparently: we just hand them Unicode strings or UTF-8 bytestrings, and it should work.

So this is wontfix, and the root problem will go away once the unicode branch is merged into trunk.
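A minimal sketch of the "hand them Unicode strings or UTF-8 bytestrings" idea: normalising query parameters to text before they reach the driver. `to_text` is a hypothetical helper, loosely in the spirit of the `django.utils.encoding` helpers (such as `force_str`) but not their implementation.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytestrings (assumed UTF-8) on the way."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value if isinstance(value, str) else str(value)


params = [b"caf\xc3\xa9", "café", 42]
print([to_text(p) for p in params])  # ['café', 'café', '42']
```

Once everything is text, the backend's own client-encoding handling decides how it travels over the wire, which is why the separate setting becomes unnecessary.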
