Opened 11 years ago

Closed 10 years ago

#952 closed defect (wontfix)

[patch] Allow for database client encoding configuration from project settings

Reported by: me@… Owned by: Adrian Holovaty
Component: Database layer (models, ORM) Version: master
Severity: normal Keywords:
Cc: Maniac@…, jm.bugtracking@… Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

This allows the user to define the client encoding of his database and get it automatically used throughout Django.
Includes Postgres and MySQL implementations.

Attachments (4)

encoding.diff (2.6 KB) - added by me@… 11 years ago.
Necessary changes
encoding.2.diff (3.1 KB) - added by me@… 11 years ago.
necessary changes
database_client_charset.diff (5.8 KB) - added by me@… 11 years ago.
Automagic added
corrected.diff (5.8 KB) - added by me@… 11 years ago.
Why I never get it right the first time I wonder

Download all attachments as: .zip

Change History (65)

Changed 11 years ago by me@…

Attachment: encoding.diff added

Necessary changes

Changed 11 years ago by me@…

Attachment: encoding.2.diff added

necessary changes

comment:1 Changed 11 years ago by me@…

The second diff fixes a typo in postgres driver. This supercedes and closes #814

comment:2 Changed 11 years ago by hugo

please keep in mind that the DEFAULT_CHARSET can be changed, so it would be better to not introduce a new setting DATABASE_ENCODING, but just to use the DEFAULT_ENCODING and put some hook into the backends that will do database specific deeds with that (the postgresql backend can then change 'utf-8' to 'unicode').

comment:3 Changed 11 years ago by me@…

This would require storing a table of encoding names for every driver. Besides this configuration setting is useful. For example - you might be using charset A for your database - maybe for legacy reasons - and charset B for the middleware and the frontend.

I thought that it would be both simpler and more flexible to allow the developer to set up his database settings separately (most ASCII-speaking developers won't need it anyway). But reusing DEFAULT_CHARSET is neat.

comment:4 Changed 11 years ago by me@…

Besides creating the translation tables per driver would require parsing http://www.iana.org/assignments/character-sets for every possible character set name and alias, finding out all of the mappings for these names within PostgreSQL and MySQL and storing these tables in Django sources. These tables would probably exceed the line count of the drivers themselves. Is this really necessary?

comment:5 Changed 11 years ago by hugo

you can't use different encodings for Django and your database backend, because Django never sends Unicode strings but allways sends bytestrings encoded in DEFAULT_ENCODING and so the database driver can't know what to do with it besides passing it along. So the DEFAULT_ENCODING will allways have to match DATABASE_ENCODING, so it's a much better idea to directly tie them together.

comment:6 Changed 11 years ago by me@…

Point taken, I agree. How many encodings should be included in the dicts per driver if we decide to handle it that way? IANA list is just insanely long, especially including all the possible aliases.

comment:7 Changed 11 years ago by Maniac <Maniac@…>

Hugo, this is exactly what this patch is intended to do (as far as I understand). If you set client encoding for database this is the encoding which database expects on client regardless of its internal encoding. If you have legacy database in, say, windows-1251 (cyrillic) and want to use your Django app modern proper way with utf-8 then you're in trouble since noone tells your database to convert windows-1251 to utf-8 upon SELECT.

The separate setting is required because you can't use DEFAULT_CHARSET directly for SET ENCODING in database because databases do the lame thing with using some non standard names for encodings. So DATABASE_ENCODING is merely intended as a 'translation' of DEFAULT_CHARSET for your database driver. Doing it automatically would require a huge mapping table in a form

  (('utf-8','unicode'),('windows-1251','cp1251') ... )

for each backend as Julik pointed. I, for example, wouldn't start implementing a new backend if I knew it would require to match some several dozens strings manually. On the contrary each Django user would wrk with maybe one or two legacy encodings.

Julik, may be document it in the comment to the setting since it's not entirely obvious.

comment:8 Changed 11 years ago by me@…

Ok, we can do it the other way.
Let's have a dict of, say, 6 most common encodings per driver (unicode, JIS, cp-1251 and maybe some more). If the user is using some DEFAULT_CHARSET which is not covered by the dict and does NOT specify DATABASE_ENCODING an exception is raised along the lines of "You are using some charset but you also need to tell your database to support it, and Django doesn't know how to do it for you". Otherwise, if the DEFAULT_CHARSET is mapped by a dict then we just use it straight away without requiring any bothering with DATABASE_ENCODING.

Will that be reasonable?

comment:9 Changed 11 years ago by hugo

sounds reasonable to me - in most cases users won't need to change that setting, as they will just stick with one of the standard encodings. And for unusual situations you can specify what it should do.

comment:10 Changed 11 years ago by me@…

Ok, I'm cooking the new patch. I think DATABASE_ENCODING be better renamed DATABASE_CLIENT_CHARSET (if we already have default charset) and still kept in the settings.py but left empty.

Changed 11 years ago by me@…

Automagic added

comment:11 Changed 11 years ago by me@…

Cc: me@… added
Keywords: i18n added

Ok, this is the patch. I constrained the dicts to the absolute minimum. There is also a unicode flag that you can set in the driver as well as a charset flag, I don't really know how to deal with these properly at the moment - if Hugo can lend a hand would be great. But it worksforme now.

I also renamed the variable and added bailing if this can't be set up (because it's really wrong when it's not being taken care of explicitly).

comment:12 Changed 11 years ago by me@…

Hm, wait a second. Let's see. Is it DEFAULT_ENCODING as the sys.defaultencoding or the DEFAULT_CHARSET defined in Django?

Changed 11 years ago by me@…

Attachment: corrected.diff added

Why I never get it right the first time I wonder

comment:13 Changed 11 years ago by anonymous

Cc: me@… added; me%40julik.nl removed
Keywords: encodings database i18n added; encodings%2Bdatabase%2Bi18n removed
Summary: %5Bpatch%5D+Allow+for+database+client+encoding+configuration+from+project+settings[patch] Allow for database client encoding configuration from project settings

comment:14 Changed 11 years ago by anonymous

Component: TranslationsDatabase wrapper
Owner: changed from anonymous to Adrian Holovaty

repair spam damage at least a bit

comment:15 Changed 11 years ago by Greg

Component: Database wrapperDocumentation
milestone: Version 0.91Version 1.0
Owner: changed from Adrian Holovaty to Jacob
Severity: minormajor
Type: defect
Version: SVNnew-admin

comment:16 Changed 11 years ago by candisil

Type: defect

comment:17 Changed 11 years ago by Old Lady

Type: Submit changes

comment:18 Changed 11 years ago by Philip

Component: DocumentationContrib apps
milestone: Version 1.0Version 0.93
Owner: changed from Jacob to Adrian Holovaty
priority: lowlowest
Type: Submit changesenhancement
Version: new-admin

comment:19 Changed 11 years ago by Soma Pill

Version: Contrib apps

comment:20 Changed 11 years ago by valbienn

Type: enhancementdefect

comment:21 Changed 11 years ago by Philip

Component: Contrib appsCache system
milestone: Version 0.93Version 1.0
Owner: changed from Adrian Holovaty to Jacob
priority: lowesthigh
Severity: majorblocker
Type: defectenhancement

comment:22 Changed 11 years ago by Mike

milestone: Version 1.0Version 0.92
priority: highlow
Severity: blockermajor
Type: enhancementdefect
Version: Contrib appsnew-admin

Hello. nice site!

comment:23 Changed 11 years ago by winstrol

Type: defect

comment:24 Changed 11 years ago by Vicodin

Type: Submit changes

comment:25 Changed 11 years ago by Licex

Type: Submit changesdefect

comment:26 Changed 11 years ago by Licex

Type: defect

comment:27 Changed 11 years ago by Licex

Type: defect

comment:28 Changed 11 years ago by allegra 180mg

Cc: empty.com added; me@… removed
Keywords: empty.com added; None removed
Summary: [patch] Allow for database client encoding configuration from project settingsempty.com

comment:29 Changed 11 years ago by Adrian Holovaty

Cc: empty.com removed
Component: Cache systemDatabase wrapper
Keywords: empty.com removed
milestone: Version 0.92
Owner: changed from Jacob to Adrian Holovaty
priority: lownormal
Severity: majornormal
Summary: empty.com[patch] Allow for database client encoding configuration from project settings
Type: defectenhancement
Version: new-adminSVN

comment:30 Changed 11 years ago by (none)

Cc: empty.com added
Keywords: empty.com added
Summary: [patch] Allow for database client encoding configuration from project settingsempty.com

comment:31 Changed 11 years ago by Adrian Holovaty

Cc: empty.com removed
Keywords: empty.com removed
Summary: empty.com[patch] Allow for database client encoding configuration from project settings

comment:32 Changed 11 years ago by Levitra Cheap

milestone: normal

comment:33 Changed 11 years ago by vanity

Type: enhancement

comment:34 Changed 11 years ago by Cupcakes Kid

Type: defect

comment:35 Changed 10 years ago by Adipx

Type: defect

comment:36 Changed 10 years ago by zetacap

Type: defect

comment:37 Changed 10 years ago by tranny movies

Type: defect

comment:38 Changed 10 years ago by offshore online gambling

Type: defect

comment:39 Changed 10 years ago by Hyper Hhyroid

Type: defect

comment:40 Changed 10 years ago by Maxoderm

Type: defect

comment:41 Changed 10 years ago by Low Thyroid

Type: defect

comment:42 Changed 10 years ago by martin

Type: defect

comment:43 Changed 10 years ago by Jacob

Resolution: wontfix
Status: newclosed

This is a tricky one.

Whatever we choose is going to change when we deal with the whole sticky unicodification aspect... so for now, I'm going to punt and mark this WONTFIX with the understanding that we'll fix it in a comprehensive manner in the future. For the record, the answer now is to just make sure your database's default encoding *does* match, which hopefully isn't too much of a nasty hack.

comment:44 Changed 10 years ago by roullette

Type: defect

comment:45 Changed 10 years ago by hoodia gordonii

Type: defect

comment:46 Changed 10 years ago by lavalife

Type: defect

comment:47 Changed 10 years ago by Tramadol

Type: Submit changes

comment:48 Changed 10 years ago by auto refiance loans

Type: Submit changes

comment:49 Changed 10 years ago by anonymous

Type: defect

comment:50 Changed 10 years ago by Ivan Sagalaev <Maniac@…>

Cc: Maniac@… added

Jacon, this ticket was mentioned in a thread on django-developers but I suppose those messages are easy to miss so I repost the relevant bits here.

The ongoing unicodification will actually require support for legacy encoded DBs and this ticket looks like a best place to do this. So may be instead of wontfixing it it's better to rework patches for the new vision (which means just dropping encoding into DEFAULT_CHARSET leaving data from DB in unicode)?

comment:51 Changed 10 years ago by Ivan Sagalaev <Maniac@…>

Oh... Jacob, sorry for misspelling your name :-(

comment:52 Changed 10 years ago by Adrian Holovaty

Resolution: wontfix
Status: closedreopened

Reopening. #1528, #2514, #1987, #2810 and #2896 were duplicates.

comment:53 Changed 10 years ago by Adrian Holovaty

See also the UnicodeInDjango wiki page.

comment:54 Changed 10 years ago by Adrian Holovaty

#3115 is related for PostgreSQL.

comment:55 Changed 10 years ago by Chris Beaven

Patch needs improvement: set
Triage Stage: UnreviewedAccepted

comment:56 Changed 10 years ago by Michael Radziej <mir@…>

Triage Stage: AcceptedDesign decision needed

We have a bit of chaos here ... Tickets #3370, #1356 and probably #952 all are about this problem, all are accepted, and #3370 and #1356 have very similar patches. I ask everybody to continue discussion in django-developers ("unicode issues in multiple tickets"), and I ask the authors of these three tickets to work together to find out how to proceed.

As long as it's not clear which path to take, I mark all bugs as "design decision needed." (I assume that the other reviews were not aware of the competing tickets.)

http://groups.google.com/group/django-developers/browse_thread/thread/4b71be8257d42faf

comment:57 Changed 10 years ago by anonymous

#2584 marked as duplicate.

comment:58 Changed 10 years ago by mir@…

sorry, it was me.

comment:59 Changed 10 years ago by anonymous

Please ACCEPT this commit, it would help. I had also this error ; I modified structure of the tables and set UTF8 like suggested there, and I don't have it anymore.
I am using MySQL 5.0.24. Maybe this bug is not your fault, but it wouldn't cost anything to add the encoding declaration in the source... and that would solve some problems. But not all that exist with accents...

(about ticket #2896)

comment:60 Changed 10 years ago by anonymous

Cc: Maniac@… jm.bugtracking@… added; Maniac@… removed

comment:61 Changed 10 years ago by Malcolm Tredinnick

Resolution: wontfix
Status: reopenedclosed

In light of the changes in the unicode branch, this setting is no longer needed. All our supported database backends that can handle variable server encodings take care of that information transparently. We just hand them unicode strings or UTF-8 bytestrings and it should work transparently.

So this is wontfix and the root problem will go away once unicode branch merges into trunk.

Note: See TracTickets for help on using tickets.
Back to Top