Version 5 (modified by verbosus, 9 years ago)
Unicode and Django
This page is an impact analysis of the Django source: which parts of it need which changes if we want to switch Django from using utf-8 bytestrings internally to using unicode strings throughout.
A first list of things that spring to mind; all of them need more thorough checking:
- database backends need to handle unicode vs. DATABASE_ENCODING translations
  - special casing: the psycopg backend will need type handlers for string types (just as it already has type handlers for date/time types)
- the HttpResponse sending machinery needs to encode unicode strings to DEFAULT_ENCODING
- the HttpRequest creation process needs to decode incoming bytestrings into unicode strings, using the provided charset (if given) or defaulting to DEFAULT_ENCODING (since that is the encoding in which the form was sent to the browser)
  - special casing: what happens with GET parameters? They don't carry a charset. What should we do if DEFAULT_ENCODING is utf-8 but the GET parameters aren't valid utf-8? The clean way would be to throw an exception, as in all the other places.
- internal usage of str() needs to be checked and presumably changed over to unicode()
- debugging stuff needs to use repr() on strings, not str() (or use unicode() and let the HTTP response handling do the conversion; most debugging stuff works through the response machinery anyway)
- mail sending functions need to do the right thing with the MIME type (declare the charset in it and encode the message body to match)
- we should decide whether to normalize the input unicode data so that at the database or application level we can match strings regardless of their decomposition (see the standard lib’s unicodedata module with its normalize() function). I would go for NFC, if there’s consensus around normalizing.
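
To illustrate the normalization point, here is a minimal sketch of what NFC normalization buys us. It is written for Python 3, where str is the unicode type; under Python 2 the same calls would take unicode objects. The helper name normalize_input() is hypothetical and not part of Django.

```python
import unicodedata


def normalize_input(text, form="NFC"):
    """Return text normalized to the given Unicode normalization form.

    Hypothetical helper for illustration only; not an existing Django API.
    """
    return unicodedata.normalize(form, text)


composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # plain 'e' followed by a combining acute accent

# The two spellings render identically but compare unequal as raw strings.
assert composed != decomposed

# After normalization they compare equal, so lookups can match either form.
assert normalize_input(composed) == normalize_input(decomposed)
```

Where exactly such a call would live (request parsing, form cleaning, or the database layer) is one of the open questions.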
Please either complete the above list or add headlines with more detailed discussions of the points above. Please only post results here; discussion should take place on the django-developers list.
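
As a rough illustration of the request and response points in the list, the sketch below decodes incoming bytestrings at the edge and encodes unicode on the way out, raising on invalid input as suggested for GET parameters. It is written for Python 3 (bytes and str; under Python 2 these would be str and unicode). The function names are hypothetical, and DEFAULT_ENCODING here is just a module-level stand-in for the setting named in the list, not Django's actual implementation.

```python
DEFAULT_ENCODING = "utf-8"  # stand-in for the setting named in the list


def decode_request_value(raw, charset=None):
    """Decode a raw request bytestring into a unicode string.

    Uses the charset supplied with the request if there is one, otherwise
    DEFAULT_ENCODING. Decoding is strict, so invalid input raises
    UnicodeDecodeError instead of being silently mangled.
    """
    return raw.decode(charset or DEFAULT_ENCODING, "strict")


def encode_response_body(text):
    """Encode a unicode string to DEFAULT_ENCODING for sending to the client."""
    return text.encode(DEFAULT_ENCODING, "strict")


# Valid utf-8 input round-trips through the unicode representation.
name = decode_request_value(b"J\xc3\xbcrgen")
body = encode_response_body("Hello, " + name)

# A GET parameter sent as latin-1 is not valid utf-8 and is rejected.
try:
    decode_request_value(b"J\xfcrgen")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

With this split, everything between the two functions only ever sees unicode strings, which is the invariant the switch is meant to establish.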