[[TOC]]

= The unicode branch =

This branch aims to make Django's internals fully Unicode-aware.

== Status ==

'''This branch is now closed.'''

The branch was created on April 7, 2007.

The branch was merged into trunk on July 4, 2007 in [5609].

== How to get the branch ==

{{{
svn co http://code.djangoproject.com/svn/django/branches/unicode/
}}}

See our [http://www.djangoproject.com/documentation/contributing/#branch-policy branch policy] for full information on how to use a branch.

== Goals ==

The main goals of this branch are:

 * Make it easier for developers to work with non-ASCII character data when working with Django.
 * Be more consistent in our string handling behaviour inside Django (see StringEncoding for details on this).

Upon completion, you will be able to pass around unicode strings anywhere inside Django (or between Django and developer applications). Note that we are not trying to force everybody to use ''only'' unicode strings. You will also be able to pass around bytestrings, and Django will assume they are UTF-8 encoded (we have to make an assumption because there is no other way to tell what the encoding is). This means that a large chunk of existing code that uses Django will continue to work unchanged.

== Todo Items ==

The various pieces will be converted in roughly the following order:

 1. Template rendering ('''Done''' in [4971])
 2. Database I/O ('''Done''' in [4971] for postgresql, postgresql_psycopg2, mysql, mysql_old and sqlite backends)
   * Needs testing for servers/tables that are not in UTF-8 or ASCII encoding. The theory is that the client connection for each backend ''should'' automatically convert everything to UTF-8 or Unicode objects (depending on the backend), but this needs verifying. ''(Ivan Sagalaev and Malcolm have tested this feature a fair bit with various database servers, but more stress tests would be nice.)''
 3. Model class support ('''Done''' in [5057])
 4. Form input encoding ('''Done''' in [5192])
 5. Other output methods:
   * syndication ('''Done''' in [5251])
   * serialization ('''Done''' in [5248])
   * Google sitemaps ('''Done''' in [5277])
 6. Audit other contrib modules (not all will require changes) (all '''Done''' as of [5274]).
 7. Verifying and fixing all bugs mentioned in Trac.
   * This includes the 9 tickets that were merged in #2489 (all '''Done''', except for upgrading slug generation to be a little more useful with ''some'' foreign character sets.)

We also need to look at the i18n support functions (in django.utils.translation):

 * Decide on usage of gettext() versus ugettext() in a number of places ('''Done'''. Recommended to use {{{ugettext()}}} and friends everywhere. In [5230] this change has been made throughout the framework).
 * Look at fixing ugettext_lazy() so that it acts as a better string and unicode proxy. ('''Done''': lots of fixes over a few commits to get this right, but it seems to be working well now, at least as a unicode proxy, which is what is important.)

Finally, some documentation needs to be written describing good practices for creating unicode-aware Django apps.

== Porting Applications (The Quick Checklist) ==

One of the design goals of the Unicode branch is that very few significant changes to existing third-party code should be required. However, there are some things that developers should be aware of when writing applications designed to handle international input. A detailed list of things you might wish to think about when writing your code is in the {{{unicode.txt}}} file in the documentation directory.

For the programmer on a deadline, here is the cheatsheet version (if you only use ASCII strings, none of these changes are necessary):

('''Note (25 May 2007):''' Early adopters will have seen five steps in this list. The all-important step number 3 was initially omitted.)

 1. Change the {{{__str__}}} methods on your models to be {{{__unicode__}}} methods. Just change the name. Usually, nothing else will be needed.
 2. Look for any {{{str()}}} calls in your code that operate on model fields. These should almost always be changed to {{{smart_unicode()}}} calls (the function is imported from {{{django.utils.encoding}}}). In some cases, you may need to use {{{force_unicode()}}} (in the same module), but starting with a global change to {{{smart_unicode()}}} and then checking for problems is the "quick fix" way. (Details of the differences between the two functions are in {{{unicode.txt}}}.)
 3. Change your string literals that include Python format characters to be unicode strings. For example, change this:
 {{{
 #!python
 formal_name = '%s %s %s' % (title, firstname, surname)     # old version
 }}}
 to this:
 {{{
 #!python
 formal_name = u'%s %s %s' % (title, firstname, surname)     # new version
 }}}
 This is useful for two reasons. Firstly, if the parameters contain non-ASCII characters, you won't have an exception raised. Secondly, if any of the parameters are objects, Python will automatically call their {{{__unicode__}}} method and convert them to the right type. The "before" code would have resulted in the {{{__str__}}} method being called instead. Of course, this step is only a good idea if you are interpolating unicode strings. If your parameters are bytestrings, they will not automatically be decoded to unicode strings before being interpolated (Python cannot read your mind). Use {{{smart_unicode()}}} for that purpose.
   * '''Warning for Python 2.3:''' There is a bug in the way Python 2.3 does string interpolation for unicode strings that you should be aware of if your code has to work with that version of Python. In the second line of code, above, if any of the parameters are non-basestring objects, Python will call the {{{__str__}}} method on the object, not the {{{__unicode__}}} method! So, for Python 2.3-compatible code, you would need to write something like
   {{{
   #!python
   some_string = u'This is your object: %s' % unicode(some_object)
   }}}
   Note the explicit call to {{{unicode()}}} here to force the object to be the right type.
 4. Use the unicode versions of the {{{django.utils.translation.*}}} functions. Replace {{{gettext}}} and {{{ngettext}}} with {{{ugettext}}} and {{{ungettext}}} respectively. There are also {{{ugettext_lazy}}} and {{{ungettext_lazy}}} functions if you use the lazy versions.
 5. Make sure your database can store all the data you will send to it. Usually, this means ensuring it is using UTF-8 (or similar) encoding internally.
 6. Use the {{{FILE_CHARSET}}} setting if your on-disk template files and initial SQL files are not UTF-8 encoded.

That is all. Enjoy!
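
=== Illustration: the UTF-8 assumption for bytestrings ===

The rule described under Goals (unicode strings pass through untouched, bytestrings are assumed to be UTF-8) and the kind of conversion that step 2 of the checklist relies on can be sketched roughly as follows. This is only a minimal illustration, not Django's implementation: {{{to_text()}}} and {{{text_type}}} are hypothetical names introduced here, and the behaviour of the real {{{smart_unicode()}}} and {{{force_unicode()}}} is described in {{{unicode.txt}}}.

{{{
#!python
# Illustrative sketch only: to_text() is a hypothetical helper, not part of Django.

text_type = type(u'')  # ``unicode`` on Python 2, ``str`` on Python 3


def to_text(value, encoding='utf-8'):
    """Return ``value`` as a unicode string, assuming UTF-8 for bytestrings."""
    if isinstance(value, text_type):
        # Already a unicode string: nothing to do.
        return value
    if isinstance(value, bytes):
        # The encoding cannot be detected, so assume UTF-8 unless told otherwise.
        return value.decode(encoding)
    # Any other object: ask it for its text representation.
    return text_type(value)


title = to_text(b'Se\xc3\xb1or')    # UTF-8 encoded bytestring
surname = to_text(u'Mu\xf1oz')      # already a unicode string

# Interpolating into a unicode literal, as recommended in step 3.
formal_name = u'%s %s' % (title, surname)
}}}

Normalizing every value to a unicode string before interpolation avoids mixing bytestrings and unicode strings in one expression, which is the situation step 3 warns about.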