Changes between Version 21 and Version 22 of UnicodeBranch


Timestamp: May 24, 2007, 6:09:47 AM
Author: Malcolm Tredinnick
Comment: Removed "things to consider", since it's in the documentation now.

  • UnicodeBranch

    v21 v22  
    58 58 One of the design goals of the Unicode branch is that very few significant changes to existing third-party code should be required. However, there are some things that developers should be aware of when writing applications designed to handle international input.
    59 59
    60 A detailed list of things you might wish to think about when writing your code is given below. However, for the programmer on a deadline, here is the cheatsheet version (if you only use ASCII strings, none of these changes are necessary):
     60 A detailed list of things you might wish to think about when writing your code is in the unicode.txt file in the documentation directory. For the programmer on a deadline, here is the cheatsheet version (if you only use ASCII strings, none of these changes are necessary):
    61 61
    62 62 1. Change the {{{__str__}}} methods on your models to be {{{__unicode__}}} methods. Just change the name. Usually, nothing else will be needed.
     
    71 71
    72 72 That is all. Enjoy!
    73 
    74 == Things To Consider When Writing Applications ==
    75 
    76 '''** This section is no doubt incomplete. User experiences are welcome. If you discover something that needs to change, please add a bullet point to the list (although we may edit the list periodically to keep it coherent). **'''
    77 
    78 === String Encoding ===
    79 
    80  * Django converts bytestrings passed to many functions, such as filter functions, into unicode strings. All bytestrings, with the exception of form inputs and data read from files, are assumed to be UTF-8 encoded. An internal bytestring that is not valid UTF-8 will raise a {{{UnicodeDecodeError}}}, because {{{my_string.decode('utf-8')}}} fails on it.
    81 
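A minimal sketch of the failure described above, in Python 3 syntax ({{{bytes}}} stands in for a Python 2 bytestring and {{{str}}} for {{{unicode}}}). The helper {{{to_unicode_assuming_utf8()}}} is hypothetical, not a Django function; it only reproduces the "assume UTF-8" rule.

```python
def to_unicode_assuming_utf8(value):
    """Hypothetical helper: treat every bytestring as UTF-8, as the text describes."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
        return value.decode("utf-8")
    return value  # already a text (unicode) string


valid = "naïve".encode("utf-8")      # b'na\xc3\xafve'
invalid = "naïve".encode("latin-1")  # b'na\xefve', which is not valid UTF-8

print(to_unicode_assuming_utf8(valid))  # naïve

try:
    to_unicode_assuming_utf8(invalid)
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

The same Latin-1 bytes would decode fine with the correct codec; the error comes purely from the UTF-8 assumption.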
    82  * Template files read from disk may be saved in an encoding that differs from both the output encoding and UTF-8. To specify the on-disk file encoding, use the `FILE_CHARSET` setting, which is new in the Unicode branch.
    83 
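A sketch of why the on-disk encoding has to be declared. {{{FILE_CHARSET}}} is the setting named above; the Latin-1 value and the {{{read_template_source()}}} loader are hypothetical illustrations, not Django's template loader. The file is written as Latin-1 and then decoded with the declared charset.

```python
import os
import tempfile

# settings.py fragment (assumption: templates on disk are saved as Latin-1)
FILE_CHARSET = "iso-8859-1"


def read_template_source(path, charset=FILE_CHARSET):
    """Hypothetical loader: read raw bytes and decode them with the declared on-disk charset."""
    with open(path, "rb") as f:
        return f.read().decode(charset)


# Write a Latin-1 encoded template to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<p>Café</p>".encode("iso-8859-1"))
    path = tmp.name

try:
    template_text = read_template_source(path)
finally:
    os.remove(path)

print(template_text)  # <p>Café</p>
```

Decoding the same bytes as UTF-8 instead would raise {{{UnicodeDecodeError}}}, because the Latin-1 byte for "é" is not a valid UTF-8 sequence at that position.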
    84  * String data read from the database is converted directly to unicode strings, so model attributes based on text fields (!TextField, !CharField, etc.) will be unicode strings.
    85 
    86  * Field sizes for text fields such as !TextField and !CharField are specified in characters, not in the number of bytes the database encoding uses. All databases supported by Django can handle this (i.e. their ''VARCHAR'' fields are sized in characters and can store unicode characters), so you do '''not''' need to worry about how many bytes the encoded version of your data will occupy when working with lengths.
    87 
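A quick illustration of characters versus bytes (Python 3 syntax, where {{{len()}}} on a {{{str}}} counts characters). {{{fits_in_field()}}} is a hypothetical helper expressing the rule above; it is not how Django validates fields.

```python
def fits_in_field(value, max_length):
    """Hypothetical check: a field's maxlength counts characters, not encoded bytes."""
    return len(value) <= max_length


title = "日本語"  # three CJK characters

print(len(title))                  # 3 (characters)
print(len(title.encode("utf-8")))  # 9 (bytes: each of these characters takes 3 bytes in UTF-8)
print(fits_in_field(title, 3))     # True
```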
    88  * You might find the functions {{{django.utils.encoding.smart_str()}}} and {{{django.utils.encoding.smart_unicode()}}} useful in your application code. The latter is particularly handy: it takes a bytestring or a unicode string and returns a unicode string. It also converts objects that have a {{{__unicode__}}} or {{{__str__}}} method into unicode strings. So if you have a value that may be either a bytestring or a unicode string and you want it to be uniformly a unicode string, call {{{smart_unicode()}}} on it.
    89 
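To make the behaviour of {{{smart_unicode()}}} concrete, here is a hypothetical Python 3 analogue called {{{to_text()}}}. It is a sketch of the rules described above (text passes through, bytestrings are decoded as UTF-8, other objects go through {{{__unicode__}}} or {{{__str__}}}), not Django's actual implementation or signature.

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Hypothetical analogue of smart_unicode(): always return a text (unicode) string."""
    if isinstance(value, str):
        return value                           # already text: return unchanged
    if isinstance(value, bytes):
        return value.decode(encoding, errors)  # bytestring: decode, UTF-8 by default
    if hasattr(value, "__unicode__"):
        return value.__unicode__()             # Python 2-style object with __unicode__
    return str(value)                          # anything else: fall back to __str__


class Tag:
    """Hypothetical object exposing a Python 2-style __unicode__ method."""
    def __unicode__(self):
        return "café"


print(to_text(b"caf\xc3\xa9"))  # café
print(to_text("café"))          # café
print(to_text(Tag()))           # café
print(to_text(42))              # 42
```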
    90 === Databases ===
    91 
    92  * Make sure that your database tables use an encoding that can hold all the data you are going to send to them. For example, if you might send Chinese characters to the database, a table using the Russian KOI8-R encoding will cause errors. Django does not need to know what encoding your database uses, since the Python database wrappers take care of that. However, you should ensure your database is configured to handle the data you wish to send it. Generally, using a UTF-8 encoding for your tables is the simplest solution.
    93     * '''TODO''': Write up how to set and check this information for MySQL, PostgreSQL and SQLite.
    94 
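A Python-side sketch of the KOI8-R example above: whether a value can be represented in a given encoding at all. {{{can_store()}}} is hypothetical and only consults Python's codecs; it says nothing about how your database server or driver is actually configured.

```python
def can_store(text, db_encoding):
    """Hypothetical check: can text be encoded in db_encoding without loss?"""
    try:
        text.encode(db_encoding)  # strict mode: fails on any unrepresentable character
    except UnicodeEncodeError:
        return False
    return True


print(can_store("Привет", "koi8_r"))  # True: Russian fits in KOI8-R
print(can_store("中文", "koi8_r"))    # False: KOI8-R has no Chinese characters
print(can_store("中文", "utf-8"))     # True: UTF-8 covers all of Unicode
```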
    95 === Models ===
    96 
    97  * As mentioned previously, all model attributes retrieved from the database will be unicode strings.
    98 
    99  * If you are supporting international data, it is not safe to return the value of a field directly from your model's {{{__str__}}} method. Python always coerces the result of {{{__str__}}} to a bytestring, even if your method returns a unicode string, and that coercion uses the default ASCII codec, so any non-ASCII character raises a {{{UnicodeEncodeError}}}. There are two possibilities here:
    100     * The simplest solution is to replace any {{{__str__}}} methods with a {{{__unicode__}}} method. This method returns a unicode string, so you can safely write
    101 {{{
    102 #!python
    103 class MyModel(models.Model):
    104     name = models.CharField(maxlength=50)
    105     ...
    106     def __unicode__(self):
    107         return self.name
    108 }}}
    109     The default {{{models.Model.__str__}}} method will call your model's {{{__unicode__}}}, if it exists, and then convert the result to UTF-8. So this single change should be transparent to the rest of your code.
    110     * Alternatively, if you want to explicitly write the {{{__str__}}} method for your model, it '''must''' return a UTF-8 encoded bytestring. No other encoding is acceptable here (certainly '''not''' {{{settings.DEFAULT_CHARSET}}}), because the result of calling {{{str()}}} on a model is used in more places than just template output.
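A sketch of the default behaviour described above, translated to Python 3, where the bytestring conversion is spelled {{{__bytes__}}} rather than {{{__str__}}}. {{{FakeModelBase}}} and {{{City}}} are hypothetical stand-ins, not Django classes; the point is that the bytestring form is always derived from {{{__unicode__}}} and always encoded as UTF-8.

```python
class FakeModelBase:
    """Hypothetical stand-in for models.Model: derive the bytestring form from __unicode__."""
    def __bytes__(self):
        # Mirrors the described default: call __unicode__, then encode the result as UTF-8.
        return self.__unicode__().encode("utf-8")


class City(FakeModelBase):
    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        return self.name  # safe: returns a text (unicode) string


raw = bytes(City("Zürich"))
print(raw)                  # b'Z\xc3\xbcrich'
print(raw.decode("utf-8"))  # Zürich
```

Encoding with Latin-1 instead would give {{{b'Z\xfcrich'}}}, which fails to decode anywhere that expects UTF-8.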