Changes between Version 80 and Version 81 of DjangoSpecifications/Core/Threading


Timestamp: Mar 9, 2010, 6:44:19 PM (14 years ago)
Author: James Bennett
Comment: --

''Part of DjangoSpecifications''

[[PageOutline]]

= Summary =

== If you find a threading bug, please file a ticket with the threading keyword. ==

'''Django core is generally thread-safe as of 1.0.3 / 1.1'''.

However, there are certain issues you have to keep in mind:
 1. `QuerySet`s are known not to be thread-safe, see #11906. This is usually not a problem, as querysets are not (and should not be) shared between threads in Django. The exception is your own code that keeps querysets in global, class-level or shared instance variables (e.g. when using the ORM outside of the Django dispatch system); there you are assumed to know what you are doing and to protect them appropriately.
 1. There is an edge case that can result in a deadlock during middleware loading, see #11193. It is triggered only by an invalid setup and is harmless otherwise.

'''Note that 1.0.2 has two known threading bugs, #10470 and #10472.'''

As of 1.2, you should read http://docs.djangoproject.com/en/dev/howto/custom-template-tags/#thread-safety-considerations to keep your custom template tags thread-safe.

= Django threading review =

Relevant tickets: #5632, #6950, #1442, #7676, #10470, #10472.

Relevant discussions: http://groups.google.com/group/django-users/browse_frm/thread/a7d42475b66530bd, http://groups.google.com/group/django-developers/browse_thread/thread/fbcfa88c997d1bb3, http://groups.google.com/group/django-developers/browse_thread/thread/905f79e350525c95

Only easy-to-identify globals have been reviewed; a related task is to identify other components, not listed here, that may have threading issues.

The review was done before the queryset-refactor (qs-rf) merge, and the review of class attributes is incomplete.

== Introduction ==

The main pattern of global variable usage in Django is ''use `var` if initialized (read), otherwise initialize (write)'', where `var` can be a single value or an element of a data structure. Without locking, this check-then-initialize pattern is not thread-safe.

"Not thread-safe" has two broad subcategories in this case:
 * inefficiencies: initialization meant to occur only once occurs more than once (this includes the `memoize` decorator),
 * errors: a thread uses data whose initialization is incomplete.

Paradoxically, accepting the "inefficiencies" generally results in more ''efficient'' execution, as:
 * mutual exclusion (locking) is expensive,
 * the inefficient case is not certain to happen; it occurs only with a probability that can be quite low,
 * locking would needlessly penalize single-threaded execution.

Thus, lock-free algorithms should always be preferred and the inefficient case tolerated, unless the inefficiency is highly probable and either costs considerably more than locking would or has harmful side effects.

=== Inefficiencies ===

When evaluating the inefficiencies, their impact should be considered as outlined above: probability, overhead and side effects. The duplicated case,
{{{
1. thread 1: if not foo: true, needs initializing
2. thread 2: if not foo: true, needs initializing
3. thread 1: initialize foo
4. thread 2: initialize foo
}}}
is not that common. The review found no code where a duplicated call would cause considerable overhead or harmful side effects, so '''the "inefficiency issues" are really non-issues''' and are listed below only for reference. In the multi-process case, each process initializes `foo` individually anyway.
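
The duplicated case can be reproduced on purpose. The following is a minimal sketch that assumes Python 3.2+ for `threading.Barrier`; `expensive_init()` and `init_calls` are hypothetical names. The barrier holds each thread after the `foo is None` check until both threads have made it, which forces the interleaving above. The initializer runs twice, but both threads still end up with an equal, complete value.

{{{#!python
import threading

foo = None
init_calls = []                    # hypothetical: records every run of the initializer

def expensive_init():
    # Hypothetical one-time initializer; it is idempotent, so a second run is merely wasteful.
    init_calls.append(threading.current_thread().name)
    return ("a", "b", "c")

# Forces steps 1-4 above: neither thread writes until both have done the check.
both_checked = threading.Barrier(2)

def get_foo():
    global foo
    if foo is None:                # steps 1 and 2: both threads see "needs initializing"
        both_checked.wait()
        foo = expensive_init()     # steps 3 and 4: foo is initialized twice
    return foo

results = []
threads = [threading.Thread(target=lambda: results.append(get_foo())) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(init_calls), results)    # 2 [('a', 'b', 'c'), ('a', 'b', 'c')]
}}}

Because the initializer here is idempotent, the only cost of the race is the second call. That is the trade-off argued for above.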

Note that because modules are cached by the Python interpreter (see `sys.modules`), duplicate `__import__` calls don't re-import modules.
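
Here is a quick check of this behaviour, as a sketch. `json` is just an arbitrary standard-library module.

{{{#!python
import sys

first = __import__("json")     # loads the module (unless already loaded) and caches it in sys.modules
second = __import__("json")    # found in sys.modules: the module body is not executed again

print(first is second, sys.modules["json"] is first)   # True True
}}}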

=== Errors due to incomplete initialization ===

The incomplete initialization problem looks like this:
{{{
foo = []

1. thread 1: if not foo: true, needs initializing
2. thread 1: foo.append(x)
3. thread 2: if not foo: false, does not need initializing --> use the incomplete foo
4. thread 1: foo.append(y)
5. thread 1: use fully initialized foo
}}}
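
The interleaving above can be forced with two `threading.Event` objects. In this sketch, the events and the hypothetical `writer()`/`reader()` functions stand in for an unlucky thread switch. In real code, the switch happens at an unpredictable point.

{{{#!python
import threading

foo = []                           # global list filled elementwise (the problematic pattern)
seen_by_reader = []

first_appended = threading.Event()
reader_done = threading.Event()

def writer():                      # thread 1
    if not foo:                    # step 1: needs initializing
        foo.append("x")            # step 2: the partial list is already visible
        first_appended.set()
        reader_done.wait()         # simulate a thread switch at the worst moment
        foo.append("y")            # step 4: initialization completes too late

def reader():                      # thread 2
    first_appended.wait()
    if foo:                        # step 3: foo looks initialized ...
        seen_by_reader.extend(foo) # ... so the incomplete list is used
    reader_done.set()

threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen_by_reader, foo)         # ['x'] ['x', 'y']
}}}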

Incomplete initialization errors can generally be avoided by building the data in a local variable and assigning it to the global in one step, instead of modifying the global elementwise. Additionally, tuples should be used instead of lists to make sure that no further modifications can happen (inspired by source:django/trunk/django/template/context.py@7415#L86):

'''WRONG'''
{{{
foo = []

def init_foo():
    ...
    if not foo:
        for x in y:
            foo.append(x)  # other threads can see foo while it is half-filled
}}}

'''RIGHT'''
{{{
foo = None

def init_foo():
    global foo  # must come before any use of foo in the function
    ...
    if foo is None:
        tmp = []
        for x in y:
            tmp.append(x)
        # Rebinding the global is a single "atomic" step, so other threads see either
        # None or the complete result; the tuple is const-correct and more efficient.
        foo = tuple(tmp)
}}}
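
Here is a runnable variant of the RIGHT version, as a sketch. `SOURCE` is a hypothetical stand-in for `y`, and the threads merely call `init_foo()` concurrently. A single run cannot prove thread safety. The point is that `foo` is rebound in one step, so any thread sees either `None` or the finished tuple, and at worst the tuple is built more than once.

{{{#!python
import threading

SOURCE = range(5)                  # hypothetical stand-in for `y` above
foo = None

def init_foo():
    global foo
    if foo is None:
        tmp = []                   # built privately: no other thread can see tmp
        for x in SOURCE:
            tmp.append(x * x)
        foo = tuple(tmp)           # published with a single rebinding of the global name
    return foo

results = []
threads = [threading.Thread(target=lambda: results.append(init_foo())) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread got the complete tuple, even if several of them built it.
print(set(results))                # {(0, 1, 4, 9, 16)}
}}}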

=== Lock handling ===

Locks should always be acquired in an exception-safe manner. Otherwise, the system will deadlock when an exception is raised while a lock is held.

Consider the following example (#11193):

'''WRONG'''
{{{
        if self._request_middleware is None:
            self.initLock.acquire()
            # Check that middleware is still uninitialised.
            if self._request_middleware is None:
                self.load_middleware()
            self.initLock.release()
}}}

This will deadlock when `self.load_middleware()` raises an exception, because the lock will never be released.

'''RIGHT'''
{{{
        if self._request_middleware is None:
            self.initLock.acquire()
            try:
                # Check that middleware is still uninitialised.
                if self._request_middleware is None:
                    self.load_middleware()
            except:
                # Possibly unload whatever middleware was loaded before the failure,
                # and set the guard back to the uninitialized state.
                self._request_middleware = None
                raise
            finally:
                # Runs on success and on failure, so the lock is always released.
                self.initLock.release()
}}}
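
Locks from the `threading` module also support the `with` statement, which releases the lock however the block is left. The sketch below expresses the RIGHT version that way. `Handler` and its `fail` switch are hypothetical, and `load_middleware()` raises to simulate an invalid setup. After the failure, the lock is free again and the guard is back to `None`.

{{{#!python
import threading

class Handler(object):
    """Hypothetical handler modeled on the example above."""

    def __init__(self, fail=False):
        self._request_middleware = None
        self.initLock = threading.Lock()
        self.fail = fail                  # hypothetical switch simulating a broken setup

    def load_middleware(self):
        if self.fail:
            raise ImportError("simulated broken middleware setting")
        self._request_middleware = ["hypothetical.Middleware"]

    def ensure_middleware(self):
        if self._request_middleware is None:
            with self.initLock:           # released on normal exit and on exceptions
                try:
                    # Check that middleware is still uninitialised.
                    if self._request_middleware is None:
                        self.load_middleware()
                except Exception:
                    self._request_middleware = None   # reset the guard
                    raise

h = Handler(fail=True)
try:
    h.ensure_middleware()
except ImportError:
    pass

# The lock was released despite the exception, so later requests cannot deadlock on it.
print(h.initLock.locked(), h._request_middleware)   # False None
}}}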

== Globals ==

There are four types of globals:
 1. globals that are initialized at module level and never modified later (THREAD-SAFE),
 1. global mutable data structures that are initialized at module level and whose elements are modified by module-level code, but never modified later (PROBABLY THREAD-SAFE: although elementwise modification at module level is not thread-safe ''per se'', the module is most likely cached ''before'' threads get access to it),
 1. global mutable data structures (lists and dictionaries, also instances) that are initialized at module level but whose elements are modified in functions and that are accessed without using the `global` keyword (NOT THREAD-SAFE),
 1. globals initialized in functions by using the `global` keyword (NOT THREAD-SAFE).
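
The four types can be illustrated with a small hypothetical module. The names below are made up for illustration and are not taken from Django.

{{{#!python
# Type 1: initialized at module level, never modified later (thread-safe).
WEEKDAY_NAMES = ("Monday", "Tuesday", "Wednesday")

# Type 2: mutable structure filled by module-level code at import time
# (probably thread-safe: the import normally finishes before threads use the module).
STATE_NAMES = {}
for code, name in (("CA", "California"), ("NY", "New York")):
    STATE_NAMES[code] = name

# Type 3: mutable structure modified elementwise inside a function. No `global`
# statement is needed, which makes this type easy to miss (not thread-safe).
_lookup_cache = {}

def cache_lookup(key, value):
    _lookup_cache[key] = value

# Type 4: rebound inside a function via `global` (not thread-safe in general;
# here the whole tuple is assigned at once, so at worst it is built twice).
_loaders = None

def get_loaders():
    global _loaders
    if _loaders is None:
        _loaders = ("filesystem", "app_directories")
    return _loaders
}}}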

=== Modules' use of globals ===

Note that only dictionaries, lists and globals accessed with the `global` keyword have been reviewed. Global class instances (e.g. registries) and class variable access without `__class__` are not covered; the latter are probably impossible to catch with `grep`.

See below for raw `grep` results.

||'''Module'''||'''Globals'''||'''Incomplete init'''||'''Inefficiencies'''||
||settings and global_settings||?||MODULE LEVEL INIT, not reviewed|| ||
||utils/_decimal.py||lots, including code||MODULE LEVEL INIT, not reviewed|| ||
||django/contrib/sites/models.py||`SITE_CACHE`||OK||one db hit intended, more than one possible||
||django/template/context.py||`_standard_context_processors`||OK||double `__import__`||
||`django/template/__init__.py`||`invalid_var_format_string, libraries, builtins`||OK||double `__import__`||
||django/template/loader.py||`template_source_loaders`||Fixed with #6950||double `__import__`||
||django/template/loaders/app_directories.py||`app_template_dirs`||MODULE LEVEL INIT|| ||
||django/utils/translation/trans_real.py||`_accepted, _active, _default, _translations`||OK||explicit threading support, no inefficiencies||
||django/core/urlresolvers.py||`_callable_cache, _resolver_cache`||OK, `memoize` decorator||double `__import__`||
||`django/core/serializers/__init__.py`||`_serializers`||Fixed with #7676|| ||
||django/db/models/fields/related.py||`pending_lookups`||OK?, needs further review||`append()` in `add_lazy_relation()` can add duplicated values, which may or may not confuse `pop()` in `do_pending_lookups()`||
||django/db/transaction.py||`dirty, state`||OK||explicit threading support, no inefficiencies||
||django/dispatch/dispatcher.py||`connections, senders, sendersBack`||not reviewed|| ||

=== Problems ===

 1. `django/template/loader.py`: the WRONG algorithm above, fixed with #6950.
 1. `django/core/serializers/__init__.py`: `_load_serializers()` is unsafe, fixed with #7676:
{{{
1. thread 1: if not _serializers: true --> _load_serializers(), enter for loop
2. thread 1: register_serializer(x)
3. thread 2: if not _serializers: false --> use incomplete _serializers
4. thread 1: register_serializer(y)
}}}
 1. `django/db/models/fields/related.py`: `append()` in `add_lazy_relation()` can add duplicated values, which may or may not confuse `pop()` in `do_pending_lookups()`.
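
Applying the RIGHT pattern to the serializer registry could look like the following sketch. It is not the actual #7676 patch. The registry holds plain strings instead of serializer modules, only the built-in formats are handled, and `_load_serializers()`/`get_serializer()` are reduced to the essentials. The registry is built in a local dictionary and published with a single assignment. A thread that finds `_serializers` non-empty therefore always sees every format.

{{{#!python
# Hypothetical, simplified registry: the values stand in for serializer modules.
BUILTIN_SERIALIZERS = {
    "xml": "serializers.xml_serializer",
    "python": "serializers.python",
    "json": "serializers.json",
}

_serializers = {}

def _load_serializers():
    global _serializers
    serializers = {}                          # build privately ...
    for fmt, module in BUILTIN_SERIALIZERS.items():
        serializers[fmt] = module             # (the real code would import the module here)
    _serializers = serializers                # ... then publish in one step

def get_serializer(fmt):
    if not _serializers:                      # either empty or complete, never partial
        _load_serializers()
    return _serializers[fmt]
}}}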


== Class attributes ==

Class attributes are shared between instances, and thus between threads as well, since module-level classes are just global class objects.

The behaviour is similar to that of globals. Just as the `global` keyword is needed to rebind a global in a function, the explicit class specifier `foo.__class__.bar` is required for setting the class variable `bar` from instance `foo`. Otherwise, an instance variable is created that hides the class variable.

As this may not be obvious, here is an illustration:
{{{
>>> class Foo(object): bar = 1
...
>>> f = Foo()
>>> f.bar = 2
>>> Foo.bar
1
>>> f.bar
2
>>> f.__class__.bar
1
>>> f.__class__.bar = 3
>>> f.bar
2
>>> Foo.bar
3
}}}

Similarly to globals, there are three types of class variables:
 1. class variables that are initialized when the class is defined and never modified later (THREAD-SAFE),
 1. mutable class-level data structures that are initialized when the class is defined but whose elements are modified in methods and that are accessed without using `__class__` (NOT THREAD-SAFE),
 1. class variables initialized in methods by using `__class__` or directly by `Classname.varname` (NOT THREAD-SAFE).
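
The second type is the easiest to overlook, because mutating a class-level container through `self` needs neither `__class__` nor the class name. Below is a minimal sketch with a hypothetical `CachedLookup` class. It is loosely modeled on the `_cache` in contenttypes but is not its actual code.

{{{#!python
class CachedLookup(object):
    _cache = {}                         # one dict, shared by every instance (and thread)

    def remember(self, key, value):
        self._cache[key] = value        # mutates the class-level dict; no instance attribute is created

    def forget_all(self):
        self.__class__._cache = {}      # rebinding the class variable needs __class__ (type 3)

a = CachedLookup()
b = CachedLookup()
a.remember("x", 1)
print(b._cache, "_cache" in vars(a))    # {'x': 1} False
}}}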

Metaclasses: think through the implications.

== Raw `grep` results ==

=== Globals accessed with the `global` keyword ===

{{{
$ grep -r '^[[:space:]]*global ' django/ | egrep -v '(\.svn|\.html|\.css|\.pyc|_doctest\.py)' | sort | uniq
}}}

yields the following results:

{{{
django/contrib/sites/models.py:        global SITE_CACHE
django/core/management/__init__.py:    global _commands
django/template/context.py:    global _standard_context_processors
django/template/__init__.py:                    global invalid_var_format_string
django/template/loader.py:    global template_source_loaders
django/utils/translation/trans_real.py:    global _accepted
django/utils/translation/trans_real.py:    global _active
django/utils/translation/trans_real.py:    global _default, _active
django/utils/translation/trans_real.py:        global _translations
django/utils/translation/trans_real.py:    global _translations
}}}

Out of these, `django.core.management` is not used in a multi-threaded context.

=== Global dictionaries ===

{{{
$ grep -r '^[[:alnum:]_]\+ *= *{' django | egrep -v '(\.svn|_doctest\.py)' | sort
}}}

yields the following results:

{{{
django/conf/global_settings.py:ABSOLUTE_URL_OVERRIDES = {}
django/conf/global_settings.py:DATABASE_OPTIONS = {}          # Set to empty dictionary for default.
django/contrib/admin/utils.py:ROLES = {
django/contrib/admin/views/doc.py:DATA_TYPE_MAPPING = {
django/contrib/formtools/tests.py:test_data = {'field1': u'foo',
django/contrib/localflavor/ca/ca_provinces.py:PROVINCES_NORMALIZED = {
django/contrib/localflavor/in_/in_states.py:STATES_NORMALIZED = {
django/contrib/localflavor/us/us_states.py:STATES_NORMALIZED = {
django/contrib/sites/models.py:SITE_CACHE = {}
django/core/cache/__init__.py:BACKENDS = {
django/core/cache/__init__.py:DEPRECATED_BACKENDS = {
django/core/handlers/wsgi.py:STATUS_CODE_TEXT = {
django/core/serializers/__init__.py:BUILTIN_SERIALIZERS = {
django/core/serializers/__init__.py:_serializers = {}
django/core/servers/basehttp.py:_hop_headers = {
django/core/servers/fastcgi.py:FASTCGI_OPTIONS = {
django/core/urlresolvers.py:_callable_cache = {} # Maps view and url pattern names to their view functions.
django/core/urlresolvers.py:_resolver_cache = {} # Maps urlconf modules to RegexURLResolver instances.
django/db/backends/dummy/creation.py:DATA_TYPES = {}
django/db/backends/dummy/introspection.py:DATA_TYPES_REVERSE = {}
django/db/backends/mysql/creation.py:DATA_TYPES = {
django/db/backends/mysql/introspection.py:DATA_TYPES_REVERSE = {
django/db/backends/mysql_old/creation.py:DATA_TYPES = {
django/db/backends/mysql_old/introspection.py:DATA_TYPES_REVERSE = {
django/db/backends/oracle/creation.py:DATA_TYPES = {
django/db/backends/oracle/creation.py:REMEMBER = {}
django/db/backends/oracle/introspection.py:DATA_TYPES_REVERSE = {
django/db/backends/postgresql/creation.py:DATA_TYPES = {
django/db/backends/postgresql/introspection.py:DATA_TYPES_REVERSE = {
django/db/backends/postgresql_psycopg2/introspection.py:DATA_TYPES_REVERSE = {
django/db/backends/sqlite3/creation.py:DATA_TYPES = {
django/db/backends/sqlite3/introspection.py:BASE_DATA_TYPES_REVERSE = {
django/db/models/fields/related.py:pending_lookups = {}
django/db/models/query.py:LEGACY_ORDERING_MAPPING = {'ASC': '_', 'DESC': '-_', 'RANDOM': '?'}
django/db/transaction.py:dirty = {}
django/db/transaction.py:state = {}
django/dispatch/dispatcher.py:connections = {}
django/dispatch/dispatcher.py:senders = {}
django/dispatch/dispatcher.py:sendersBack = {}
django/template/__init__.py:libraries = {}
django/utils/dates.py:MONTHS = {
django/utils/dates.py:MONTHS_3 = {
django/utils/dates.py:MONTHS_3_REV = {
django/utils/dates.py:MONTHS_AP = { # month names in Associated Press style
django/utils/dates.py:WEEKDAYS = {
django/utils/dates.py:WEEKDAYS_ABBR = {
django/utils/dates.py:WEEKDAYS_REV = {
django/utils/_decimal.py:_condition_map = {ConversionSyntax:InvalidOperation,
django/utils/_decimal.py:_infinity_map = {
django/utils/simplejson/decoder.py:BACKSLASH = {
django/utils/simplejson/decoder.py:_CONSTANTS = {
django/utils/simplejson/encoder.py:ESCAPE_DCT = {
django/utils/termcolors.py:opt_dict = {'bold': '1', 'underscore': '4', 'blink': '5', 'reverse': '7', 'conceal': '8'}
django/utils/translation/trans_null.py:TECHNICAL_ID_MAP = {
django/utils/translation/trans_real.py:_accepted = {}
django/utils/translation/trans_real.py:_active = {}
django/utils/translation/trans_real.py:_translations = {}
}}}

Out of these, the following are read-only (i.e. not changed anywhere in code) or otherwise irrelevant: `contrib/admin, formtools tests, localflavor mappings`, `core/cache, core/handlers, core/serializers/__init__.py:BUILTIN_SERIALIZERS`, `core/servers, db/backends, db/models/query.py, utils/dates.py`, `utils/_decimal.py, utils/simplejson, utils/termcolors.py`, `utils/translation/trans_null.py`.

`SITE_CACHE` and everything in `django.utils.translation.trans_real` have already been listed under ''Globals accessed with the `global` keyword'' above.

`_callable_cache` and `_resolver_cache` in django/core/urlresolvers.py are used with the `memoize` decorator. As a result, `result = func(*args)` in `utils/functional.py` may be called more than once for the same arguments, but this should generally be a non-issue.
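
The duplicated call can be seen with a simplified `memoize`-style decorator. This is a hypothetical sketch written for illustration, not a copy of the code in `utils/functional.py`, and it assumes Python 3.2+ for `threading.Barrier`. The cache check and the cache write are separate steps. Two threads that miss the cache at the same time therefore both call the function, and the second write just replaces an equal value.

{{{#!python
import threading
from functools import wraps

def memoize(func, cache, num_args):
    """Simplified sketch: cache func's results keyed on its first num_args arguments."""
    @wraps(func)
    def wrapper(*args):
        key = args[:num_args]
        if key in cache:            # read ...
            return cache[key]
        result = func(*args)        # ... may run in several threads for the same key
        cache[key] = result         # ... last writer wins; the values are equal anyway
        return result
    return wrapper

calls = []
both_missed = threading.Barrier(2)

def resolve(name):
    """Hypothetical expensive lookup; the barrier forces both threads to miss the cache."""
    calls.append(name)
    both_missed.wait()
    return name.upper()

demo_cache = {}
cached_resolve = memoize(resolve, demo_cache, 1)

results = []
threads = [threading.Thread(target=lambda: results.append(cached_resolve("home")))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls), results, demo_cache)   # 2 ['HOME', 'HOME'] {('home',): 'HOME'}
}}}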

That leaves the following relevant global dicts not listed before:
{{{
django/core/serializers/__init__.py:_serializers = {}
django/db/models/fields/related.py:pending_lookups = {}
django/db/transaction.py:dirty = {}
django/db/transaction.py:state = {}
django/dispatch/dispatcher.py:connections = {}
django/dispatch/dispatcher.py:senders = {}
django/dispatch/dispatcher.py:sendersBack = {}
django/template/__init__.py:libraries = {}
}}}

=== Global lists ===

{{{
$ grep -r '^[[:alnum:]_]\+ *= *\[' django | egrep -v '(\.svn|_doctest\.py|__all__)' | sort
}}}

yields the following results:

{{{
django/db/models/fields/__init__.py:BLANK_CHOICE_DASH = [("", "---------")]
django/db/models/fields/__init__.py:BLANK_CHOICE_NONE = [("", "None")]
django/template/__init__.py:builtins = []
django/template/loaders/app_directories.py:app_template_dirs = []
django/utils/_decimal.py:rounding_functions = [name for name in Decimal.__dict__.keys() if name.startswith('_round_')]
django/utils/_decimal.py:_signals = [Clamped, DivisionByZero, Inexact, Overflow, Rounded,
django/utils/html.py:DOTS = ['·', '*', '\xe2\x80\xa2', '•', '•', '•']
django/utils/html.py:LEADING_PUNCTUATION  = ['(', '<', '&lt;']
django/utils/html.py:TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']
django/utils/simplejson/decoder.py:ANYTHING = [
}}}

Leaving out the irrelevant read-only ones, the following remain:
{{{
django/template/__init__.py:builtins = []
django/template/loaders/app_directories.py:app_template_dirs = []
}}}

As a matter of style, the read-only ones should really be tuples, not lists. This follows the 'say what you mean' idiom: if something shouldn't be modified, make it a tuple so that it can't be. Tuples are also marginally more efficient in both speed and space. There is a slight semantic distinction between lists and tuples, though; see http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists/ for a discussion. But as Python has no constant lists, tuples are the only way to be const-correct.
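
For example, the read-only punctuation lists from `django/utils/html.py` could be declared as tuples. This is a sketch of the suggestion, not a change that was made in Django, and `strip_trailing_punctuation()` is a hypothetical helper. Read-only use is unchanged, while an accidental in-place modification fails loudly instead of silently affecting every thread.

{{{#!python
# The read-only lists from django/utils/html.py, written as tuples instead.
LEADING_PUNCTUATION = ('(', '<', '&lt;')
TRAILING_PUNCTUATION = ('.', ',', ')', '>', '\n', '&gt;')

def strip_trailing_punctuation(word):
    """Hypothetical helper: read-only iteration looks exactly the same with a tuple."""
    for punctuation in TRAILING_PUNCTUATION:
        if word.endswith(punctuation):
            return word[:-len(punctuation)]
    return word

try:
    TRAILING_PUNCTUATION.append(';')   # tuples have no append(), so the mistake is caught
    modified = True
except AttributeError:
    modified = False

print(strip_trailing_punctuation("example.com."), modified)   # example.com False
}}}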

=== `__class__` attribute used for accessing anything other than `__name__` ===

{{{
$ grep -r '__class__\.' django/ | egrep -v '(\.svn|\.html|\.css|\.pyc|_doctest\.py|__class__\.__name__)' | sort | uniq
}}}

yields the following results:

{{{
django/contrib/auth/middleware.py:        request.__class__.user = LazyUser()
django/contrib/contenttypes/models.py:            ct = self.__class__._cache[id]
django/contrib/contenttypes/models.py:            ct = self.__class__._cache[key]
django/contrib/contenttypes/models.py:        self.__class__._cache.clear()
django/contrib/contenttypes/models.py:        self.__class__._cache[ct.id] = ct
django/contrib/contenttypes/models.py:        self.__class__._cache[key] = ct
django/db/models/base.py:        q = self.__class__._default_manager.filter(**kwargs).order_by((not is_next and '-' or '') + field.name, (not is_next and '-' or '') + self._meta.pk.name)
django/db/models/base.py:            raise self.DoesNotExist, "%s matching query does not exist." % self.__class__._meta.object_name
django/db/models/fields/__init__.py:                not instance.__class__._default_manager.filter(**{'%s__exact' % self.name: getattr(instance, self.attname)}):
django/dispatch/saferef.py:                del self.__class__._allInstances[ self.key ]
django/newforms/models.py:        opts = instance.__class__._meta
django/newforms/models.py:    opts = instance.__class__._meta
}}}

This page and several others were created by a wiki user who was not and is not affiliated with the Django project. Previous contents of this and other similar pages are not and should not be confused with [http://docs.djangoproject.com/ Django's own documentation], which remains the sole source of official documentation for the Django project.