Code


Version 50 (modified by anonymous, 6 years ago) (diff)

--

Part of DjangoSpecifications

Django threading review

Relevant tickets: #5632, #6950, #1442.

Relevant discussions: http://groups.google.com/group/django-users/browse_frm/thread/a7d42475b66530bd, http://groups.google.com/group/django-developers/browse_thread/thread/fbcfa88c997d1bb3, http://groups.google.com/group/django-developers/browse_thread/thread/905f79e350525c95

Only easy-to-identify globals have been reviewed, a related task is to identify other components not listed here that may have threading issues.

Introduction

The main pattern of global variable usage in Django is use var if initialized (read), otherwise initialize (write), where var can be a single value or element of a data structure. This is generally not thread-safe.

"Not thread-safe" has two broad subcategories in this case:

  • inefficiencies due to initialization calls meant to occur only once occurring more than once (including memoize decorator),
  • errors due to incomplete initialization.

Paradoxically, accepting the "inefficiencies" results generally in more efficient execution as:

  • mutual exclusion (locking) is expensive,
  • the inefficient case is not a certain event, it only occurs with a probability that can be quite low,
  • locking would needlessly penalize single-threaded execution.

Thus, lock-free algorithms should be always preferred and the inefficient case tolerated -- unless the inefficiency is highly probable and results in overhead that is considerably greater than with locking or has harmful side-effects.

Inefficiencies

When evaluating the inefficiencies, their impact should be considered as outlined above: probability, overhead and side effects. The duplicated case,

1. thread 1: if not foo: true, needs initializing
2. thread 2: if not foo: true, needs initializing
3. thread 1: initialize foo
4. thread 2: initialize foo

is not that common. No code where duplicated call would cause a considerable overhead or harmful side-effects was found during the review, so the "inefficiency issues" are really non-issues and listed below only for reference. In the multi-process case each process does initialize foo individually anyway.

Errors due to incomplete initialization

Incomplete initialization problem is the following:

foo = []

1. thread 1: if not foo: true, needs initializing
2. thread 1: foo.append(x)
3. thread 2: if not foo: false, does not need initializing --> use the incomplete foo
4. thread 1: foo.append(y)
5. thread 2: use fully initialized foo

Incomplete initialization errors can generally be avoided by using full assignment instead of elementwise modification; additionally, to make sure no further modifications of a list can happen, tuples should be used instead of lists (inspired by source:django/trunk/django/template/context.py@7415#L86):

BAD

foo = []
def bar():
   ...
   if not foo:
      for x in y:
	 foo.append(x)

BETTER

foo = None
def bar():
   ...
   global foo
   if foo is None:
      tmp = []
      for x in y:
	 tmp.append(x)
      foo = tuple(tmp) # assignment is "atomic", tuple is const-correct and more efficient

There is at least one location in code that uses the "bad" algorithm.

Globals

There are four types of globals:

  1. globals that are initialized at module level and never modified later (THREAD-SAFE),
  2. global mutable data structures that are initialized at module level and whose elements are modified with module level code, but never modified later (PROBABLY THREAD-SAFE, although elementwise modification at module level is not thread-safe per se, the module is most likely cached before threads get access to it)
  3. global mutable data structures (lists and dictionaries, also instances) that are initialized at module level but whose elements are modified in functions and that are accessed without using the global keyword (NOT THREAD-SAFE),
  4. globals initialized in functions by using the global keyword (NOT THREAD-SAFE),

Modules' use of globals

Note that only lists, tuples and globals accessed with the global keyword have been reviewed. Global class instances (e.g. registries) and class variable access without the __class__ keyword are missing. The latter are probably impossible to catch with grep.

See below for raw grep results.

ModuleGlobalsIncomplete initInefficiencies
settings and global_settings?MODULE LEVEL INIT, not reviewed
utils/_decimal.pylots, including codeMODULE LEVEL INIT, not reviewed
django/contrib/sites/models.pySITE_CACHEOKone db hit intended, more than one possible
django/template/context.py_standard_context_processorsOKduplicated module loading with __import__ possible
django/template/__init__.pyinvalid_var_format_string, libraries, builtinsOKduplicated module loading with __import__ possible
django/template/loader.pytemplate_source_loadersPROBLEM, see #6950, patch attachedduplicated module loading with __import__ possible
django/template/loaders/app_directories.pyapp_template_dirsMODULE LEVEL INIT
django/utils/translation/trans_real.py_accepted, _active, _default, _translationsOKexplicit threading support, no inefficiencies
django/core/urlresolvers.py_callable_cache, _resolver_cacheOK, memoize decoratorduplicated module loading with __import__ possible
django/core/serializers/__init__.py_serializersPROBLEM, incomplete init in _load_serializers
django/db/models/fields/related.pypending_lookupsOK?, needs further reviewappend() in add_lazy_relation() can add duplicated values, which may or may not confuse pop() in do_pending_lookups()
django/db/transaction.pydirty, stateOKexplicit threading support, no inefficiencies
django/dispatch/dispatcher.pyconnections, senders, sendersBacknot reviewed

Problems

  1. django/template/loader.py: the "bad" algorithm above, patch attached to #6950
  2. django/core/serializers/__init__.py: _load_serializers() is unsafe, patch attached to #FIXME:
    1. thread 1: if not _serializers: true --> _load_serializers(), enter for loop
    2. thread 1: register_serializer(x)
    3. thread 2: if not _serializers: false --> use incomplete _serializers
    3. thread 1: register_serializer(y)
    
  3. django/db/models/fields/related.py: append() in add_lazy_relation() can add duplicated values, which may or may not confuse pop() in do_pending_lookups()

Class attributes

Class attributes are shared between instances and thus between threads as well (as module-level classes are just global class objects).

The behaviour is similar to globals: in similar manner to the global keyword in functions, explicit class specifier foo.__class__.bar is required for setting class variable bar from instance foo, otherwise a instance scope variable will be created that hides the class scope variable.

(As this may not be obvious, let me illustrate it:)

>>> class Foo(object): bar = 1
... 
>>> f = Foo()
>>> f.bar = 2
>>> Foo.bar
1
>>> f.bar
2
>>> f.__class__.bar
1
>>> f.__class__.bar = 3
>>> f.bar
2
>>> Foo.bar
3

As with globals, there are three types of class variables,

  1. class variables that are initialized when the class is defined and never modified later (THREAD-SAFE),
  2. mutable class level data structures that are initialized when the class is defined but whose elements are modified in methods and that are accessed without using the __class__ keyword (NOT THREAD-SAFE),
  3. class variables initialized in methods by using the __class__ keyword (NOT THREAD-SAFE),

Metaclasses -- think through the implications.

Raw grep results

Globals accessed with the global keyword

$ grep -r '^[[:space:]]*global ' django/ | egrep -v '(\.svn|\.html|\.css|\.pyc|_doctest\.py)' | sort | uniq

yields the following results

django/contrib/sites/models.py:        global SITE_CACHE
django/core/management/__init__.py:    global _commands
django/template/context.py:    global _standard_context_processors
django/template/__init__.py:                    global invalid_var_format_string
django/template/loader.py:    global template_source_loaders
django/utils/translation/trans_real.py:    global _accepted
django/utils/translation/trans_real.py:    global _active
django/utils/translation/trans_real.py:    global _default, _active
django/utils/translation/trans_real.py:        global _translations
django/utils/translation/trans_real.py:    global _translations

Out of these, django.core.management is not used in multi-threading context.

Global dictionaries

$ grep -r '^[[:alnum:]_]\+ *= *{' django | egrep -v '(\.svn|_doctest\.py)' | sort

yields the following results

django/conf/global_settings.py:ABSOLUTE_URL_OVERRIDES = {}
django/conf/global_settings.py:DATABASE_OPTIONS = {}          # Set to empty dictionary for default.
django/contrib/admin/utils.py:ROLES = {
django/contrib/admin/views/doc.py:DATA_TYPE_MAPPING = {
django/contrib/formtools/tests.py:test_data = {'field1': u'foo',
django/contrib/localflavor/ca/ca_provinces.py:PROVINCES_NORMALIZED = {
django/contrib/localflavor/in_/in_states.py:STATES_NORMALIZED = {
django/contrib/localflavor/us/us_states.py:STATES_NORMALIZED = {
django/contrib/sites/models.py:SITE_CACHE = {}
django/core/cache/__init__.py:BACKENDS = {
django/core/cache/__init__.py:DEPRECATED_BACKENDS = {
django/core/handlers/wsgi.py:STATUS_CODE_TEXT = {
django/core/serializers/__init__.py:BUILTIN_SERIALIZERS = {
django/core/serializers/__init__.py:_serializers = {}
django/core/servers/basehttp.py:_hop_headers = {
django/core/servers/fastcgi.py:FASTCGI_OPTIONS = {
django/core/urlresolvers.py:_callable_cache = {} # Maps view and url pattern names to their view functions.
django/core/urlresolvers.py:_resolver_cache = {} # Maps urlconf modules to RegexURLResolver instances.
django/db/backends/dummy/creation.py:DATA_TYPES = {}
django/db/backends/dummy/introspection.py:DATA_TYPES_REVERSE = {}
django/db/backends/mysql/creation.py:DATA_TYPES = {
django/db/backends/mysql/introspection.py:DATA_TYPES_REVERSE = {
django/db/backends/mysql_old/creation.py:DATA_TYPES = {
django/db/backends/mysql_old/introspection.py:DATA_TYPES_REVERSE = {
django/db/backends/oracle/creation.py:DATA_TYPES = {
django/db/backends/oracle/creation.py:REMEMBER = {}
django/db/backends/oracle/introspection.py:DATA_TYPES_REVERSE = {
django/db/backends/postgresql/creation.py:DATA_TYPES = {
django/db/backends/postgresql/introspection.py:DATA_TYPES_REVERSE = {
django/db/backends/postgresql_psycopg2/introspection.py:DATA_TYPES_REVERSE = {
django/db/backends/sqlite3/creation.py:DATA_TYPES = {
django/db/backends/sqlite3/introspection.py:BASE_DATA_TYPES_REVERSE = {
django/db/models/fields/related.py:pending_lookups = {}
django/db/models/query.py:LEGACY_ORDERING_MAPPING = {'ASC': '_', 'DESC': '-_', 'RANDOM': '?'}
django/db/transaction.py:dirty = {}
django/db/transaction.py:state = {}
django/dispatch/dispatcher.py:connections = {}
django/dispatch/dispatcher.py:senders = {}
django/dispatch/dispatcher.py:sendersBack = {}
django/template/__init__.py:libraries = {}
django/utils/dates.py:MONTHS = {
django/utils/dates.py:MONTHS_3 = {
django/utils/dates.py:MONTHS_3_REV = {
django/utils/dates.py:MONTHS_AP = { # month names in Associated Press style
django/utils/dates.py:WEEKDAYS = {
django/utils/dates.py:WEEKDAYS_ABBR = {
django/utils/dates.py:WEEKDAYS_REV = {
django/utils/_decimal.py:_condition_map = {ConversionSyntax:InvalidOperation,
django/utils/_decimal.py:_infinity_map = {
django/utils/simplejson/decoder.py:BACKSLASH = {
django/utils/simplejson/decoder.py:_CONSTANTS = {
django/utils/simplejson/encoder.py:ESCAPE_DCT = {
django/utils/termcolors.py:opt_dict = {'bold': '1', 'underscore': '4', 'blink': '5', 'reverse': '7', 'conceal': '8'}
django/utils/translation/trans_null.py:TECHNICAL_ID_MAP = {
django/utils/translation/trans_real.py:_accepted = {}
django/utils/translation/trans_real.py:_active = {}
django/utils/translation/trans_real.py:_translations = {}

Out of these, the following are read-only (i.e. not changed anywhere in code) or otherwise irrelevant: contrib/admin, formtools tests, localflavor mappings, core/cache, core/handlers, core/serializers/__init__.py:BUILTIN_SERIALIZERS, core/servers, db/backends, db/models/query.py, utils/dates.py, utils/_decimal.py, utils/simplejson, utils/termcolors.py, utils/translation/trans_null.py.

SITE_CACHE and everything in django.utils.translation.trans_real has already been listed under globals above.

_callable_cache and _resolver_cache in django/core/urlresolvers.py are used within the memoize decorator, result = func(*args) may be called more than once in utils/functional.py, but this should generally be a non-issue.

That leaves the following relevant global dicts not listed before:

django/core/serializers/__init__.py:_serializers = {}
django/db/models/fields/related.py:pending_lookups = {}
django/db/transaction.py:dirty = {}
django/db/transaction.py:state = {}
django/dispatch/dispatcher.py:connections = {}
django/dispatch/dispatcher.py:senders = {}
django/dispatch/dispatcher.py:sendersBack = {}
django/template/__init__.py:libraries = {}

Global lists

$ grep -r '^[[:alnum:]_]\+ *= *\[' django | egrep -v '(\.svn|_doctest\.py|__all__)' | sort

yields the following results

django/db/models/fields/__init__.py:BLANK_CHOICE_DASH = [("", "---------")]
django/db/models/fields/__init__.py:BLANK_CHOICE_NONE = [("", "None")]
django/template/__init__.py:builtins = []
django/template/loaders/app_directories.py:app_template_dirs = []
django/utils/_decimal.py:rounding_functions = [name for name in Decimal.__dict__.keys() if name.startswith('_round_')]
django/utils/_decimal.py:_signals = [Clamped, DivisionByZero, Inexact, Overflow, Rounded,
django/utils/html.py:DOTS = ['·', '*', '\xe2\x80\xa2', '•', '•', '•']
django/utils/html.py:LEADING_PUNCTUATION  = ['(', '<', '&lt;']
django/utils/html.py:TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']
django/utils/simplejson/decoder.py:ANYTHING = [

Leaving out the irrelevant read-only ones, the following remain:

django/template/__init__.py:builtins = []
django/template/loaders/app_directories.py:app_template_dirs = []

As a matter of style, the read-only ones should really be tuples, not lists -- the 'say what you mean' idiom: if it shouldn't be modified, don't let it be by making it a tuple. Tuples are also marginally more efficient speed- and space-wise. There is a slight semantic distinction between lists and tuples though http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists/ . But as there are no constant lists in Python, tuples are the only way to be const-correct.

__class__ keyword used for accessing anything other than __name__

$ grep -r '__class__\.' django/ | egrep -v '(\.svn|\.html|\.css|\.pyc|_doctest\.py|__class__\.__name__)' | sort | uniq

yields the following results

django/contrib/auth/middleware.py:        request.__class__.user = LazyUser()
django/contrib/contenttypes/models.py:            ct = self.__class__._cache[id]
django/contrib/contenttypes/models.py:            ct = self.__class__._cache[key]
django/contrib/contenttypes/models.py:        self.__class__._cache.clear()
django/contrib/contenttypes/models.py:        self.__class__._cache[ct.id] = ct
django/contrib/contenttypes/models.py:        self.__class__._cache[key] = ct
django/db/models/base.py:        q = self.__class__._default_manager.filter(**kwargs).order_by((not is_next and '-' or '') + field.name, (not is_next and '-' or '') + self._meta.pk.name)
django/db/models/base.py:            raise self.DoesNotExist, "%s matching query does not exist." % self.__class__._meta.object_name
django/db/models/fields/__init__.py:                not instance.__class__._default_manager.filter(**{'%s__exact' % self.name: getattr(instance, self.attname)}):
django/dispatch/saferef.py:                del self.__class__._allInstances[ self.key ]
django/newforms/models.py:        opts = instance.__class__._meta
django/newforms/models.py:    opts = instance.__class__._meta