Version 42 (modified by 17 years ago) ( diff ) | ,
---|
Part of DjangoSpecifications
Django threading review
Relevant tickets: #5632, #6950, #1442.
Relevant discussions: http://groups.google.com/group/django-users/browse_frm/thread/a7d42475b66530bd, http://groups.google.com/group/django-developers/browse_thread/thread/fbcfa88c997d1bb3, http://groups.google.com/group/django-developers/browse_thread/thread/905f79e350525c95
Only easy-to-identify globals have been reviewed, a related task is to identify other components not listed here that may have threading issues.
Globals
There are four types of globals:
- globals that are assigned to at module level and never modified later (THREAD-SAFE),
- globals that are assigned to at module level and whose elements are modified with module level code, but never modified later (PROBABLY THREAD-SAFE, although elementwise modification at module level is not thread-safe per se, the module is most likely cached before threads get access to it)
- global mutable data structures (lists and dictionaries, also instances) that are assigned to at module level but whose elements are modified in functions and that are accessed without using the
global
keyword (NOT THREAD-SAFE), - globals assigned to in functions by using the
global
keyword (NOT THREAD-SAFE),
"Not thread-safe" has two broad subcategories:
- inefficiencies due to calls meant to occur only once occurring more than once (the general
if not foo: initialize foo
, includingmemoize
decorator), - errors due to incomplete initialization.
Inefficiencies
When evaluating the inefficiencies, their impact should be considered in terms of their probability and overhead of the duplicated call. The duplicated case,
1. thread 1: if not foo: true, needs initializing 2. thread 2: if not foo: true, needs initializing 3. thread 1: initialize foo 4. thread 2: initialize foo
is not that common. No code where duplicated call would cause a considerable overhead or harmful side-effects was found during the review, so the "inefficiency issues" are really non-issues and listed below only for reference. In the multi-process case each process does initialize foo
individually anyway.
Errors due to incomplete initialization
Incomplete initialization problem is the following:
foo = [] 1. thread 1: if not foo: true, needs initializing 2. thread 1: foo.append(x) 3. thread 2: if not foo: false, does not need initializing --> use the incomplete foo 4. thread 1: foo.append(y) 5. thread 2: use fully initialized foo
Incomplete initialization errors can generally be avoided by using full assignment instead of elementwise modification; additionally, to make sure no further modifications of a list can happen, tuples should be used instead of lists (inspired by source:django/trunk/django/template/context.py@7415#L86):
BAD
foo = [] def bar(): ... if not foo: for x in y: foo.append(x)
BETTER
foo = None def bar(): ... global foo if foo is None: tmp = [] for x in y: tmp.append(x) foo = tuple(tmp) # assignment is "atomic", tuple is const-correct and more efficient
There is at least one location in code that uses the "bad" algorithm.
Modules' use of globals
Note that only lists, tuples and globals accessed with the global
keyword have been reviewed. Global class instances (e.g. registries) and class variable access without the __class__
keyword are missing. The latter are probably impossible to catch with grep
.
See below for raw grep
results.
Module | Globals | Incomplete init | Inefficiencies |
settings and global_settings | ? | MODULE LEVEL INIT, not reviewed | |
utils/_decimal.py | lots, including code | MODULE LEVEL INIT, not reviewed | |
django/contrib/sites/models.py | SITE_CACHE | OK | one db hit intended, more than one possible |
django/template/context.py | _standard_context_processors | OK | duplicated module loading with __import__ possible
|
django/template/__init__.py | invalid_var_format_string, libraries, builtins | OK | duplicated module loading with __import__ possible
|
django/template/loader.py | template_source_loaders | PROBLEM, see #6950, patch attached | duplicated module loading with __import__ possible
|
django/template/loaders/app_directories.py | app_template_dirs | MODULE LEVEL INIT | |
django/utils/translation/trans_real.py | _accepted, _active, _default, _translations | OK | explicit threading support, no inefficiencies |
django/core/urlresolvers.py | _callable_cache, _resolver_cache | OK, memoize decorator | duplicated module loading with __import__ possible
|
django/core/serializers/__init__.py | _serializers | PROBLEM, incomplete init in _load_serializers | |
django/db/models/fields/related.py | pending_lookups | OK?, needs further review | append() in add_lazy_relation() can add duplicated values, which may or may not confuse pop() in do_pending_lookups()
|
django/db/transaction.py | dirty, state | OK | explicit threading support, no inefficiencies |
django/dispatch/dispatcher.py | connections, senders, sendersBack | not reviewed |
Problems
django/template/loader.py
: the "bad" algorithm above, patch attached to #6950django/core/serializers/__init__.py
:_load_serializers()
is unsafe, patch attached to #FIXME:1. thread 1: if not _serializers: true --> _load_serializers(), enter for loop 2. thread 1: register_serializer(x) 3. thread 2: if not _serializers: false --> use incomplete _serializers 3. thread 1: register_serializer(y)
django/db/models/fields/related.py
:append()
inadd_lazy_relation()
can add duplicated values, which may or may not confusepop()
indo_pending_lookups()
Class attributes
Class attributes are shared between instances and thus between threads as well (as module-level classes are just global class objects).
The behaviour is similar to globals: in similar manner to the global keyword in
functions, explicit class specifier foo.__class__.bar
is required for setting
class variable bar
from instance foo
, otherwise a instance scope variable
will be created that hides the class scope variable.
(As this may not be obvious, let me illustrate it:)
>>> class Foo(object): bar = 1 ... >>> f = Foo() >>> f.bar = 2 >>> Foo.bar 1 >>> f.bar 2 >>> f.__class__.bar 1 >>> f.__class__.bar = 3 >>> f.bar 2 >>> Foo.bar 3
As with globals,
- class variables that are assigned to when the class is defined and never modified later (THREAD-SAFE),
- mutable class level data structures (lists and dictionaries, also instances) that are assigned to when the class is defined but whose elements are modified in methods and that are accessed without using the
__class__
keyword (NOT THREAD-SAFE), - class variables assigned to in methods by using the
__class__
keyword (NOT THREAD-SAFE),
Metaclasses -- think through the implications.
Raw grep
results
Globals accessed with the global
keyword
$ grep -r '^[[:space:]]*global ' django/ | egrep -v '(\.svn|\.html|\.css|\.pyc|_doctest\.py)' | sort | uniq
yields the following results
django/contrib/sites/models.py: global SITE_CACHE django/core/management/__init__.py: global _commands django/template/context.py: global _standard_context_processors django/template/__init__.py: global invalid_var_format_string django/template/loader.py: global template_source_loaders django/utils/translation/trans_real.py: global _accepted django/utils/translation/trans_real.py: global _active django/utils/translation/trans_real.py: global _default, _active django/utils/translation/trans_real.py: global _translations django/utils/translation/trans_real.py: global _translations
Out of these, django.core.management
is not used in multi-threading context.
Global dictionaries
$ grep -r '^[[:alnum:]_]\+ *= *{' django | egrep -v '(\.svn|_doctest\.py)' | sort
yields the following results
django/conf/global_settings.py:ABSOLUTE_URL_OVERRIDES = {} django/conf/global_settings.py:DATABASE_OPTIONS = {} # Set to empty dictionary for default. django/contrib/admin/utils.py:ROLES = { django/contrib/admin/views/doc.py:DATA_TYPE_MAPPING = { django/contrib/formtools/tests.py:test_data = {'field1': u'foo', django/contrib/localflavor/ca/ca_provinces.py:PROVINCES_NORMALIZED = { django/contrib/localflavor/in_/in_states.py:STATES_NORMALIZED = { django/contrib/localflavor/us/us_states.py:STATES_NORMALIZED = { django/contrib/sites/models.py:SITE_CACHE = {} django/core/cache/__init__.py:BACKENDS = { django/core/cache/__init__.py:DEPRECATED_BACKENDS = { django/core/handlers/wsgi.py:STATUS_CODE_TEXT = { django/core/serializers/__init__.py:BUILTIN_SERIALIZERS = { django/core/serializers/__init__.py:_serializers = {} django/core/servers/basehttp.py:_hop_headers = { django/core/servers/fastcgi.py:FASTCGI_OPTIONS = { django/core/urlresolvers.py:_callable_cache = {} # Maps view and url pattern names to their view functions. django/core/urlresolvers.py:_resolver_cache = {} # Maps urlconf modules to RegexURLResolver instances. django/db/backends/dummy/creation.py:DATA_TYPES = {} django/db/backends/dummy/introspection.py:DATA_TYPES_REVERSE = {} django/db/backends/mysql/creation.py:DATA_TYPES = { django/db/backends/mysql/introspection.py:DATA_TYPES_REVERSE = { django/db/backends/mysql_old/creation.py:DATA_TYPES = { django/db/backends/mysql_old/introspection.py:DATA_TYPES_REVERSE = { django/db/backends/oracle/creation.py:DATA_TYPES = { django/db/backends/oracle/creation.py:REMEMBER = {} django/db/backends/oracle/introspection.py:DATA_TYPES_REVERSE = { django/db/backends/postgresql/creation.py:DATA_TYPES = { django/db/backends/postgresql/introspection.py:DATA_TYPES_REVERSE = { django/db/backends/postgresql_psycopg2/introspection.py:DATA_TYPES_REVERSE = { django/db/backends/sqlite3/creation.py:DATA_TYPES = { django/db/backends/sqlite3/introspection.py:BASE_DATA_TYPES_REVERSE = { django/db/models/fields/related.py:pending_lookups = {} django/db/models/query.py:LEGACY_ORDERING_MAPPING = {'ASC': '_', 'DESC': '-_', 'RANDOM': '?'} django/db/transaction.py:dirty = {} django/db/transaction.py:state = {} django/dispatch/dispatcher.py:connections = {} django/dispatch/dispatcher.py:senders = {} django/dispatch/dispatcher.py:sendersBack = {} django/template/__init__.py:libraries = {} django/utils/dates.py:MONTHS = { django/utils/dates.py:MONTHS_3 = { django/utils/dates.py:MONTHS_3_REV = { django/utils/dates.py:MONTHS_AP = { # month names in Associated Press style django/utils/dates.py:WEEKDAYS = { django/utils/dates.py:WEEKDAYS_ABBR = { django/utils/dates.py:WEEKDAYS_REV = { django/utils/_decimal.py:_condition_map = {ConversionSyntax:InvalidOperation, django/utils/_decimal.py:_infinity_map = { django/utils/simplejson/decoder.py:BACKSLASH = { django/utils/simplejson/decoder.py:_CONSTANTS = { django/utils/simplejson/encoder.py:ESCAPE_DCT = { django/utils/termcolors.py:opt_dict = {'bold': '1', 'underscore': '4', 'blink': '5', 'reverse': '7', 'conceal': '8'} django/utils/translation/trans_null.py:TECHNICAL_ID_MAP = { django/utils/translation/trans_real.py:_accepted = {} django/utils/translation/trans_real.py:_active = {} django/utils/translation/trans_real.py:_translations = {}
Out of these, the following are read-only (i.e. not changed anywhere in code) or otherwise irrelevant: contrib/admin, formtools tests, localflavor mappings
, core/cache, core/handlers, core/serializers/__init__.py:BUILTIN_SERIALIZERS
, core/servers, db/backends, db/models/query.py, utils/dates.py
, utils/_decimal.py, utils/simplejson, utils/termcolors.py
, utils/translation/trans_null.py
.
SITE_CACHE
and everything in django.utils.translation.trans_real
has already been listed under globals
above.
_callable_cache
and _resolver_cache
in django/core/urlresolvers.py are used within the memoize decorator, result = func(*args)
may be called more than once in utils/functional.py
, but this should generally be a non-issue.
That leaves the following relevant global dicts not listed before:
django/core/serializers/__init__.py:_serializers = {} django/db/models/fields/related.py:pending_lookups = {} django/db/transaction.py:dirty = {} django/db/transaction.py:state = {} django/dispatch/dispatcher.py:connections = {} django/dispatch/dispatcher.py:senders = {} django/dispatch/dispatcher.py:sendersBack = {} django/template/__init__.py:libraries = {}
Global lists
$ grep -r '^[[:alnum:]_]\+ *= *\[' django | egrep -v '(\.svn|_doctest\.py|__all__)' | sort
yields the following results
django/db/models/fields/__init__.py:BLANK_CHOICE_DASH = [("", "---------")] django/db/models/fields/__init__.py:BLANK_CHOICE_NONE = [("", "None")] django/template/__init__.py:builtins = [] django/template/loaders/app_directories.py:app_template_dirs = [] django/utils/_decimal.py:rounding_functions = [name for name in Decimal.__dict__.keys() if name.startswith('_round_')] django/utils/_decimal.py:_signals = [Clamped, DivisionByZero, Inexact, Overflow, Rounded, django/utils/html.py:DOTS = ['·', '*', '\xe2\x80\xa2', '•', '•', '•'] django/utils/html.py:LEADING_PUNCTUATION = ['(', '<', '<'] django/utils/html.py:TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '>'] django/utils/simplejson/decoder.py:ANYTHING = [
Leaving out the irrelevant read-only ones, the following remain:
django/template/__init__.py:builtins = [] django/template/loaders/app_directories.py:app_template_dirs = []
As a matter of style, the read-only ones should really be tuples, not lists -- the 'say what you mean' idiom: if it shouldn't be modified, don't let it be by making it a tuple. Tuples are also marginally more efficient speed- and space-wise. There is a slight semantic distinction between lists and tuples though http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists/ . But as there are no constant lists in Python, tuples are the only way to be const-correct.
__class__
keyword used for accessing anything other than __name__
$ grep -r '__class__\.' django/ | egrep -v '(\.svn|\.html|\.css|\.pyc|_doctest\.py|__class__\.__name__)' | sort | uniq
yields the following results
django/contrib/auth/middleware.py: request.__class__.user = LazyUser() django/contrib/contenttypes/models.py: ct = self.__class__._cache[id] django/contrib/contenttypes/models.py: ct = self.__class__._cache[key] django/contrib/contenttypes/models.py: self.__class__._cache.clear() django/contrib/contenttypes/models.py: self.__class__._cache[ct.id] = ct django/contrib/contenttypes/models.py: self.__class__._cache[key] = ct django/db/models/base.py: q = self.__class__._default_manager.filter(**kwargs).order_by((not is_next and '-' or '') + field.name, (not is_next and '-' or '') + self._meta.pk.name) django/db/models/base.py: raise self.DoesNotExist, "%s matching query does not exist." % self.__class__._meta.object_name django/db/models/fields/__init__.py: not instance.__class__._default_manager.filter(**{'%s__exact' % self.name: getattr(instance, self.attname)}): django/dispatch/saferef.py: del self.__class__._allInstances[ self.key ] django/newforms/models.py: opts = instance.__class__._meta django/newforms/models.py: opts = instance.__class__._meta