Version 22 (modified by 17 years ago) ( diff ) | ,
---|
Part of DjangoSpecifications
Threading improvements in Django
According to tickets #5632, #6950 and discussions http://groups.google.com/group/django-users/browse_frm/thread/a7d42475b66530bd, http://groups.google.com/group/django-developers/browse_thread/thread/fbcfa88c997d1bb3, at least the following components of Django are not entirely thread-safe:
- django.template.loader
- django.db.models
- django.contrib.sessions
This specification outlines the changes that need to be implemented to solve these issues.
A related task is to identify other components not mentioned here that have threading issues.
See #1442 and http://groups.google.com/group/django-developers/browse_thread/thread/905f79e350525c95 for threading discussions.
Globals
There are three types of globals:
- read-only globals that are never assigned to (THREAD-SAFE),
- globals assigned to in a function by using the
global
keyword (NOT THREAD-SAFE), - global mutable data structures (lists and dictionaries, also instances) that are assigned to at module level but whose elements are modified in functions and that are accessed without using the
global
keyword (NOT THREAD-SAFE unless never modified).
"Not thread-safe" has two broad subcategories:
- inefficiencies due to calls meant to occur only once occurring more than once (the general
if not foo: initialize foo
, includingmemoize
decorator), - errors due to incomplete initialization.
When evaluating the inefficiencies, their impact should be considered in terms of their probability and the overhead of duplicated call. The worst case,
1. thread 1: if not foo: needs initializing 2. thread 2: if not foo: needs initializing 3. thread 1: initialize foo 4. thread 2: initialize foo
is not that common and there are no large initialization overhead cases below, so the "inefficiency issues" are really non-issues. There is at least one bug though.
Incomplete initialization errors can generally be avoided by using full assignment instead of elementwise modification (that's how it is done in the source mostly):
BAD
foo = [] def bar(): ... if not foo: for x in y: foo.append(x)
BETTER
foo = [] def bar(): ... global foo if not foo: tmp = [] for x in y: tmp.append(x) foo = tuple(tmp) # assignment is "atomic", tuple is const-correct and more efficient
Modules' use of globals
See below for raw grep
results.
Module | Globals | Incomplete init | Inefficiencies |
settings and global_settings | ? | not reviewed | |
utils/_decimal.py | lots, including code | not reviewed | |
django/contrib/sites/models.py | SITE_CACHE | OK | one db hit intended, more than one possible |
django/template/context.py | _standard_context_processors | OK | duplicated module loading with __import__ possible
|
django/template/__init__.py | invalid_var_format_string, libraries, builtins | OK | duplicated module loading with __import__ possible
|
django/template/loader.py | template_source_loaders | PROBLEM, see #6950, patch attached | duplicated module loading with __import__ possible
|
django/template/loaders/app_directories.py | app_template_dirs | PROBABLY OK, appending to list at module level is not thread-safe, but the module is most likely cached before threads get access to it | |
django/utils/translation/trans_real.py | _accepted, _active, _default, _translations | continue review here | foo |
django/core/urlresolvers.py | _callable_cache, _resolver_cache | memoize decorator | |
django/core/serializers/__init__.py | _serializers | ||
django/db/models/fields/related.py | pending_lookups | ||
django/db/transaction.py | dirty, state | ||
django/dispatch/dispatcher.py | `connections, senders, sendersBack |
Raw grep
results
Globals accessed with the global
keyword
$ grep -r '^[[:space:]]*global ' django/ | egrep -v '(\.svn|\.html|\.css|\.pyc|_doctest\.py)' | sort | uniq
yields the following results
django/contrib/sites/models.py: global SITE_CACHE django/core/management/__init__.py: global _commands django/template/context.py: global _standard_context_processors django/template/__init__.py: global invalid_var_format_string django/template/loader.py: global template_source_loaders django/utils/translation/trans_real.py: global _accepted django/utils/translation/trans_real.py: global _active django/utils/translation/trans_real.py: global _default, _active django/utils/translation/trans_real.py: global _translations django/utils/translation/trans_real.py: global _translations
Out of these, django.core.management
is not used in multi-threading context.
Global dictionaries
$ grep -r '^[[:alnum:]_]\+ *= *{' django | egrep -v '(\.svn|_doctest\.py)' | sort
yields the following results
django/conf/global_settings.py:ABSOLUTE_URL_OVERRIDES = {} django/conf/global_settings.py:DATABASE_OPTIONS = {} # Set to empty dictionary for default. django/contrib/admin/utils.py:ROLES = { django/contrib/admin/views/doc.py:DATA_TYPE_MAPPING = { django/contrib/formtools/tests.py:test_data = {'field1': u'foo', django/contrib/localflavor/ca/ca_provinces.py:PROVINCES_NORMALIZED = { django/contrib/localflavor/in_/in_states.py:STATES_NORMALIZED = { django/contrib/localflavor/us/us_states.py:STATES_NORMALIZED = { django/contrib/sites/models.py:SITE_CACHE = {} django/core/cache/__init__.py:BACKENDS = { django/core/cache/__init__.py:DEPRECATED_BACKENDS = { django/core/handlers/wsgi.py:STATUS_CODE_TEXT = { django/core/serializers/__init__.py:BUILTIN_SERIALIZERS = { django/core/serializers/__init__.py:_serializers = {} django/core/servers/basehttp.py:_hop_headers = { django/core/servers/fastcgi.py:FASTCGI_OPTIONS = { django/core/urlresolvers.py:_callable_cache = {} # Maps view and url pattern names to their view functions. django/core/urlresolvers.py:_resolver_cache = {} # Maps urlconf modules to RegexURLResolver instances. django/db/backends/dummy/creation.py:DATA_TYPES = {} django/db/backends/dummy/introspection.py:DATA_TYPES_REVERSE = {} django/db/backends/mysql/creation.py:DATA_TYPES = { django/db/backends/mysql/introspection.py:DATA_TYPES_REVERSE = { django/db/backends/mysql_old/creation.py:DATA_TYPES = { django/db/backends/mysql_old/introspection.py:DATA_TYPES_REVERSE = { django/db/backends/oracle/creation.py:DATA_TYPES = { django/db/backends/oracle/creation.py:REMEMBER = {} django/db/backends/oracle/introspection.py:DATA_TYPES_REVERSE = { django/db/backends/postgresql/creation.py:DATA_TYPES = { django/db/backends/postgresql/introspection.py:DATA_TYPES_REVERSE = { django/db/backends/postgresql_psycopg2/introspection.py:DATA_TYPES_REVERSE = { django/db/backends/sqlite3/creation.py:DATA_TYPES = { django/db/backends/sqlite3/introspection.py:BASE_DATA_TYPES_REVERSE = { django/db/models/fields/related.py:pending_lookups = {} django/db/models/query.py:LEGACY_ORDERING_MAPPING = {'ASC': '_', 'DESC': '-_', 'RANDOM': '?'} django/db/transaction.py:dirty = {} django/db/transaction.py:state = {} django/dispatch/dispatcher.py:connections = {} django/dispatch/dispatcher.py:senders = {} django/dispatch/dispatcher.py:sendersBack = {} django/template/__init__.py:libraries = {} django/utils/dates.py:MONTHS = { django/utils/dates.py:MONTHS_3 = { django/utils/dates.py:MONTHS_3_REV = { django/utils/dates.py:MONTHS_AP = { # month names in Associated Press style django/utils/dates.py:WEEKDAYS = { django/utils/dates.py:WEEKDAYS_ABBR = { django/utils/dates.py:WEEKDAYS_REV = { django/utils/_decimal.py:_condition_map = {ConversionSyntax:InvalidOperation, django/utils/_decimal.py:_infinity_map = { django/utils/simplejson/decoder.py:BACKSLASH = { django/utils/simplejson/decoder.py:_CONSTANTS = { django/utils/simplejson/encoder.py:ESCAPE_DCT = { django/utils/termcolors.py:opt_dict = {'bold': '1', 'underscore': '4', 'blink': '5', 'reverse': '7', 'conceal': '8'} django/utils/translation/trans_null.py:TECHNICAL_ID_MAP = { django/utils/translation/trans_real.py:_accepted = {} django/utils/translation/trans_real.py:_active = {} django/utils/translation/trans_real.py:_translations = {}
Out of these, the following are read-only (i.e. not changed anywhere in code) or otherwise irrelevant: contrib/admin, formtools tests, localflavor mappings
, core/cache, core/handlers, core/serializers/__init__.py:BUILTIN_SERIALIZERS
, core/servers, db/backends, db/models/query.py, utils/dates.py
, utils/_decimal.py, utils/simplejson, utils/termcolors.py
, utils/translation/trans_null.py
.
SITE_CACHE
and everything in django.utils.translation.trans_real
has already been listed under globals
above.
_callable_cache
and _resolver_cache
in django/core/urlresolvers.py are used within the memoize decorator, result = func(*args)
may be called more than once in utils/functional.py
, but this should generally be a non-issue.
That leaves the following relevant global dicts not listed before:
django/core/serializers/__init__.py:_serializers = {} django/db/models/fields/related.py:pending_lookups = {} django/db/transaction.py:dirty = {} django/db/transaction.py:state = {} django/dispatch/dispatcher.py:connections = {} django/dispatch/dispatcher.py:senders = {} django/dispatch/dispatcher.py:sendersBack = {} django/template/__init__.py:libraries = {}
Global lists
$ grep -r '^[[:alnum:]_]\+ *= *\[' django | egrep -v '(\.svn|_doctest\.py|__all__)' | sort
yields the following results
django/db/models/fields/__init__.py:BLANK_CHOICE_DASH = [("", "---------")] django/db/models/fields/__init__.py:BLANK_CHOICE_NONE = [("", "None")] django/template/__init__.py:builtins = [] django/template/loaders/app_directories.py:app_template_dirs = [] django/utils/_decimal.py:rounding_functions = [name for name in Decimal.__dict__.keys() if name.startswith('_round_')] django/utils/_decimal.py:_signals = [Clamped, DivisionByZero, Inexact, Overflow, Rounded, django/utils/html.py:DOTS = ['·', '*', '\xe2\x80\xa2', '•', '•', '•'] django/utils/html.py:LEADING_PUNCTUATION = ['(', '<', '<'] django/utils/html.py:TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '>'] django/utils/simplejson/decoder.py:ANYTHING = [
Leaving out the irrelevant read-only ones, the following remain:
django/template/__init__.py:builtins = [] django/template/loaders/app_directories.py:app_template_dirs = []
As a matter of style, the read-only ones should really be tuples, not lists -- the 'say what you mean' idiom: if it shouldn't be modified, don't let it be by making it a tuple. Tuples are also marginally more efficient speed- and space-wise. There is a slight semantic distinction between lists and tuples though http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists/ . But as there are no constant lists in Python, tuples are the only way to be const-correct.