Code

Opened 5 years ago

Closed 5 years ago

#10470 closed (fixed)

Core handlers are not threadsafe

Reported by: tdterry Owned by: tdterry
Component: Core (Other) Version: 1.0
Severity: Keywords: wsgi, basehandler, mod_python, threadsafe, threading
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

I have found an incomplete-initialization bug the WSGIHandler when running under Apache mpm_worker with mod_wsgi. Code review shows this is a potential problem with all of the handlers.

On the first request the WSGIHandler.__call__(...), we set up the middleware. There is a lock wrapping the load of the middleware, but the logic still allows incomplete initialization on other threads. Below is the block of code in question from django/core/handlers/wsgi.py (line 226):

if self._request_middleware is None:
    self.initLock.acquire()
    # Check that middleware is still uninitialised.
    if self._request_middleware is None:
        self.load_middleware()
    self.initLock.release()

Example:

  1. Initial start or restart of the Apache instance
  2. Thread T1 - a request comes in. self._request_middleware is None
  3. Thread T1 - acquires the self.initLock, calls self.load_middleware()
  4. Thread T2 - a request comes in (T1 is part of the way through self.load_middleware())
  5. Thread T2 - self._request_middleware is not None, but it's not completely loaded either, continues with the request
  6. Thread T1 - completes self.load_middleware(), release the lock

The attached patch changes BaseHandler.load_middleware() to do an atomic set of all four middleware lists. self._request_middleware is set last to signal the completion of middleware loading.

Attachments (3)

thread-safe-handlers.diff (2.4 KB) - added by tdterry 5 years ago.
Patch against r10030 to solve multi-thread issues with the core handlers
thread-safe-handlers_minimal-patch.diff (1.4 KB) - added by mrts 5 years ago.
Minimal changes that assure thread-safety
omission.diff (640 bytes) - added by mrts 5 years ago.
Omitted append to right list

Download all attachments as: .zip

Change History (24)

Changed 5 years ago by tdterry

Patch against r10030 to solve multi-thread issues with the core handlers

comment:1 Changed 5 years ago by tdterry

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Also, testing requires a multi-threaded setup and a good load test. These are not available in the framework, so I have not included a specific test for this patch.

comment:2 Changed 5 years ago by tdterry

  • Owner changed from nobody to tdterry
  • Status changed from new to assigned

comment:3 Changed 5 years ago by shai

May I suggest a minor improvement to the patch: Instead of assigning to the _request_middleware last, just use tuple assignment to assign them all together, i.e.

 self._request_middleware, self._view_middleware, self._response_middleware, self._exception_middleware = \
      _request_middleware,      _view_middleware,      _response_middleware,      _exception_middleware

This way you don't need to make assumptions about the users.

comment:4 Changed 5 years ago by mrts

  • Keywords threadsafe, threading added; threadsafe removed
  • Triage Stage changed from Unreviewed to Accepted

shai, tuple assignment is not atomic, so the suggestion does not help.

The patch looks OK, although, strictly speaking, temporary list is only required for _request_middleware as only that is checked in client code (i.e. only that serves as initialization guard, not the other lists).

It would make things a bit less opaque if a _middleware_not_loaded boolean property was added to BaseHandler (that would do the very same return self._request_middleware is None) and used throughout client code. It may make sense to avoid this for optimization though, method call would add quite a few bytecode instructions.

Changed 5 years ago by mrts

Minimal changes that assure thread-safety

comment:5 Changed 5 years ago by mrts

  • Triage Stage changed from Accepted to Unreviewed

On second thought, shouldn't initLock guard against this? tdterry, were you hit by this or did you just think that this a possible scenario?

AFAIK:

  1. Initial start or restart of the Apache instance
  2. Thread T1 - a request comes in. self._request_middleware is None
  3. Thread T1 - acquires the self.initLock, calls self.load_middleware()
  4. Thread T2 - a request comes in (T1 is part of the way through self.load_middleware())
  5. Thread T2 - waits for the initLock shared by all instances
  6. Thread T1 - completes self.load_middleware(), releases the lock
  7. Thread T2 - lock released, all middleware properly loaded

Looks good to me, pushing back to unreviewed.

comment:6 Changed 5 years ago by mrts

IMHO, can be closed as invalid.

comment:7 follow-up: Changed 5 years ago by shai

mrts, I will take your word on tuple assignment as I can't find any reference in a quick search.

However, your threading analysis is wrong: In your line 5, T2 doesn't wait for the initLock. This is because without the (original) patch, self._request_middleware is not None (django/core/handlers/wsgi.py line 226).

comment:8 Changed 5 years ago by Alex

Right so it gets past the first check, but then when it goes to acquite the lock it has to wait until the other thread is done, and then it fails the second test.

comment:9 Changed 5 years ago by shai

The point is that it doesn't wait for the lock, doesn't do the second check. It thinks that the middleware is already initialized, and just happily goes on to use it.

comment:10 Changed 5 years ago by Alex

That's not consistant with the documented behavior of the lock: http://docs.python.org/library/threading.html#threading.Lock.acquire

comment:11 Changed 5 years ago by shai

The code does not attempt to acquire the lock. It only does that when self._request_middleware is None. In the described scenario, it isn't. The whole block is skipped.

comment:12 Changed 5 years ago by shai

(FWIW: the Python disassembler agrees with mrts; tuple assignment is not atomic. Sorry about that one).

comment:13 Changed 5 years ago by Alex

Ok look at the code: http://code.djangoproject.com/browser/django/trunk/django/core/handlers/wsgi.py#L221

1) Is request middleware none if not go on
2) if it is acquire the lock -- block until we can
3) check if request middleware is none if not go on
4) load the middlware
5) release the lock

comment:14 Changed 5 years ago by Alex

Ok I think we've been arguing past each other, the issue isn't anything that occurs in that block of code, the issue is that once that block of code begins exceuting for real in the first thread if a second thread comes in it won't block on that, while it should.

comment:15 Changed 5 years ago by Alex

  • Triage Stage changed from Unreviewed to Accepted

comment:16 in reply to: ↑ 7 Changed 5 years ago by mrts

Replying to shai:

However, your threading analysis is wrong: In your line 5, T2 doesn't wait for the initLock. This is because without the (original) patch, self._request_middleware is not None (django/core/handlers/wsgi.py line 226).

Indeed, my mistake, should have been more careful.

comment:17 Changed 5 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [10036]) Fixed #10470 -- Fixed a race condition in middleware initialization.

Thanks to Travis Terry and mrts.

Changed 5 years ago by mrts

Omitted append to right list

comment:18 Changed 5 years ago by mrts

  • Resolution fixed deleted
  • Status changed from closed to reopened

Malcolm, it shows that it is quite early in your timezone :) -- there's an accidental omission in r10036, please see the attached patch.

comment:19 Changed 5 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [10038]) [1.0.X] Fixed #10470 -- Fixed a race condition in middleware initialization.

Thanks to Travis Terry and mrts.

Backport of r10036 from trunk.

comment:20 Changed 5 years ago by mrts

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:21 Changed 5 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [10041]) [1.0.X] Fixed a problem from r10038. Fixed #10470.

Backport of r10040 from trunk.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.