Opened 3 years ago

Closed 2 years ago

#19117 closed Bug (wontfix)

Database and memcached connections break after fork.

Reported by: sebastian_noack
Owned by: nobody
Component: Database layer (models, ORM)
Version: 1.4
Severity: Normal
Keywords:
Cc: davidswafford
Triage Stage: Design decision needed
Has patch: yes
Needs documentation: no
Needs tests: yes
Patch needs improvement: no
Easy pickings: no
UI/UX: no


If you have a management command that does CPU-heavy work, or you are implementing a server for certain background tasks, it's likely that you will use the multiprocessing module to scale across multiple CPUs. However, Django implements connections to the database and memcached as singletons (created on first use, reused forever). So if you have used the database or memcached before forking, the child processes inherit the established connection. When multiple processes then use that connection at the same time (which can and will happen), the requests fail in an ugly way.

However, the multiprocessing module comes with a mechanism for such cases that lets you clean things up after a fork. My patch uses that mechanism to reset any database and memcached connections that were created before the fork, so that the child process creates its own connection when it needs one.
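A minimal sketch of that idea follows. It is not the attached patch: it assumes the mechanism meant here is `multiprocessing.util.register_after_fork`, and it uses a hypothetical `LazyConnection` holder around an in-memory `sqlite3` database as a stand-in for Django's database and memcached connections.

```python
import sqlite3
from multiprocessing import util


class LazyConnection:
    """Hypothetical stand-in for a connection singleton:
    opened on first use, then reused."""

    def __init__(self):
        self._conn = None
        # Ask multiprocessing to call LazyConnection._forget(self)
        # when it bootstraps a child process.
        util.register_after_fork(self, LazyConnection._forget)

    def get(self):
        # Open the connection lazily and cache it for later calls.
        if self._conn is None:
            self._conn = sqlite3.connect(":memory:")
        return self._conn

    def _forget(self):
        # Forget the cached handle; the next get() opens a fresh one.
        self._conn = None
```

The hook is run by multiprocessing when it starts a child, so a child that calls `get()` afterwards ends up with its own connection. Whether simply forgetting an inherited handle is safe depends on the driver, which is part of what the discussion below is about.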

Attachments (1)

0001-Re-connect-database-and-memcached-after-fork.patch (2.1 KB) - added by sebastian_noack 3 years ago.


Change History (8)

Changed 3 years ago by sebastian_noack

comment:1 Changed 3 years ago by lrekucki

  • Needs documentation unset
  • Needs tests set
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

I'm pretty sure there are other things that might break. That's why the common solution for background tasks is to use task/message queues (Celery, to name just one). Application servers like gunicorn or mod_wsgi have no trouble spawning multiple Django workers. I don't see any advantage in forking while processing a request, so I'm not sure this is a use case we want to support.

comment:2 Changed 3 years ago by sebastian_noack

Of course mod_wsgi doesn't have any problems with that, as it forks the process before starting the Python interpreter and importing Django. And I didn't talk about forking while processing a request. I talked about a management command or daemon running in the background. In my specific case it's the part of our application stack that dispatches the newsletter. So I have a management command that forks multiple worker processes to render the emails and send them via SMTP. For any management command like that, which runs for more than a few minutes, delegating tasks to child processes makes absolute sense.

comment:3 Changed 3 years ago by aaugustin

A quick workaround is to close the database and cache connection before forking; they'll be automatically reopened on the first subsequent access.
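As a rough sketch of this workaround, the helper below closes everything it is given and only then starts the child processes. The names `run_in_children`, `worker`, and `closers` are hypothetical. In a Django project the closers would presumably be something like `django.db.connections.close_all` and `django.core.cache.cache.close` in recent releases, or `django.db.connection.close` on 1.4; that mapping is an assumption, not something stated in this ticket.

```python
import multiprocessing


def run_in_children(worker, jobs, closers=()):
    """Close shared resources, then start one child process per job."""
    for close in closers:
        # After this loop nothing connection-like should still be open,
        # so the children have no live connection to inherit.
        close()
    procs = [multiprocessing.Process(target=worker, args=(job,)) for job in jobs]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    # Exit codes let the caller notice failed workers.
    return [proc.exitcode for proc in procs]
```

Parent and children each reconnect lazily on their next database or cache access, so no connection is shared across the fork.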

I'm not eager to add this code, because it's non-trivial, impossible to test, and rarely useful...

comment:4 Changed 3 years ago by apollo13

  • Resolution set to wontfix
  • Status changed from new to closed

I agree with Aymeric.

comment:5 Changed 2 years ago by davidswafford

Hey Sebastian,

I've recently hit this issue as well. I'm building a scheduling system that kicks off long-running background jobs. What's the recommended way to clear the DB session when forking? I'm using this, with mixed results:

from django.db import transaction

def clear_dbsession(*kargs, **kwargs):
    """ force Django to clear the existing DB session """

comment:6 Changed 2 years ago by davidswafford

  • Cc davidswafford added
  • Resolution wontfix deleted
  • Status changed from closed to new

comment:7 Changed 2 years ago by akaariai

  • Resolution set to wontfix
  • Status changed from new to closed

The problem is that many libraries used by Django do not support forking. For example, you can't expect to use a plain psycopg2 connection after a fork.

The thing is, Django isn't designed to be used with fork(). It might work if you close all memcached and database connections before the fork. But then again it might not. Guaranteeing that everything will just work when using fork() will be nearly impossible.

You can discuss this design decision on DevelopersMailingList. Unfortunately, the reason for the wontfix is that we can't make this work, not that we don't want to make it work, so getting this accepted will likely be hard.

You might want to explore other solutions, for example using subprocess module and explicitly communicating the initial state between processes, or using some message queue solution.
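A minimal sketch of the subprocess approach, under stated assumptions: the parent starts a fresh Python interpreter and passes the initial state as JSON on stdin, so the child inherits no open connections. `CHILD_CODE`, `run_job`, and the recipient-counting task are hypothetical placeholders; in a real project the child would more likely be a separate management command that opens its own connections.

```python
import json
import subprocess
import sys

# Code run by the fresh interpreter. Everything it knows arrives as JSON
# on stdin; the reply goes back as JSON on stdout.
CHILD_CODE = """
import json, sys
state = json.load(sys.stdin)
json.dump({"batch": state["batch"], "count": len(state["recipients"])}, sys.stdout)
"""


def run_job(state):
    """Start a new Python process, hand it `state`, and return its JSON reply."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps(state),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return json.loads(done.stdout)
```

Because the child starts from a clean interpreter, no state is shared implicitly; the cost is that everything the worker needs has to be serializable.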
