Opened 12 years ago

Closed 11 years ago

#19117 closed Bug (wontfix)

Database and memcached connections break after fork.

Reported by: Sebastian Noack
Owned by: nobody
Component: Database layer (models, ORM)
Version: 1.4
Severity: Normal
Keywords:
Cc: davidswafford
Triage Stage: Design decision needed
Has patch: yes
Needs documentation: no
Needs tests: yes
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

If you have a management command that does CPU-heavy tasks, or you are implementing a server for certain background tasks, it's likely that you will use the multiprocessing module to scale over multiple CPUs. However, Django implements connections to the database and memcached as singletons (created on first use, reused forever). So if you have used the database or memcached before forking, the child processes inherit the established connection. And when multiple processes use a connection at the same time (which can and will happen), the requests will fail in an ugly way.

However, the multiprocessing module comes with a mechanism for such cases that lets you clean things up after fork. My patch uses that mechanism to reset any database and memcached connections that were already created, so that the child process will create its own connection when it needs one.
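The attached patch is not reproduced here, so the following is only a minimal sketch of the kind of hook described above, assuming the mechanism meant is `multiprocessing.util.register_after_fork` (an undocumented helper in the standard library). `LazyConnection` is a hypothetical stand-in for a connection singleton, not Django code.

```python
import multiprocessing
import os
from multiprocessing.util import register_after_fork


class LazyConnection:
    """Hypothetical connection singleton: created on first use, then reused."""

    def __init__(self):
        self._conn = None
        # Ask multiprocessing to call _reset(self) in every child it forks.
        register_after_fork(self, LazyConnection._reset)

    def _reset(self):
        # Forget the inherited handle without closing it: the underlying
        # socket is shared with the parent, which may still be using it.
        self._conn = None

    def get(self):
        if self._conn is None:
            # A real backend would open a socket here; this sketch only
            # records which process opened the connection.
            self._conn = {"opened_by_pid": os.getpid()}
        return self._conn


connection = LazyConnection()


def worker(queue):
    # Runs in the child: the first access opens a connection of its own.
    queue.put(connection.get()["opened_by_pid"] == os.getpid())


def child_reconnects():
    connection.get()  # use the connection before forking
    ctx = multiprocessing.get_context("fork")  # fork start method, POSIX only
    queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(queue,))
    proc.start()
    result = queue.get(timeout=10)
    proc.join()
    return result
```

Calling `child_reconnects()` returns whether the child ended up with a connection it opened itself. The hook only runs for children started through multiprocessing; a bare `os.fork()` would bypass it.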

Attachments (1)

0001-Re-connect-database-and-memcached-after-fork.patch (2.1 KB) - added by Sebastian Noack 12 years ago.


Change History (8)

comment:1 by Łukasz Rekucki, 12 years ago

Needs tests: set
Triage Stage: Unreviewed → Design decision needed

I'm pretty sure there are other things that might break. That's why the common solution for background tasks is to use task/message queues (like Celery, to name just one). Application servers like gunicorn or mod_wsgi have no trouble spawning multiple Django workers. I don't see any advantage in forking during processing of a request, so I'm not sure this is a use case we want to support.

comment:2 by Sebastian Noack, 12 years ago

Of course mod_wsgi doesn't have any problems with that, as it forks the process before starting the Python interpreter and importing Django. And I didn't talk about forking while processing a request. I talked about a management command or daemon running in the background. In my specific case it's the part of our application stack that dispatches the newsletter. So I have a management command that forks multiple worker processes to render the emails and send them via SMTP. For every management command like that, which runs for more than a few minutes, delegating tasks to child processes makes absolute sense.

comment:3 by Aymeric Augustin, 11 years ago

A quick workaround is to close the database and cache connections before forking; they'll be automatically reopened on the first subsequent access.

I'm not eager to add this code, because it's non-trivial, impossible to test, and rarely useful...
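To illustrate the close-before-fork workaround without depending on a Django installation, here is a rough sketch using the standard library's `sqlite3`. `LazyDatabase`, `count_rows`, `run_workers` and the `jobs` table are hypothetical names made up for this example. In a Django project the corresponding step would be calling `close()` on the database connection and on the cache object right before starting the workers.

```python
import multiprocessing
import os
import sqlite3
import tempfile


class LazyDatabase:
    """Hypothetical wrapper: opens on first access, reopens after close()."""

    def __init__(self, path):
        self.path = path
        self._conn = None

    def connection(self):
        if self._conn is None:
            self._conn = sqlite3.connect(self.path)
        return self._conn

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None


def count_rows(db, queue):
    # Runs in a child: db holds no open handle, so this access reopens it.
    row = db.connection().execute("SELECT COUNT(*) FROM jobs").fetchone()
    queue.put(row[0])


def run_workers(n_workers=2):
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
    db = LazyDatabase(path)
    conn = db.connection()
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO jobs (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.commit()

    # The workaround: close before forking so no open connection is inherited.
    db.close()

    ctx = multiprocessing.get_context("fork")  # fork start method, POSIX only
    queue = ctx.Queue()
    procs = [ctx.Process(target=count_rows, args=(db, queue))
             for _ in range(n_workers)]
    for proc in procs:
        proc.start()
    results = [queue.get(timeout=10) for _ in procs]
    for proc in procs:
        proc.join()
    return results
```

Each child opens its own connection on first access, and the parent would do the same on its next query. This only covers connections the wrapper knows about; other resources inherited across the fork are not handled.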

comment:4 by Florian Apolloner, 11 years ago

Resolution: wontfix
Status: new → closed

I agree with Aymeric.

comment:5 by davidswafford, 11 years ago

Hey Sebastian,

I've recently hit this issue as well. I'm building a scheduling system that kicks off background jobs that will be long-running. What's the recommended way to clear the DB session when forking? I'm using this with mixed results:

    from django.db import transaction

    @transaction.commit_manually
    def clear_dbsession(*args, **kwargs):
        """ force Django to clear the existing DB session """
        transaction.commit()

comment:6 by davidswafford, 11 years ago

Cc: davidswafford added
Resolution: wontfix removed
Status: closed → new

comment:7 by Anssi Kääriäinen, 11 years ago

Resolution: wontfix
Status: new → closed

The problem is that many libraries used by Django do not support forking. For example, you can't expect to use a plain psycopg2 connection after fork.

The thing is, Django isn't designed to be used with fork(). It might work if you close all memcached and database connections before the fork. But then again it might not. Guaranteeing that everything will just work when using fork() will be nearly impossible.

You can discuss this design decision on DevelopersMailingList. Unfortunately, the reason for the wontfix is that we can't make this work, not that we don't want to make this work, so getting this accepted will likely be hard.

You might want to explore other solutions, for example using the subprocess module and explicitly communicating the initial state between processes, or using some message queue solution.
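As a rough sketch of the subprocess approach, the parent can start a fresh interpreter and hand it the initial state explicitly, here as JSON on stdin. The state keys (`batch`, `recipient_ids`) and the helper `run_child` are hypothetical, chosen only for illustration.

```python
import json
import subprocess
import sys

# Source run by the child: a fresh interpreter that inherits no open
# database or cache connections from the parent.
CHILD_SOURCE = """
import json, sys
state = json.load(sys.stdin)
# A real worker would open its own connections here and do the work.
result = {"processed": len(state["recipient_ids"]), "batch": state["batch"]}
json.dump(result, sys.stdout)
"""


def run_child(state):
    """Start a child interpreter, send it the state, and return its reply."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=json.dumps(state),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return json.loads(completed.stdout)
```

Because the child starts from scratch, nothing is shared implicitly. The cost is that everything the worker needs has to be serialized and passed in, and each child pays the interpreter startup time.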
