Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#463 closed defect (duplicate)

[patch] new mysql.DatabaseWrapper - solving multiple MySQL problems

Reported by: eugene@… Owned by: adrian
Component: Core (Other) Version:
Severity: major Keywords:
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

This patch solves a problem of multiple errors while hosting Django app with FastCGI and MySQL. I was running it for 24 hours without any problems.

Some highlights:

1) Django uses the same connection for all threads. It breaks MySQL leading to numerous random CR_SERVER_GONE_ERROR and CR_SERVER_LOST errors. Every independently talking entity should have its own connection. I've implemented mysql.DatabaseWrapper using dictionary to keep one connection per thread.

2) During request for new connection, old connections are examined. If thread is gone, its connection is closed and garbage-collected.

3) MySQL connection can time-out depending on MySQL server settings. The standard practice is to ping() stored connections before use. My implementation does it for every retrieved connection.

Some potential problems:

1) I rename threads, which request connections, to make them unique. If some other code relies on thread names, it may be broken.

2) 24 hour testing is not enough for production quality code. Changes are very small and compact. They can be verified manually. But still it is not a full blown system test under different hosting scenarios.

3) Please take a look at the code and verify that it is optimal --- my python experience is limited, I could implement something in sub-optimal way.

The patch:

Index: mysql.py
===================================================================
--- mysql.py	(revision 629)
+++ mysql.py	(working copy)
@@ -11,6 +11,9 @@
 from MySQLdb.constants import FIELD_TYPE
 import types
 
+import thread, threading
+from sets import Set
+
 DatabaseError = Database.DatabaseError
 
 django_conversions = conversions.copy()
@@ -23,32 +26,78 @@
 
 class DatabaseWrapper:
     def __init__(self):
-        self.connection = None
+        self.connections = {}
+        self.threads = Set()
+        self.lock = thread.allocate_lock()
         self.queries = []
+        
+    def _get_connection(self):
+        self.lock.acquire()
+        try:
+            # find existing connection
+            id = threading.currentThread().getName()
+            if id in self.connections:
+                connection = self.connections[id]
+                connection.ping()
+                return connection
+            # normalize thread name
+            if id != 'MainThread':
+                id = str(thread.get_ident())
+                threading.currentThread().setName(id)
+            # remove deadwood
+            dead = self.threads - Set([x.getName() for x in threading.enumerate()])
+            for name in dead:
+                self.connections[name].close()
+                del self.connections[name]
+            self.threads -= dead
+            # create new connection
+            from django.conf.settings import DATABASE_USER, DATABASE_NAME, DATABASE_HOST, DATABASE_PASSWORD
+            connection = Database.connect(user=DATABASE_USER, db=DATABASE_NAME,
+                    passwd=DATABASE_PASSWORD, host=DATABASE_HOST, conv=django_conversions)
+            self.connections[id] = connection
+            self.threads.add(id)
+            return connection
+        finally:
+            self.lock.release()
 
     def cursor(self):
-        from django.conf.settings import DATABASE_USER, DATABASE_NAME, DATABASE_HOST, DATABASE_PASSWORD, DEBUG
-        if self.connection is None:
-            self.connection = Database.connect(user=DATABASE_USER, db=DATABASE_NAME,
-                passwd=DATABASE_PASSWORD, host=DATABASE_HOST, conv=django_conversions)
+        connection = self._get_connection()
+        from django.conf.settings import DEBUG
         if DEBUG:
-            return base.CursorDebugWrapper(self.connection.cursor(), self)
-        return self.connection.cursor()
+            return base.CursorDebugWrapper(connection.cursor(), self)
+        return connection.cursor()
 
     def commit(self):
-        self.connection.commit()
+        self.lock.acquire()
+        try:
+            id = threading.currentThread().getName()
+            if id in self.connections:
+                self.connections[id].commit()
+        finally:
+            self.lock.release()
 
     def rollback(self):
-        if self.connection:
-            try:
-                self.connection.rollback()
-            except Database.NotSupportedError:
-                pass
+        self.lock.acquire()
+        try:
+            id = threading.currentThread().getName()
+            if id in self.connections:
+                try:
+                    self.connections[id].rollback()
+                except Database.NotSupportedError:
+                    pass
+        finally:
+            self.lock.release()
 
     def close(self):
-        if self.connection is not None:
-            self.connection.close()
-            self.connection = None
+        self.lock.acquire()
+        try:
+            id = threading.currentThread().getName()
+            if id in self.connections:
+                connection = self.connections[id]
+                connection.close()
+                del self.connections[id]
+        finally:
+            self.lock.release()
 
 def get_last_insert_id(cursor, table_name, pk_name):
     cursor.execute("SELECT LAST_INSERT_ID()")

Attachments (8)

mysql.patch (3.8 KB) - added by eugene@… 10 years ago.
patch as a file
mysql.diff (4.0 KB) - added by Nebojša Đorđević - nesh <nesh@…> 10 years ago.
new patch
mysql_0.91.diff (4.3 KB) - added by edgars@… 10 years ago.
patch against release 0.91
mysql_rev2307.diff (4.1 KB) - added by edgars.jekabsons@… 10 years ago.
patch against trunk (rev. 2307)
mysql_trunk_rev2360.diff (2.4 KB) - added by greg@… 10 years ago.
Patch for trunk 2360, using thread-local storage
mysql_magic-removal_rev2360.diff (2.3 KB) - added by greg@… 10 years ago.
Patch for magic-removal 2360, using thread-local storage
mysql_trunk_rev2360-1.diff (3.7 KB) - added by greg@… 10 years ago.
Patch for trunk 2360, using thread-local storage and python 2.3 support
mysql_magic-removal_rev2360-1.diff (3.6 KB) - added by greg@… 10 years ago.
Patch for magic-removal 2360, using thread-local storage and python 2.3 support

Download all attachments as: .zip

Change History (33)

Changed 10 years ago by eugene@…

patch as a file

comment:1 Changed 10 years ago by garthk

I lack the time to check out the solution, but I'm all for fixing the problems described.

comment:2 Changed 10 years ago by anonymous

  • Owner changed from adrian to anonymous

comment:3 Changed 10 years ago by Boffbowsh

  • Owner changed from anonymous to adrian

comment:4 Changed 10 years ago by jacob

  • milestone set to Version 1.0

comment:5 Changed 10 years ago by eugene@…

I ran this patch for 3 weeks now. No problems so far.

comment:6 Changed 10 years ago by Maniac <Maniac@…>

For anyone interesting. There is a similar issue with Postgres for which I filed ticket 900.

comment:7 Changed 10 years ago by Nebojša Đorđević - nesh <nesh@…>

I'm rewrote your patch for latest [1432] revision. There are some differences:

  • connection.ping() is always checked before returning cursor, and if connection is invalid then connection is recreated
  • for locking I'm using django.utils.synch.RWLock
  • instead renaming threads (witch will interfere with i18n stuff) I'm using currentThread() object as a thread id (like Hugo in i18n)

Changed 10 years ago by Nebojša Đorđević - nesh <nesh@…>

new patch

comment:8 Changed 10 years ago by eugene@…

Checking connections while pinging -- good catch. Using RWLock --- good refresh, I forgot to update the patch with RWLock, when it became available.

Regarding currentThread() -- I had some reservations about it. I tried to experiment with currentThread() and:

  1. "If the caller's thread of control was not created through the threading module, a dummy thread object with limited functionality is returned." -- from http://www.python.org/doc/2.4.2/lib/module-threading.html. Web server can create its own threads (or thread pools, if we want to be realistic). WSGI can create threads without threading module, e.g., it can use thread module directly.
  2. currentThread() is not guaranteed to return unique objects, e.g., they can be repeatedly reused, when threads are recycled.

Basically I was concerned about potential clash because uniqueness is not guaranteed. Maybe it is unfounded. E.g., I cannot remember why (2) was important. It doesn't look dangerous now.

(1) depends on implementation. If we assume that all Python implementations return unique dummy thread objects, it's probably OK. Theoretically it can return the same static object.

OTOH "thread renaming" may suffer from similar problems depending on implementation. I want a sage advice on that. :-)

comment:9 Changed 10 years ago by Nebojša Đorđević - nesh <nesh@…>

Is this per thread stuff is really required? As far as I can see real problem is in not checking for connection validity and not threading stuff - in my test setup (FastCGI & mod_fcgi) all problems are gone when I only check with connection.ping and reconnect if connection is invalid.

Maybe someone more knowledgeable in django database internals may shed some light on this.

Second solution is using some type of database connection pool instead thread local connections.

comment:10 Changed 10 years ago by hugo

Actually currentThread() will return unique objects - they can't be reused, the thread names might be reused, but the object ID won't. And that's what is used when you use objects as dictionary keys. The dummy object created might have limited functionality - but we don't use the functionality, we only use the identity, so that's enough.

comment:11 Changed 10 years ago by eugene@…

Originally I had problems with multithreaded FCGI under load. It was not related to pings --- MySQL got confused when SQL statements from different threads came using one connection. I implemented "per thread stuff" after advices of MySQL experts. I added pings to prevent time-outs, which happened after some idle time.

INHO, database connection pool is a good idea.

comment:12 Changed 10 years ago by eugene@…

It's good to know that currentThread() is unique. In my original implementation I used get_ident() as a source of certifiably unique names with silly renaming of threads. It is the only thread-related method, which is explicitly documented as returning unique values, which are suitable as dictionary index (http://www.python.org/doc/2.4.2/lib/module-thread.html). No other thread/treading-related method is documented as such.

Unfortunately I didn't find TLS in python. That's what is used usually for stuff like that.

comment:13 Changed 10 years ago by Maniac <Maniac@…>

I'm about to port this new patch for Postgres and have a question. Why is "remove deadwood" part needed? As far as I understand connections are always deleted on every request calling db.close().

comment:14 Changed 10 years ago by eugene@…

"Remove deadwood" is just an extra security measure against potential sloppiness in closing connections and handling exceptions. After watching MySQL connections die randomly I decided to take no chances.

Changed 10 years ago by edgars@…

patch against release 0.91

comment:15 Changed 10 years ago by edgars@…

  • priority changed from high to highest
  • Severity changed from major to critical

Please include this patch ASAP. Otherwise Django is no use in production as fcgi using MySQL database.

Changed 10 years ago by edgars.jekabsons@…

patch against trunk (rev. 2307)

comment:16 Changed 10 years ago by Tom Tobin <korpios@…>

  • priority changed from highest to high
  • Severity changed from critical to major

Bumping priority and severity down one rank each; while this may be significant for this particular use-case (MySQL and FastCGI), that use-case has at least two obvious workarounds — switching to another database (preferably PostgreSQL), or switching to mod_python. I'm not saying it shouldn't get fixed; I'm just saying that marking it "highest" and "critical" seems nonsensical.

comment:17 Changed 10 years ago by Maniac <Maniac@…>

Tom, Postgres' backend in Django suffers from same problems :-). And it also have the patch waiting in #900

comment:18 Changed 10 years ago by Tom Tobin <korpios@…>

After reading through #900, the alternatives I'd suggest become "switch to mod_python" or "use the preforking mpm". Again — I never said it shouldn't be fixed, or that it wasn't a PITA if your situation falls under these cases . . . just that there are production-ready workarounds at the moment. :)

comment:19 Changed 10 years ago by Maniac <Maniac@…>

I doidn't mean to disagree with your resolution. I think it would be critical for 1.0 but not now.

But to clarify: all these bad things happen with mod_python too. Switching to preforking worker would help but many people don't control their server (shared hosting, evil admin, other projects requiring threads etc.)

Changed 10 years ago by greg@…

Patch for trunk 2360, using thread-local storage

Changed 10 years ago by greg@…

Patch for magic-removal 2360, using thread-local storage

Changed 10 years ago by greg@…

Patch for trunk 2360, using thread-local storage and python 2.3 support

Changed 10 years ago by greg@…

Patch for magic-removal 2360, using thread-local storage and python 2.3 support

comment:20 Changed 10 years ago by anonymous

If I may, let me suggest a slightly different implementation that I've had good luck with. Instead of using the current thread (or a value derived from the current thread) as a key into a dict, you can use Python's built-in thread-local storage mechanism. That way you don't need to worry about synchronization or removing "deadwood" (I like the name :-). There's one minor hitch, which is that threading.local() isn't available before Python 2.4. So I included an implementation of it for earlier interpreters (e.g. 2.3) that don't have this functionality built-in.

I've attached patches for the trunk and the magic-removal branch, as of revision 2360. The ones to look at are mysql_trunk_rev2360-1.diff and mysql_magic-removal_rev2360-1.diff. Please ignore mysql_trunk_rev2360.diff and mysql_magic-removal_rev2360.diff; Trac reported an internal error when I tried to replace the first versions that I uploaded with newer versions that support Python 2.3, so I had to change the names to get it to work. Sorry for the extra cruft.

comment:21 Changed 10 years ago by greg@…

Oops, I didn't mean to leave the last comment anonymous... that was me.

comment:22 Changed 10 years ago by greg@…

Oh... or, we could include a copy of _thread_local.py from python 2.4 like #1268 does (in current_user.diff).

comment:23 Changed 9 years ago by greg@…

Eugene's code in #1442 is more elegant... IMHO we should use his fix, and close this as a duplicate.

comment:24 Changed 9 years ago by eugene@…

  • Resolution set to duplicate
  • Status changed from new to closed

Superseded by #1442

comment:25 Changed 9 years ago by anonymous

  • milestone Version 1.0 deleted

Milestone Version 1.0 deleted

Note: See TracTickets for help on using tickets.
Back to Top