Code

Opened 7 years ago

Closed 5 years ago

Last modified 3 years ago

#5133 closed Uncategorized (fixed)

Memcached connections get left open (in certain circumstances).

Reported by: jacob Owned by: nobody
Component: Core (Cache system) Version: master
Severity: Normal Keywords: memcached
Cc: mmalone, rajesh.dhawan@…, harm.verhagen+django@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX:

Description

In a few cases Django under mod_python can leave dangling connections to memcached open. We've had trouble tracking down the circumstances under which this happens, but when it does it can lead to memcached servers hitting their connection limits, which means caching stops. Nasty.

The attached patch should take care of this, but I'm not 100% sure it's the right approach.

Attachments (1)

memcached-cleanup-connections.patch (1.3 KB) - added by jacob 7 years ago.

Download all attachments as: .zip

Change History (20)

Changed 7 years ago by jacob

comment:1 Changed 7 years ago by Simon G. <dev@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 6 years ago by h.rauch@…

I have/had the following problem:

With memcached enabled, our site (dms.bildung.hessen.de) does not respond anymore after 10 - 20 minutes. Restarting memcached solves this problem for the next minutes. Therefor it's not a problem with apache or the program logic.

Therefore I tried the patch above. But it not solve our problem.

In the end I use the file-cache with no problems.

comment:3 Changed 6 years ago by hans.rauch@…

I'm sorry, my real email-address is not h.rauch@…, it's hans.rauch@…

comment:4 Changed 6 years ago by mcroydon

  • Resolution set to invalid
  • Status changed from new to closed

Please try the latest version of python-memcached (which is 1.4.0 as of this writing). I believe that the error was there and that it has been resolved between 1.3.9 and 1.4.0.

comment:5 Changed 6 years ago by hans.rauch@…

  • Resolution invalid deleted
  • Status changed from closed to reopened

I'm sorry, but I've been using the latest version of memcached (1.2.2) and python-memcached (1.4.0), when I watch these problems.

comment:6 Changed 6 years ago by mcroydon

hans: Indeed, my apologies. I looked at 1.4.0 during the sprint and must have missed it.

here's a patch against python-memcached 1.4.0 that removes threadlocals and doesn't leave any dangling connections for us. The threadlocals bit appeared somewhere in 1.3.x and has been the source for problems for us too. You might also look at cmemcache. This ticket should definitely remain open until we figure something out.

comment:7 Changed 6 years ago by hans.rauch@…

Thanks, your patch seems to solve our problem. Our Site is running for 24 hours and we didn't had to restart memcached or apache.

comment:8 follow-up: Changed 6 years ago by toxik

Improved patch (really just nitpick): http://dpaste.com/hold/41946/

comment:9 in reply to: ↑ 8 Changed 6 years ago by johann@…

Removing thread locals gives me lots of bad errors after running okay for a few minutes:

Traceback (most recent call last):
 File "/usr/lib/python2.5/site-packages/django/core/handlers/base.py", line 82, in get_response
   response = callback(request, *callback_args, **callback_kwargs)
 File "/usr/lib/python2.5/site-packages/shotserver04/websites/views.py", line 93, in details
   preload_foreign_keys(browsers, browser_group=True)
 File "/usr/lib/python2.5/site-packages/shotserver04/common/object_cache.py", line 165, in preload_foreign_keys
   [getattr(instance, field_id) for instance in instances]))
 File "/usr/lib/python2.5/site-packages/shotserver04/common/object_cache.py", line 92, in get_many
   [KEY_FORMAT % (model._meta.db_table, 'id', id) for id in id_list])
 File "/usr/lib/python2.5/site-packages/django/core/cache/backends/memcached.py", line 41, in get_many
   return self._cache.get_multi(map(smart_str,keys))
 File "/usr/lib/python2.5/site-packages/python_memcached-1.41-py2.5.egg/memcache.py", line 717, in get_multi
   retvals[prefixed_to_orig_key[rkey]] = val   # un-prefix returned key.
KeyError: 'platforms_operatingsystem:id=6'

However, memcached-cleanup-connections.patch seems to fix the open connections problem on my system.

  • Django trunk r7519
  • memcached 1.1.12-1
  • python-memcached-1.41

comment:10 Changed 6 years ago by mtredinnick

I just saw this happen on a client's production site after we updated the memcache client from 1.34 to 1.43 (server is 1.1.12). We ended up applying Jacob's original patch because it looked the least intrusive, but it feels like something that python-memcache should be fixing rather than Django having to do manual connection management (my only concern with Jacob's patch). We elected not to go with Matt's patch because it basically undoes something that was included in python-memcache 1.36, which was included to fix some (other) multi-threading issues, according to their ChangeLog.

I've filed a bug upstream with the python memcache providers, so we'll see if anything happens.

comment:11 Changed 6 years ago by mmalone

  • Cc mmalone added

comment:12 Changed 6 years ago by mtredinnick

  • milestone set to 1.0

Just heard back from the python-memcached maintainer and current behaviour is intended, to allow for more efficient pooling. This problem may well end up being because we're not getting the same thread-id each time inside the same mod_python process or something similar. Since we probably don't want to do inter-thread pooling (for the same reason we don't do database connection pooling), I'd suggest we go with Jacob's original patch for now as a way to manage the connection lifetimes in a well-defined fashion.

Not going to apply this immediately, in case anybody else on the list has bright ideas, but barring any good objections, I think this would be worth applying before 1.0. It's a "bug" in the sense that we're not cooperating well with memcache and problems that only appear when you're wildly successful and/or under high load are not a lot of fun to see.

comment:13 Changed 6 years ago by grahamd

If python-memcached can't handle being used in a multithreaded environment, which I recollect as the case when I looked at the code many moons ago, it isn't just a mod_python issue, but would also be an issue were mod_wsgi, fastcgi, paste server, cherrypy wsgi server were used in a multithreaded configuration. I suspect that python-memcached may also have a lot of issues if any of those hosting mechanisms were to kill off threads when they feel like they are no longer needed and then create new ones later on. This would possibly be an issue as not sure it would do cleanup of state when a thread no longer becomes used. Anyway, that later point is an issue for python-memcached and not Django unless Django had similar issues with retaining state for threads beyond their lifetime.

comment:14 Changed 6 years ago by rajeshdhawan

  • Cc rajesh.dhawan@… added

comment:15 Changed 6 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [8418]) Fixed #5133 -- Explicitly close memcached connections after each request
(similar to database connection management). We can't effectively manage the
lifecycle by pooling connections and recent versions of python-memcache can
lead to connection exhaustion in some quite reasonable setups.

comment:16 Changed 5 years ago by booink@…

  • Resolution fixed deleted
  • Status changed from closed to reopened

The patch takes care of connections kept open, but it introduces another problem - the need to open one or more tcp connections every single request.

With a simple loop, you can make a system run out of sockets easily - after a socket is closed, that port cannot be reused for an eternity, ranging from 1 minute to 4 depending on OS. If enough sockets get stuck in TIME_WAIT state, the server simply fails to connect to memcached and start serving everything from db again - that's not something you want to see on a site with sufficient traffic to need a memcached installation.

In my opinion, the cure is worse than the disease. There's an easy workaround available for the original problem: restart workers after a certain amount of requests. With max-request=500 on a 5 threads deamon process (mod_wsgi, times 20 processes), we never go over 100 connections on our memcached server, started with the default cap of 1024 connections. If you run mod_python, use MaxRequestsPerChild.

My current solution is to just noop the whole fix with one line in any .py: django.core.cache.backends.memcached.CacheClass.close = lambda x: None. It might be an idea to make it configurable so people can choose between disconnect after every request and keep it open until process restart.

comment:17 Changed 5 years ago by Alex

  • Resolution set to fixed
  • Status changed from reopened to closed

This bug was fixed many months ago, please file a newbug if you believe there is a new issue.

comment:18 Changed 3 years ago by harm

  • Cc harm.verhagen+django@… added
  • Easy pickings unset
  • Severity set to Normal
  • Type set to Uncategorized

comment:19 Changed 3 years ago by jacob

  • milestone 1.0 deleted

Milestone 1.0 deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.