#35757 closed Bug (needsinfo)
memcached.PyMemcacheCache reentrancy problem with ASGI-based runserver
Reported by: | Harm Verhagen | Owned by: | |
---|---|---|---|
Component: | Core (Cache system) | Version: | 5.1 |
Severity: | Normal | Keywords: | asgi, async |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I found that using memcached.PyMemcacheCache
with runserver has reentrancy problems.
I implemented a json api in django, when issuing multiple request simultaneously from javascript, the requests sometimes crash.
I have urls that do something like this.
from django.core.cache import cache def example_url(request): cache.incr("some key", 1) # count requests return JsonResponse(data=data, status=201)
When issuing 5-10 requests simultaneously on a runserver, it often (not always) get this crash
File "..../views/api.py", line 2512, in example_url cache.incr("some key", 1) File "./venv/lib/python3.12/site-packages/django/core/cache/backends/memcached.py", line 110, in incr val = self._cache.incr(key, delta) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "./venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 350, in incr return self._run_cmd("incr", key, False, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "./venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 322, in _run_cmd return self._safely_run_func(client, func, default_val, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "./venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 211, in _safely_run_func result = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "./venv/lib/python3.12/site-packages/pymemcache/client/base.py", line 827, in incr results = self._misc_cmd([cmd], b"incr", noreply) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "./venv/lib/python3.12/site-packages/pymemcache/client/base.py", line 1291, in _misc_cmd buf, line = _reader(self.sock, buf) ^^^^^^^^^^^^^^^^^^^^^^^ File "./venv/lib/python3.12/site-packages/pymemcache/client/base.py", line 1658, in _readline buf = _recv(sock, RECV_SIZE) ^^^^^^^^^^^^^^^^^^^^^^ File "./venv/lib/python3.12/site-packages/pymemcache/client/base.py", line 1750, in _recv return sock.recv(size) OSError: [Errno 9] Bad file descriptor
The following code looked a bit suspicious.
cache.backends.memcached.py:99 BaseMemcachedCache
def close(self, **kwargs): # Many clients don't clean up connections properly. self._cache.disconnect_all()
Doesn't this race? closing a all connections, might close a connection of a parallel request that was just opened but not yet finished.
Notes
I'm not putting a heavy load on runserver, its a simple javascript app, that happens to do 10 requests on a single page. Django should be able to handle that, also in a dev environment
Workaround
Adding the --noasgi
option seems to help.
./manage.py runserver --noasgi
Weird enough the (to me) more obvious, --nothreading
option does _not_ help. That crashes in the same way.
NB: this workaround breaks django channels, so if you have websockets in your app, then this "workaround" is not usable.
Version info
Django==5.1.1 pymemcache==4.0.0 python 3.12.6 channels==3.0.5 <--- related? channels-redis==3.4.1
Change History (22)
comment:1 by , 7 weeks ago
Description: | modified (diff) |
---|
comment:2 by , 7 weeks ago
Description: | modified (diff) |
---|
comment:3 by , 7 weeks ago
Description: | modified (diff) |
---|
comment:4 by , 7 weeks ago
Summary: | memcached.PyMemcacheCache reentrancy problem (OSError: [Errno 9] Bad file descriptor) → memcached.PyMemcacheCache reentrancy problem with ASGI-based runserver |
---|
comment:5 by , 7 weeks ago
Description: | modified (diff) |
---|
comment:6 by , 7 weeks ago
Description: | modified (diff) |
---|
comment:7 by , 7 weeks ago
Description: | modified (diff) |
---|
comment:8 by , 7 weeks ago
comment:9 by , 7 weeks ago
Description: | modified (diff) |
---|
comment:10 by , 7 weeks ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
This report seems better suited to be a support request. The best place to get answers to your issue is using any of the user support channels from this link.
Trac is for issues that are confirmed to be in Django. If, after debugging, you find out that this is indeed a bug in Django, please re-open with the specific details and please be sure to include a small Django project to reproduce or a failing test case.
Thank you!
comment:12 by , 5 weeks ago
I reproduced this problem with a small Django project.
It seems to be related with channels/daphne
As there is a testcase, I re-opened the ticket for re-evaluation.
Please advise if more information is required.
I created a new project according the to polls example. The problem occurs when adding 'channels'/'daphne' to INSTALLED_APPS
(without even using channels).
polls.views.py
from django.http import HttpResponse from django.core.cache import cache def incr(): key = 'example:key' try: val = cache.incr(key, 1) except ValueError: cache.set(key, 1) val = 1 return val def index(request): v = incr() return HttpResponse("Hello, world. You're at the polls index. %d" % v)
settings.py
CACHES = { 'default': { 'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache', # 'BACKEND': 'django.core.cache.backends.locmem.LocMemCache', # using LocMemCache 'fixes' the problem 'LOCATION': "localhost:11211" }, } INSTALLED_APPS = [ 'daphne', # must be before staticfiles (removing this fixes the problem (with channels 4) 'channels', 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', ] ASGI_APPLICATION = "mysite.asgi.application" # required for channels
fire up memached & runserver and browse to http://localhost:8000/polls/, and see everything working (no exceptions)
so far, so good....
Steps to reproduce:
run the following script get_parallel.sh
#!/bin/bash # # do requests in parallel # https://code.djangoproject.com/ticket/35757 url='http://localhost:8000/polls/' curl $url& curl $url& curl $url& curl $url
Result
Exceptions as shown in original report. (OSError: [Errno 9] Bad file descriptor
)
After a few runs runserver stops serving requests completely, all requests, even sequential ones from a browser, get a 500 with this traceback.
Traceback (most recent call last): File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 518, in thread_handler raise exc_info[1] File "venv/lib/python3.12/site-packages/django/core/handlers/exception.py", line 42, in inner response = await get_response(request) File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 518, in thread_handler raise exc_info[1] File "/venv/lib/python3.12/site-packages/django/core/handlers/base.py", line 253, in _get_response_async response = await wrapped_callback( File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 468, in __call__ ret = await asyncio.shield(exec_coro) File ".../Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run result = self.fn(*self.args, **self.kwargs) File "/venv/lib/python3.12/site-packages/asgiref/sync.py", line 522, in thread_handler return func(*args, **kwargs) File "mysite/polls/views.py", line 20, in index incr() File "mysite/polls/views.py", line 11, in incr val = cache.incr(key, 1) File "venv/lib/python3.12/site-packages/django/core/cache/backends/memcached.py", line 110, in incr val = self._cache.incr(key, delta) File "venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 350, in incr return self._run_cmd("incr", key, False, *args, **kwargs) File "venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 314, in _run_cmd client = self._get_client(key) File "venv/lib/python3.12/site-packages/pymemcache/client/hash.py", line 182, in _get_client raise MemcacheError("All servers seem to be down right now") pymemcache.exceptions.MemcacheError: All servers seem to be down right now
When this happens, only a restart the runserver makes the system serve requests again.
Version info
with channels 3 the problem appears (and disappears when removed) when adding 'channels' to INSTALLED_APPS
with channels 4 the problem appears when adding 'channels' & 'daphne' to INSTALLED_APPS, disappears when 'daphne' is commented out.
channels==4.1.0
daphne==4.1.2
Django==5.1.1
python 3.12.6
follow-up: 14 comment:13 by , 5 weeks ago
Resolution: | invalid |
---|---|
Status: | closed → new |
comment:14 by , 5 weeks ago
Component: | Uncategorized → Core (Cache system) |
---|---|
Keywords: | asgi async added |
Resolution: | → duplicate |
Status: | new → closed |
Type: | Uncategorized → Bug |
Replying to Harm Verhagen:
Hello Harm! Thank you for the extra details. After some investigation, I believe this is a duplicate of #33625, so I'll be closing as such.
comment:15 by , 4 weeks ago
I don't think its a duplicate of #33625 (although it seems related)
I reopened for reconsideration.
Why this is different?
#33625 is an optimization problem. All works fine, but could be optimized for performance. The connection is opened/closed which adds (a bit of) latency to each request. (which might be acceptable, or not depending on your usecase)
This problem (#35757) however is about that the system actually breaks. When running the test above. runserver stops working completely, until restarted.
If you are developing a javascript app, that issues multiple requests simultaniously, then runserver (in this configuration) is unusabe.
Another reason why its no duplicate. I can reproduce the problem also with django.core.cache.backends.memcached.PyLibMCCache
instead of '....memcached.PyMemcacheCache
.
PyLibMCCache already does provide connection pooling, so the optimization requested in #33625 is not applicable for PyLibMCCache.
Reproducibility
Where you able to reproduce the problem with the example attached? (the problem that runserver actually gives 500's and eventually stops working completely)
Results with PyLibMCCache
NB: this might be a different problem, triggered by the same testcase
Result (PyLibMCCache)
When running requests in parallel, some of the requests get a 500.
Internal Server Error: /polls/ Traceback (most recent call last): File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 518, in thread_handler raise exc_info[1] File "venv/lib/python3.12/site-packages/django/core/handlers/exception.py", line 42, in inner response = await get_response(request) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 518, in thread_handler raise exc_info[1] File "venv/lib/python3.12/site-packages/django/core/handlers/base.py", line 253, in _get_response_async response = await wrapped_callback( ^^^^^^^^^^^^^^^^^^^^^^^ File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 468, in __call__ ret = await asyncio.shield(exec_coro) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run result = self.fn(*self.args, **self.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "venv/lib/python3.12/site-packages/asgiref/sync.py", line 522, in thread_handler return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "django-35757/mysite/polls/views.py", line 22, in index v = incr() ^^^^^^ File "/tmp/dbg/django-35757/mysite/polls/views.py", line 11, in incr val = cache.incr(key, 1) ^^^^^^^^^^^^^^^^^^ File "/Users/harmverhagen/Documents/symeon/intemo/sensoren/cityznz/cityzenz/api/venv/lib/python3.12/site-packages/django/core/cache/backends/memcached.py", line 110, in incr val = self._cache.incr(key, delta) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ pylibmc.Error: 1 keys failed HTTP GET /polls/ 500 [5.13, 127.0.0.1:63568]
NB: Sometimes I even noticed a python crash ! (with PyLibMCCache
only)
HTTP GET /polls/ 200 [0.11, 127.0.0.1:63819] Python(51613,0x17d037000) malloc: double free for ptr 0x13980a400 Python(51613,0x17d037000) malloc: *** set a breakpoint in malloc_error_break to debug
So, maybe the PyLibMCCache
issue is something different. (broken pylibmc
install on my mac, or something). If so ignore the PyLibMCCache
part of this comment, and lets just focus on PyMemcacheCache
, if that reproduces for you.
follow-up: 17 comment:16 by , 4 weeks ago
Resolution: | duplicate |
---|---|
Status: | closed → new |
comment:17 by , 4 weeks ago
Replying to Harm Verhagen:
Hello Harm, thanks for the extra details. I haven't been able to reproduce yet. While I re-try to do that, could you please confirm if this is reproducible with a production web server? (that means, without using the development server provided by runserver
).
Thank you!
comment:18 by , 4 weeks ago
I've not seen this issue on production, only runserver
.
We run production with
daphne <ourappname>.asgi:application -b 0.0.0.0 -p 8000
In a container, behind nginx. app configured to use PyMemcacheCache
comment:19 by , 4 weeks ago
Removing daphne
(or channels
in older versions) restores use of Django’s own runserver
command, so at that point you’re no longer running ASGI.
I would say the comment about the race condition looks most likely, but I’m confused by the last remark that it doesn’t occur in production. (Daphe’s runserver
command does very little except run daphne
)
My memcached chops are not what they could be. (I haven’t used it in production since before 2010 IIRC so… 🤷) Short of a clear diagnosis, it’s hard to know what to suggest.
comment:20 by , 4 weeks ago
Removing daphne (or channels in older versions) restores use of Django’s own runserver command, so at that point you’re no longer running ASGI.
ok, clear. that explains why that 'fixes' the problem.
comment:21 by , 3 weeks ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
I've not seen this issue on production, only runserver.
That still doesn’t make 100% sense to me.
Can I ask you to open an issue on the Daphne repo, providing a full reproduce project there please?
https://github.com/django/daphne
I’ll try and take a look, but short of something more concrete I can’t say that this is an issue in Django per se.
I’m going to mark this as needsinfo in the meantime.
Its not just a reentrancy problem. Wrapping the calls to
cache.incr()
in a global mutex (threading.Lock()
, just for testing purposes) does not seem to make any difference.