Opened 14 years ago
Closed 12 years ago
#14093 closed Bug (fixed)
Confusing error when failing to create a session key because the cache is unavailable
Reported by: | Owned by: | Aymeric Augustin | |
---|---|---|---|
Component: | contrib.sessions | Version: | 1.2 |
Severity: | Normal | Keywords: | session, cache, session key |
Cc: | Alan Justino da Silva | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
Hi,
We are using Django 1.2.1, cache session backend and from time to time, on higher traffic (but not so much, ~600 online users in 5 minutes) a few requests ends with exception:
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/django/core/handlers/base.py", line 80, in get_response
response = middleware_method(request)
File "/home/wonderlife/app/request.py", line 78, in process_request
if request.user.is_authenticated():
File "/usr/lib64/python2.6/site-packages/django/contrib/auth/middleware.py", line 9, in get
request._cached_user = get_user(request)
File "/usr/lib64/python2.6/site-packages/django/contrib/auth/init.py", line 97, in get_user
user_id = request.session[SESSION_KEY]
File "/usr/lib64/python2.6/site-packages/django/contrib/sessions/backends/base.py", line 46, in getitem
return self._session[key]
File "/usr/lib64/python2.6/site-packages/django/contrib/sessions/backends/base.py", line 172, in _get_session
self._session_cache = self.load()
File "/usr/lib64/python2.6/site-packages/django/contrib/sessions/backends/cache.py", line 16, in load
self.create()
File "/usr/lib64/python2.6/site-packages/django/contrib/sessions/backends/cache.py", line 33, in create
raise RuntimeError("Unable to create a new session key.")
RuntimeError: Unable to create a new session key.
Attachments (2)
Change History (23)
comment:1 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 14 years ago
comment:3 by , 14 years ago
Has patch: | set |
---|
Replacing md5 session key with uuid session key would result in virtually zero chance of key collision.
See attached patch.
comment:4 by , 14 years ago
Patch needs improvement: | set |
---|
Tom, I applied your patch and ran the tests for a quick sanity check. Looks like one issue is that django.contrib.sessions.backends.file only allows the following characters in cache keys: VALID_KEY_CHARS = set("abcdef0123456789"). Not sure if we can modify that set or not.
comment:5 by , 14 years ago
Ah, I didn't run a test run, thanks for that. I don't see a problem with extending that set to include '-', or (less preferably) we could process the generated uuid to remove the '-', but that would leave it less recognizable as a uuid.
I'll update the patch (and re-run tests!) when I'm home from work.
by , 14 years ago
Attachment: | django-uuid-session-keys.diff added |
---|
Patch to replace md5 session key generation with uuid session key generation. Includes verbatim copy of uuid.py from python 2.6 to fix issue that uuid is not in std libs in python 2.4
comment:6 by , 14 years ago
Patch needs improvement: | unset |
---|
Updated patch does pass tests. I've had a good read of the docs, and cannot find anywhere where the format of the session key is discussed, so I don't think there are any doc patches required.
comment:7 by , 14 years ago
The VALID_KEY_CHARS setting was introduced to sanitize user input for security.
Your proposal to switch from MD5 to UUID4 isn't addressing the core of the problem. You'd be going from a 128 bit random(ish) number to a 113 bit random number. This really isn't going to fix the problem, which is that your cache is timing out. We need to address that directly.
See also my post on django-dev:
http://groups.google.com/group/django-developers/msg/61b51319646eb5ca
comment:8 by , 14 years ago
There is no problem with changing VALID_KEY_CHARS to include '-', the purpose of it is to sanitize user input to prevent directory traversals.
Other comments replied to on thread.
comment:9 by , 14 years ago
Easy pickings: | unset |
---|---|
Patch needs improvement: | set |
Severity: | → Normal |
Type: | → Uncategorized |
django-uuid-session-keys.diff fails to apply cleanly on to trunk
comment:10 by , 14 years ago
Type: | Uncategorized → Bug |
---|
comment:11 by , 14 years ago
UI/UX: | unset |
---|
I'm experiencing the "Unable to create a new session key" RuntimeError with a client's application. Since it's a new application with moderate use, I do not believe that it is load-related. We have applied the uuid4() patch mentioned here to no avail.
While using tcpdump to diagnose potential problems communicating with memcached, I observed the following flow:
app server -> memcached: GET <key from cookie> memcached -> app server: END (indicates nothing stored for this key) app server -> memcached: SET <key from cookie> ... (the packet capture is truncated here so I don't know what value is being set)
What's interesting is that this flow seems impossible from reading the code. What I expect to happen (and have confirmed by manual testing) is that if the GET returns nothing, we then see another GET for a new key, followed by an ADD for that new key. That flow matches this section of code (from django.contrib.sessions.backends.cache.SessionStore):
def create(self): for i in xrange(10000): self.session_key = self._get_new_session_key() # GET try: self.save(must_create=True) # ADD
I'm at a loss for how to explain the memcached communication I see given the RuntimeError traceback reported when this occurs on the live site. Nowhere in the code do I see where a SET should occur after a missed GET, or why this should result in a RuntimeError being raised. Any ideas?
comment:12 by , 14 years ago
Cc: | added |
---|---|
Has patch: | unset |
Patch needs improvement: | unset |
Unsetting 'Has patch' flag since the patch does not address the cause of the error.
comment:13 by , 13 years ago
What version of memcached are you using? I saw this problem on memcached 1.2.2. Since upgrading to 1.4.5 (which runs multithreaded by default I believe) I haven't seen the issue.
comment:14 by , 13 years ago
We're running 1.4.2.
I've looked at some more tcpdump traces, and have observed this exception occurring as a result of packet loss within Amazon EC2's network. If the SYN packet from app server to memcache server is lost, python-memcache times out before the kernel can retransmit it. All subsequent cache queries fail as python-memcache has marked the host dead, leading to the RuntimeError after 10,000 attempts to set a new session key.
I'm at a loss for how to fix this. I'd like to tune Linux's TCP retransmit timer lower than three seconds, but it's not tunable. Failing that, I'd like to have python-memcache retry a connection a couple times before giving up, but it won't. I see that pylibmc has configurable connect and retry behavior, but I haven't been able to find a combination of options that works around the packet loss.
comment:15 by , 13 years ago
Cc: | added |
---|
comment:16 by , 13 years ago
Easy pickings: | set |
---|---|
Summary: | Unable to create a new session key on higher traffic → Confusing error when failing to create a session key because the cache is unavailable |
I'm updating the the summary to reflect the fact that this problem isn't related to load.
Django can't do much if the cache doesn't work, or if the network loses enough packets to make the cache unusable.
On the other hand, Django should provide a more informative error message when this situation occurs. This is easy.
comment:17 by , 13 years ago
Cc: | removed |
---|
by , 12 years ago
Attachment: | trac#14093-cache-session.diff added |
---|
Extend exception to explain that cache may be unavailable.
comment:19 by , 12 years ago
Has patch: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
I've updated the exception message in the attached patch.
comment:20 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:21 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
This seems to be related to #8311, which was fixed in [8620]
Anyway, that magic 10000 sounds like a code smell at least, probably the value should be adjustable for different server loads?
Flagging as accepted, at least a comment explaining the consecuences of that 10000 (what makes it different from 1000 or 100000?) should be added