Opened 12 years ago
Closed 10 years ago
#14093 closed Bug (fixed)
Confusing error when failing to create a session key because the cache is unavailable
Reported by: | Owned by: | Aymeric Augustin | |
---|---|---|---|
Component: | contrib.sessions | Version: | 1.2 |
Severity: | Normal | Keywords: | session, cache, session key |
Cc: | Alan Justino da Silva | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
Hi,
We are using Django 1.2.1, cache session backend and from time to time, on higher traffic (but not so much, ~600 online users in 5 minutes) a few requests ends with exception:
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/django/core/handlers/base.py", line 80, in get_response
response = middleware_method(request)
File "/home/wonderlife/app/request.py", line 78, in process_request
if request.user.is_authenticated():
File "/usr/lib64/python2.6/site-packages/django/contrib/auth/middleware.py", line 9, in get
request._cached_user = get_user(request)
File "/usr/lib64/python2.6/site-packages/django/contrib/auth/init.py", line 97, in get_user
user_id = request.session[SESSION_KEY]
File "/usr/lib64/python2.6/site-packages/django/contrib/sessions/backends/base.py", line 46, in getitem
return self._session[key]
File "/usr/lib64/python2.6/site-packages/django/contrib/sessions/backends/base.py", line 172, in _get_session
self._session_cache = self.load()
File "/usr/lib64/python2.6/site-packages/django/contrib/sessions/backends/cache.py", line 16, in load
self.create()
File "/usr/lib64/python2.6/site-packages/django/contrib/sessions/backends/cache.py", line 33, in create
raise RuntimeError("Unable to create a new session key.")
RuntimeError: Unable to create a new session key.
Attachments (2)
Change History (23)
comment:1 Changed 12 years ago by
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 Changed 11 years ago by
comment:3 Changed 11 years ago by
Has patch: | set |
---|
Replacing md5 session key with uuid session key would result in virtually zero chance of key collision.
See attached patch.
comment:4 Changed 11 years ago by
Patch needs improvement: | set |
---|
Tom, I applied your patch and ran the tests for a quick sanity check. Looks like one issue is that django.contrib.sessions.backends.file only allows the following characters in cache keys: VALID_KEY_CHARS = set("abcdef0123456789"). Not sure if we can modify that set or not.
comment:5 Changed 11 years ago by
Ah, I didn't run a test run, thanks for that. I don't see a problem with extending that set to include '-', or (less preferably) we could process the generated uuid to remove the '-', but that would leave it less recognizable as a uuid.
I'll update the patch (and re-run tests!) when I'm home from work.
Changed 11 years ago by
Attachment: | django-uuid-session-keys.diff added |
---|
Patch to replace md5 session key generation with uuid session key generation. Includes verbatim copy of uuid.py from python 2.6 to fix issue that uuid is not in std libs in python 2.4
comment:6 Changed 11 years ago by
Patch needs improvement: | unset |
---|
Updated patch does pass tests. I've had a good read of the docs, and cannot find anywhere where the format of the session key is discussed, so I don't think there are any doc patches required.
comment:7 Changed 11 years ago by
The VALID_KEY_CHARS setting was introduced to sanitize user input for security.
Your proposal to switch from MD5 to UUID4 isn't addressing the core of the problem. You'd be going from a 128 bit random(ish) number to a 113 bit random number. This really isn't going to fix the problem, which is that your cache is timing out. We need to address that directly.
See also my post on django-dev:
http://groups.google.com/group/django-developers/msg/61b51319646eb5ca
comment:8 Changed 11 years ago by
There is no problem with changing VALID_KEY_CHARS to include '-', the purpose of it is to sanitize user input to prevent directory traversals.
Other comments replied to on thread.
comment:9 Changed 11 years ago by
Easy pickings: | unset |
---|---|
Patch needs improvement: | set |
Severity: | → Normal |
Type: | → Uncategorized |
django-uuid-session-keys.diff fails to apply cleanly on to trunk
comment:10 Changed 11 years ago by
Type: | Uncategorized → Bug |
---|
comment:11 Changed 11 years ago by
UI/UX: | unset |
---|
I'm experiencing the "Unable to create a new session key" RuntimeError with a client's application. Since it's a new application with moderate use, I do not believe that it is load-related. We have applied the uuid4() patch mentioned here to no avail.
While using tcpdump to diagnose potential problems communicating with memcached, I observed the following flow:
app server -> memcached: GET <key from cookie> memcached -> app server: END (indicates nothing stored for this key) app server -> memcached: SET <key from cookie> ... (the packet capture is truncated here so I don't know what value is being set)
What's interesting is that this flow seems impossible from reading the code. What I expect to happen (and have confirmed by manual testing) is that if the GET returns nothing, we then see another GET for a new key, followed by an ADD for that new key. That flow matches this section of code (from django.contrib.sessions.backends.cache.SessionStore):
def create(self): for i in xrange(10000): self.session_key = self._get_new_session_key() # GET try: self.save(must_create=True) # ADD
I'm at a loss for how to explain the memcached communication I see given the RuntimeError traceback reported when this occurs on the live site. Nowhere in the code do I see where a SET should occur after a missed GET, or why this should result in a RuntimeError being raised. Any ideas?
comment:12 Changed 11 years ago by
Cc: | Clay McClure added |
---|---|
Has patch: | unset |
Patch needs improvement: | unset |
Unsetting 'Has patch' flag since the patch does not address the cause of the error.
comment:13 Changed 11 years ago by
What version of memcached are you using? I saw this problem on memcached 1.2.2. Since upgrading to 1.4.5 (which runs multithreaded by default I believe) I haven't seen the issue.
comment:14 Changed 11 years ago by
We're running 1.4.2.
I've looked at some more tcpdump traces, and have observed this exception occurring as a result of packet loss within Amazon EC2's network. If the SYN packet from app server to memcache server is lost, python-memcache times out before the kernel can retransmit it. All subsequent cache queries fail as python-memcache has marked the host dead, leading to the RuntimeError after 10,000 attempts to set a new session key.
I'm at a loss for how to fix this. I'd like to tune Linux's TCP retransmit timer lower than three seconds, but it's not tunable. Failing that, I'd like to have python-memcache retry a connection a couple times before giving up, but it won't. I see that pylibmc has configurable connect and retry behavior, but I haven't been able to find a combination of options that works around the packet loss.
comment:15 Changed 11 years ago by
Cc: | Alan Justino da Silva added |
---|
comment:16 Changed 10 years ago by
Easy pickings: | set |
---|---|
Summary: | Unable to create a new session key on higher traffic → Confusing error when failing to create a session key because the cache is unavailable |
I'm updating the the summary to reflect the fact that this problem isn't related to load.
Django can't do much if the cache doesn't work, or if the network loses enough packets to make the cache unusable.
On the other hand, Django should provide a more informative error message when this situation occurs. This is easy.
comment:17 Changed 10 years ago by
Cc: | Clay McClure removed |
---|
comment:18 Changed 10 years ago by
@aaugustin what needs to be done here? Just improve the error message?
Changed 10 years ago by
Attachment: | trac#14093-cache-session.diff added |
---|
Extend exception to explain that cache may be unavailable.
comment:19 Changed 10 years ago by
Has patch: | set |
---|---|
Owner: | changed from nobody to stumbles |
Status: | new → assigned |
I've updated the exception message in the attached patch.
comment:20 Changed 10 years ago by
Owner: | changed from stumbles to Aymeric Augustin |
---|---|
Status: | assigned → new |
comment:21 Changed 10 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
This seems to be related to #8311, which was fixed in [8620]
Anyway, that magic 10000 sounds like a code smell at least, probably the value should be adjustable for different server loads?
Flagging as accepted, at least a comment explaining the consecuences of that 10000 (what makes it different from 1000 or 100000?) should be added