Opened 15 years ago

Closed 14 years ago

Last modified 13 years ago

#11012 closed (fixed)

Memcached cache module fails when retrieving binary strings via cache.get, during forcing it unicode

Reported by: erny Owned by: nobody
Component: Core (Cache system) Version: dev
Severity: Keywords:
Cc: josh@…, gjost@…, mcroydon@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

E.g.

from zlib import compress
cache_val = compress("sdf sdlf sdlfj sldkfj alsdkjf gallksr glasrljit rweioj tasdj gfapsdopjof ps")
cache.set('key', cache_val)
res = cache.get('key')

DjangoUnicodeDecodeError: 'utf8' codec can't decode byte ... in position ...: ...

In [5718] ticket #4845, the following is introduced:

...
--- memcached.py	(revisión: 5717)
+++ memcached.py	(revisión: 5718)
@@ -1,6 +1,7 @@
 "Memcached cache backend"
 
 from django.core.cache.backends.base import BaseCache, InvalidCacheBackendError
+from django.utils.encoding import smart_unicode, smart_str
 
 try:
     import cmemcache as memcache
@@ -16,17 +17,22 @@
         self._cache = memcache.Client(server.split(';'))
 
     def get(self, key, default=None):
-        val = self._cache.get(key)
+        val = self._cache.get(smart_str(key))
         if val is None:
             return default
         else:
-            return val
+            if isinstance(val, basestring):
+                return smart_unicode(val)
+            else:
+                return val
 
     def set(self, key, value, timeout=0):
-        self._cache.set(key, value, timeout or self.default_timeout)
+        if isinstance(value, unicode):
+            value = value.encode('utf-8')
+        self._cache.set(smart_str(key), value, timeout or self.default_timeout)
 
     def delete(self, key):
-        self._cache.delete(key)
+        self._cache.delete(smart_str(key))
 
     def get_many(self, keys):
-        return self._cache.get_multi(keys)
+        return self._cache.get_multi(map(smart_str,keys))

The problem is in:

-            return val
+            if isinstance(val, basestring):
+                return smart_unicode(val)
+            else:
+                return val

which makes it impossible to store binary strings. You have to encapsulate it in dummy objects, otherwise.

I still don't know why is it necessary to convert it to str (encode utf8) and back. Python memcached 1.43 pickles anything different to int, long, str, which should work fine with unicode.

Attachments (2)

memcached_revert_unicode.diff (769 bytes ) - added by erny 15 years ago.
don't convert from/to unicode during get/set in memcached
11012-memcached-binary-string.diff (1.7 KB ) - added by mcroydon 14 years ago.
Updated patch and added test to confirm original failure and fix.

Download all attachments as: .zip

Change History (12)

by erny, 15 years ago

don't convert from/to unicode during get/set in memcached

comment:1 by joshuajonah, 15 years ago

Cc: josh@… added

comment:2 by jefurii, 14 years ago

Cc: gjost@… added

comment:3 by Russell Keith-Magee, 14 years ago

Has patch: set
milestone: 1.2
Needs tests: set
Triage Stage: UnreviewedAccepted

comment:4 by Russell Keith-Magee, 14 years ago

#12178 was a dupe with a slightly different approach in the patch (manually specifying that the unicode conversion should be skipped).

comment:5 by mcroydon, 14 years ago

Owner: changed from nobody to mcroydon
Status: newassigned

I'm working on adding tests to this.

by mcroydon, 14 years ago

Updated patch and added test to confirm original failure and fix.

comment:6 by mcroydon, 14 years ago

Needs tests: unset
Owner: changed from mcroydon to nobody
Status: assignednew

I created a new diff that includes the patch and a test. The new test fails before the patch is applied and all cache tests pass after it's applied. Tested with memcached 1.4.4 and python-memcached 1.4.5.

comment:7 by mcroydon, 14 years ago

Tests pass for me against dummy, file, and locmem backends as well.

comment:8 by mcroydon, 14 years ago

Cc: mcroydon@… added

comment:9 by Jacob, 14 years ago

Resolution: fixed
Status: newclosed

(In [12637]) Fixed #11012: don't needless convert cache values to unicode.

This means you can now successfully store binary blogs, such as compressed data,
in the cache.

Thanks to Matt Croydon for the final patch.

comment:10 by Jacob, 13 years ago

milestone: 1.2

Milestone 1.2 deleted

Note: See TracTickets for help on using tickets.
Back to Top