Opened 6 years ago

Closed 5 years ago

Last modified 4 years ago

#11012 closed (fixed)

Memcached cache module fails when retrieving binary strings via cache.get, during forcing it unicode

Reported by: erny Owned by: nobody
Component: Core (Cache system) Version: master
Severity: Keywords:
Cc: josh@…, gjost@…, mcroydon@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

E.g.

from zlib import compress
cache_val = compress("sdf sdlf sdlfj sldkfj alsdkjf gallksr glasrljit rweioj tasdj gfapsdopjof ps")
cache.set('key', cache_val)
res = cache.get('key')

DjangoUnicodeDecodeError: 'utf8' codec can't decode byte ... in position ...: ...

In [5718] ticket #4845, the following is introduced:

...
--- memcached.py	(revisión: 5717)
+++ memcached.py	(revisión: 5718)
@@ -1,6 +1,7 @@
 "Memcached cache backend"
 
 from django.core.cache.backends.base import BaseCache, InvalidCacheBackendError
+from django.utils.encoding import smart_unicode, smart_str
 
 try:
     import cmemcache as memcache
@@ -16,17 +17,22 @@
         self._cache = memcache.Client(server.split(';'))
 
     def get(self, key, default=None):
-        val = self._cache.get(key)
+        val = self._cache.get(smart_str(key))
         if val is None:
             return default
         else:
-            return val
+            if isinstance(val, basestring):
+                return smart_unicode(val)
+            else:
+                return val
 
     def set(self, key, value, timeout=0):
-        self._cache.set(key, value, timeout or self.default_timeout)
+        if isinstance(value, unicode):
+            value = value.encode('utf-8')
+        self._cache.set(smart_str(key), value, timeout or self.default_timeout)
 
     def delete(self, key):
-        self._cache.delete(key)
+        self._cache.delete(smart_str(key))
 
     def get_many(self, keys):
-        return self._cache.get_multi(keys)
+        return self._cache.get_multi(map(smart_str,keys))

The problem is in:

-            return val
+            if isinstance(val, basestring):
+                return smart_unicode(val)
+            else:
+                return val

which makes it impossible to store binary strings. You have to encapsulate it in dummy objects, otherwise.

I still don't know why is it necessary to convert it to str (encode utf8) and back. Python memcached 1.43 pickles anything different to int, long, str, which should work fine with unicode.

Attachments (2)

memcached_revert_unicode.diff (769 bytes) - added by erny 6 years ago.
don't convert from/to unicode during get/set in memcached
11012-memcached-binary-string.diff (1.7 KB) - added by mcroydon 5 years ago.
Updated patch and added test to confirm original failure and fix.

Download all attachments as: .zip

Change History (12)

Changed 6 years ago by erny

don't convert from/to unicode during get/set in memcached

comment:1 Changed 5 years ago by joshuajonah

  • Cc josh@… added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 5 years ago by jefurii

  • Cc gjost@… added

comment:3 Changed 5 years ago by russellm

  • Has patch set
  • milestone set to 1.2
  • Needs tests set
  • Triage Stage changed from Unreviewed to Accepted

comment:4 Changed 5 years ago by russellm

#12178 was a dupe with a slightly different approach in the patch (manually specifying that the unicode conversion should be skipped).

comment:5 Changed 5 years ago by mcroydon

  • Owner changed from nobody to mcroydon
  • Status changed from new to assigned

I'm working on adding tests to this.

Changed 5 years ago by mcroydon

Updated patch and added test to confirm original failure and fix.

comment:6 Changed 5 years ago by mcroydon

  • Needs tests unset
  • Owner changed from mcroydon to nobody
  • Status changed from assigned to new

I created a new diff that includes the patch and a test. The new test fails before the patch is applied and all cache tests pass after it's applied. Tested with memcached 1.4.4 and python-memcached 1.4.5.

comment:7 Changed 5 years ago by mcroydon

Tests pass for me against dummy, file, and locmem backends as well.

comment:8 Changed 5 years ago by mcroydon

  • Cc mcroydon@… added

comment:9 Changed 5 years ago by jacob

  • Resolution set to fixed
  • Status changed from new to closed

(In [12637]) Fixed #11012: don't needless convert cache values to unicode.

This means you can now successfully store binary blogs, such as compressed data,
in the cache.

Thanks to Matt Croydon for the final patch.

comment:10 Changed 4 years ago by jacob

  • milestone 1.2 deleted

Milestone 1.2 deleted

Note: See TracTickets for help on using tickets.
Back to Top