Opened 6 years ago

Closed 5 years ago

#9180 closed (duplicate)

Low-level cache interface incorrectly tries to typecast bytestring

Reported by: p Owned by: nobody
Component: Core (Cache system) Version: 1.1
Severity: Keywords:
Cc: django.9180@…, gonz, lvscar, obeattie Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

The low-level Django cache API encodes basestrings as UTF-8 when setting, and decodes basestrings as Unicode when getting.

If you're trying to store a string of bytes in the cache -- for instance, the raw bytes of an image -- the set operation will possibly modify the data by encoding it, and the get operation will potentially raise a DjangoUnicodeDecodeError if the codec can't decode bytes in the string.

from django.core.cache import cache
from django.utils.encoding import DjangoUnicodeDecodeError

# The simplest possible GIF: a 43-byte, 1x1-pixel transparent image
EMPTY_GIF_BYTES = 'GIF89a\x01\x00\x01\x00\xf0\x00\x00\xb0\x8cZ\x00\x00\x00!\xf9\x04\x00\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;'

cache.set('empty_gif', EMPTY_GIF_BYTES)

try:
    cache.get('empty_gif')
except DjangoUnicodeDecodeError:
    print 'Tried to decode GIF bytestring as a Unicode string'
else:
    print 'Got the raw GIF bytestring'

A workaround is to create a one-tuple from the bytestring when storing, and when retrieving, returning the single element from the tuple.

def raw_cache_set(key, value, timeout_seconds=None):
    cache.set(key, (value,), timeout_seconds=timeout_seconds)

def raw_cache_get(key):
    value = cache.get(key)
    if value is not None:
        return value[0]

raw_cache_set('empty_gif', EMPTY_GIF_BYTES)

assert raw_cache_get('empty_gif') == EMPTY_GIF_BYTES

One possible fix would be to expose a raw keyword argument boolean to cache.get, cache.set, cache.add, cache.get_many that would conditionally skip any smart_unicode or other encoding/decoding logic when storing or retrieving from the cache.

Attachments (1)

9180.diff (1.7 KB) - added by AdamG 6 years ago.
Patch that fixes this issue, is backwards-compatible & includes a regression test

Download all attachments as: .zip

Change History (9)

comment:1 Changed 6 years ago by adrian

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 6 years ago by AdamG

See also #5589. I would greatly prefer the raw kwarg, as using the 1-tuple solution would mean that Python pickles are actually being cached, which would mean that the cache can only be primed by Python code - and when using something like memcached, that's not always the case. For example, I often put output from imagemagick into memcached; right now, that's innaccessible from django.core.cache.cache due to this issue.

comment:3 Changed 6 years ago by AdamG

  • Has patch set

I've tested 9180.diff using the memcache library; I'd appreciate it if someone could test it with cmemcache so we could get this to ready for checkin.

comment:4 Changed 6 years ago by AdamG

  • Cc django.9180@… added

Changed 6 years ago by AdamG

Patch that fixes this issue, is backwards-compatible & includes a regression test

comment:5 Changed 6 years ago by gonz

  • Cc gonz added

comment:6 Changed 5 years ago by lvscar

  • Cc lvscar added
  • Triage Stage changed from Accepted to Ready for checkin
  • Version changed from 1.0 to 1.1

django 1.1 lack the patch!

i patch it to django1.1 while cmemcache as my cache backend.
it make django.core.cache.cache.get method work great with pickled data

comment:7 Changed 5 years ago by obeattie

  • Cc obeattie added

Any update on the checkin status of this patch? The use case I came across this on was storing zlib-compressed data in the cache. Since that needs to be preserved as raw bytes, it's not possible to store and retrieve without further processing (thus lengthening the compressed data).

comment:8 Changed 5 years ago by obeattie

  • Resolution set to duplicate
  • Status changed from new to closed

Alright, guess this was already fixed in [12637]. Marking this as a duplicate of #11012, even though that's strictly a dupe of this.

Note: See TracTickets for help on using tickets.
Back to Top