Opened 16 years ago

Closed 15 years ago

#9180 closed (duplicate)

Low-level cache interface incorrectly tries to typecast bytestring

Reported by: Paul Smith Owned by: nobody
Component: Core (Cache system) Version: 1.1
Severity: Keywords:
Cc: django.9180@…, Gonzalo Saavedra, lvscar, Oliver Beattie Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The low-level Django cache API encodes basestrings as UTF-8 when setting, and decodes basestrings as Unicode when getting.

If you're trying to store a string of bytes in the cache -- for instance, the raw bytes of an image -- the set operation will possibly modify the data by encoding it, and the get operation will potentially raise a DjangoUnicodeDecodeError if the codec can't decode bytes in the string.

from django.core.cache import cache
from django.utils.encoding import DjangoUnicodeDecodeError

# The simplest possible GIF: a 43-byte, 1x1-pixel transparent image
EMPTY_GIF_BYTES = 'GIF89a\x01\x00\x01\x00\xf0\x00\x00\xb0\x8cZ\x00\x00\x00!\xf9\x04\x00\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;'

cache.set('empty_gif', EMPTY_GIF_BYTES)

try:
    cache.get('empty_gif')
except DjangoUnicodeDecodeError:
    print 'Tried to decode GIF bytestring as a Unicode string'
else:
    print 'Got the raw GIF bytestring'

A workaround is to create a one-tuple from the bytestring when storing, and when retrieving, returning the single element from the tuple.

def raw_cache_set(key, value, timeout_seconds=None):
    cache.set(key, (value,), timeout_seconds=timeout_seconds)

def raw_cache_get(key):
    value = cache.get(key)
    if value is not None:
        return value[0]

raw_cache_set('empty_gif', EMPTY_GIF_BYTES)

assert raw_cache_get('empty_gif') == EMPTY_GIF_BYTES

One possible fix would be to expose a raw keyword argument boolean to cache.get, cache.set, cache.add, cache.get_many that would conditionally skip any smart_unicode or other encoding/decoding logic when storing or retrieving from the cache.

Attachments (1)

9180.diff (1.7 KB ) - added by Adam Gomaa 16 years ago.
Patch that fixes this issue, is backwards-compatible & includes a regression test

Download all attachments as: .zip

Change History (9)

comment:1 by Adrian Holovaty, 16 years ago

Triage Stage: UnreviewedAccepted

comment:2 by Adam Gomaa, 16 years ago

See also #5589. I would greatly prefer the raw kwarg, as using the 1-tuple solution would mean that Python pickles are actually being cached, which would mean that the cache can only be primed by Python code - and when using something like memcached, that's not always the case. For example, I often put output from imagemagick into memcached; right now, that's innaccessible from django.core.cache.cache due to this issue.

comment:3 by Adam Gomaa, 16 years ago

Has patch: set

I've tested 9180.diff using the memcache library; I'd appreciate it if someone could test it with cmemcache so we could get this to ready for checkin.

comment:4 by Adam Gomaa, 16 years ago

Cc: django.9180@… added

by Adam Gomaa, 16 years ago

Attachment: 9180.diff added

Patch that fixes this issue, is backwards-compatible & includes a regression test

comment:5 by Gonzalo Saavedra, 15 years ago

Cc: Gonzalo Saavedra added

comment:6 by lvscar, 15 years ago

Cc: lvscar added
Triage Stage: AcceptedReady for checkin
Version: 1.01.1

django 1.1 lack the patch!

i patch it to django1.1 while cmemcache as my cache backend.
it make django.core.cache.cache.get method work great with pickled data

comment:7 by Oliver Beattie, 15 years ago

Cc: Oliver Beattie added

Any update on the checkin status of this patch? The use case I came across this on was storing zlib-compressed data in the cache. Since that needs to be preserved as raw bytes, it's not possible to store and retrieve without further processing (thus lengthening the compressed data).

comment:8 by Oliver Beattie, 15 years ago

Resolution: duplicate
Status: newclosed

Alright, guess this was already fixed in [12637]. Marking this as a duplicate of #11012, even though that's strictly a dupe of this.

Note: See TracTickets for help on using tickets.
Back to Top