Opened 11 years ago

Last modified 7 days ago

#19221 new Bug

Check that cache keys are string

Reported by: Mark Hughes Owned by:
Component: Core (Cache system) Version: dev
Severity: Normal Keywords:
Cc: Carl Meyer, dan@…, Ülgen Sarıkavak Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In current master cache keys can't be integers whilst they can in 1.4. Attempts to do this result in:

  File "/home/ismdj/src/django/django/core/cache/backends/dummy.py", line 15, in get
    key = self.make_key(key, version=version)
  File "/home/ismdj/src/django/django/core/cache/backends/base.py", line 80, in make_key
    new_key = self.key_func(key, self.key_prefix, version)
  File "/home/ismdj/src/django/django/core/cache/backends/base.py", line 26, in default_key_func
    return ':'.join([key_prefix, str(version), key])
TypeError: sequence item 2: expected string or Unicode, int found

The issue seems to have been introduced with this commit:

https://github.com/django/django/commit/45baaabafb6cf911afad9ec63c86753b284f7269

Change History (18)

comment:1 by Russell Keith-Magee, 11 years ago

Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

Since this is an apparent regression in behavior, I'm marking this as a release blocker.

Whether this is an actual bug will depend on whether we've ever contractually guaranteed that a cache key can be any type. I'm not necessarily convinced that an integer *should* be allowed to be a cache key (other than the fact that historically, it's apparently been allowed). So - the resolution here might be documenting the type guarantee, rather than fixing the 'bug'.

comment:2 by Aymeric Augustin, 11 years ago

This is another case of Django's source historically abusing smart_bytes, which makes integers work where they shouldn't. The docs don't promise anything and the cache keys are clearly intended to be strings.

I recently had the same problem in HttpResponse. I ended up implementing explicit support for integers because it was tested, but it's probably going to be deprecated in the long run.

In this case, I believe that cache.get(42) and cache.get('42') shouldn't be the same -- this isn't PHP! In Django 1.4, they are. I'm +0 on classifying this as a bugfix, and maybe documenting it as a backwards incompatible change.

comment:3 by Aymeric Augustin, 11 years ago

Anssi agrees with just documenting the change (on IRC).

comment:4 by Aymeric Augustin, 11 years ago

Upon further thought, we could take this opportunity to:

  • document that cache keys *must* be strings,
  • document how unicode and byte strings behave: at least with the memcached backend, a unicode string and the corresponding utf-8 bytestring map to the same cache key; it could be considered a bug that two non-equal Python objects map to the same cache key;
  • (maybe) normalize that behavior across cache backends.

Overall this would result in a simpler and more correct implementation — without smart_str or smart_bytes that often do more work than is desirable.

comment:5 by Aymeric Augustin, 11 years ago

One more thing: with the current code, cache keys have to be str on Python 3, not bytes. This is a rather annoying limitation.

comment:6 by Preston Holmes, 11 years ago

If we can easily coerce to a str at the right place, I don't see why we can't do this for users. Since we have a finite known set of components for the key - we should be able to do this with more robust string formatting instead of join.

comment:7 by Aymeric Augustin, 11 years ago

Since this is a release blocker for 1.5, we have to do something quickly; but I'm not ready to commit to supporting integers as cache keys in Django right now.

I'm not specifically against using integers as cache keys. I'm against different Python values mapping to the same cache key, and that's what happens currently with integers (as well as any non-string type):

>>> a, b = 42, "42"
>>> a == b
False
>>> cache.get(a)
>>> cache.set(b, "foobar")
>>> cache.get(a)
"foobar"

As a compromise, in the short term, I suggest this small change:

--- a/django/core/cache/backends/base.py
+++ b/django/core/cache/backends/base.py
@@ -26,7 +26,7 @@ def default_key_func(key, key_prefix, version):
     the `key_prefix'. KEY_FUNCTION can be used to specify an alternate
     function with custom key making behavior.
     """
-    return ':'.join([key_prefix, str(version), key])
+    return '%s:%s:%s' % (key_prefix, version, key)
 
 
 def get_key_func(key_func):

No tests, because tests create a promise of backwards-compatibility.

And then move the ticket to DDN.


This will make the cache work with any type of key that's convertible to a unicode string (unicode_literals is turned on in this module).

It isn't 100% backwards compatible: Django used to call smart_bytes, and with this change it will implicitly call six.text_type. But I'm strongly against using the smart_* or force_* here. These functions do much more work than necessary. They are only suitable for use on not-so-well-specified input, or in extreme cases (like the 500 view) where we want *something* representing *any* object (even an instance of UnicodeDecodeError).

comment:8 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

In 6c69de80bdcd2744bc64cb933c2d863dd5e74a33:

Tweaked cache key creation to avoid strict typing.

This is a provisional change. See #19221 for details.

comment:9 by Aymeric Augustin, 11 years ago

Severity: Release blockerNormal
Triage Stage: AcceptedDesign decision needed

comment:10 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

In 3db2aeec988f7bc9ad8a0844809a5bd4d7211669:

[1.5.x] Tweaked cache key creation to avoid strict typing.

This is a provisional change. See #19221 for details.

Backport of 6c69de8 from master.

comment:11 by Aymeric Augustin, 11 years ago

Summary: Cache keys can't be integersCheck that cache keys are string
Triage Stage: Design decision neededAccepted

Since no one had a better idea, let's do what I explained above.

comment:12 by Carl Meyer, 11 years ago

I agree that implicitly converting 42 and "42" to be the same cache key is bad, and we should migrate away from it with an appropriate deprecation path. (I think the patch to do so should take some care to "assume string and only do more work if that assumption fails" rather than "check pre-emptively", in order to preserve all possible speed in the common case.)

comment:13 by Tim Graham, 7 years ago

Cc: Carl Meyer added

Carl, I thought this might be a good task for a new-ish contributor but I'm not sure what the implementation of "assume string and only do more work if that assumption fails" might look like. It looks like you're suggesting not to use isinstance but I'm not sure what the alternative is.

comment:14 by Dan Stephenson, 7 years ago

Cc: dan@… added
Owner: changed from nobody to Dan Stephenson
Status: newassigned

comment:15 by Dan Stephenson, 7 years ago

Can i propose prepending the name of the keys object type, to fix the issue of keys 42 and "42" pointing at same cache data as it does currently.

return '%s:%s:%s_%s' % (key_prefix, version, type(key).__name__, key)

The challenge for this change would be in how we handle pre-existing cache keys after Django is upgraded.

  • We could add some code into the has_key function temporarily, so that in scenarios where it returns None, there is an additional check against the legacy format, if something is found, we would update the cache key (delete the old one), and return it.

or

  • Add an advisory to the release notes that the internal cache naming has changed, and to prevent orphaned keys taking up space, its recommended to flush existing caches alongside the version upgrade.

comments welcome!

Last edited 7 years ago by Dan Stephenson (previous) (diff)

comment:16 by Balazs Endresz, 4 years ago

The default_key_func is still the old version in the docs, which fails with any non-string arguments: https://github.com/django/django/blob/stable/3.0.x/docs/ref/settings.txt#L171

Btw, not just the key but the key_prefix can be non-string too. What tripped me up was this in the test settings: CACHES['default']['KEY_PREFIX'] = random() when used with the KEY_PREFIX code snippet from the docs.

comment:17 by Mariusz Felisiak, 2 years ago

Owner: Dan Stephenson removed
Status: assignednew

comment:18 by Ülgen Sarıkavak, 7 days ago

Cc: Ülgen Sarıkavak added
Note: See TracTickets for help on using tickets.
Back to Top