Opened 4 years ago

Last modified 7 days ago

#19221 assigned Bug

Check that cache keys are string

Reported by: Mark Hughes Owned by: Dan Stephenson
Component: Core (Cache system) Version: master
Severity: Normal Keywords:
Cc: Carl Meyer, dan@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In current master cache keys can't be integers whilst they can in 1.4. Attempts to do this result in:

  File "/home/ismdj/src/django/django/core/cache/backends/dummy.py", line 15, in get
    key = self.make_key(key, version=version)
  File "/home/ismdj/src/django/django/core/cache/backends/base.py", line 80, in make_key
    new_key = self.key_func(key, self.key_prefix, version)
  File "/home/ismdj/src/django/django/core/cache/backends/base.py", line 26, in default_key_func
    return ':'.join([key_prefix, str(version), key])
TypeError: sequence item 2: expected string or Unicode, int found

The issue seems to have been introduced with this commit:

https://github.com/django/django/commit/45baaabafb6cf911afad9ec63c86753b284f7269

Change History (15)

comment:1 Changed 4 years ago by Russell Keith-Magee

Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

Since this is an apparent regression in behavior, I'm marking this as a release blocker.

Whether this is an actual bug will depend on whether we've ever contractually guaranteed that a cache key can be any type. I'm not necessarily convinced that an integer *should* be allowed to be a cache key (other than the fact that historically, it's apparently been allowed). So - the resolution here might be documenting the type guarantee, rather than fixing the 'bug'.

comment:2 Changed 4 years ago by Aymeric Augustin

This is another case of Django's source historically abusing smart_bytes, which makes integers work where they shouldn't. The docs don't promise anything and the cache keys are clearly intended to be strings.

I recently had the same problem in HttpResponse. I ended up implementing explicit support for integers because it was tested, but it's probably going to be deprecated in the long run.

In this case, I believe that cache.get(42) and cache.get('42') shouldn't be the same -- this isn't PHP! In Django 1.4, they are. I'm +0 on classifying this as a bugfix, and maybe documenting it as a backwards incompatible change.

comment:3 Changed 4 years ago by Aymeric Augustin

Anssi agrees with just documenting the change (on IRC).

comment:4 Changed 4 years ago by Aymeric Augustin

Upon further thought, we could take this opportunity to:

  • document that cache keys *must* be strings,
  • document how unicode and byte strings behave: at least with the memcached backend, a unicode string and the corresponding utf-8 bytestring map to the same cache key; it could be considered a bug that two non-equal Python objects map to the same cache key;
  • (maybe) normalize that behavior across cache backends.

Overall this would result in a simpler and more correct implementation — without smart_str or smart_bytes that often do more work than is desirable.

comment:5 Changed 4 years ago by Aymeric Augustin

One more thing: with the current code, cache keys have to be str on Python 3, not bytes. This is a rather annoying limitation.

comment:6 Changed 4 years ago by Preston Holmes

If we can easily coerce to a str at the right place, I don't see why we can't do this for users. Since we have a finite known set of components for the key - we should be able to do this with more robust string formatting instead of join.

comment:7 Changed 4 years ago by Aymeric Augustin

Since this is a release blocker for 1.5, we have to do something quickly; but I'm not ready to commit to supporting integers as cache keys in Django right now.

I'm not specifically against using integers as cache keys. I'm against different Python values mapping to the same cache key, and that's what happens currently with integers (as well as any non-string type):

>>> a, b = 42, "42"
>>> a == b
False
>>> cache.get(a)
>>> cache.set(b, "foobar")
>>> cache.get(a)
"foobar"

As a compromise, in the short term, I suggest this small change:

--- a/django/core/cache/backends/base.py
+++ b/django/core/cache/backends/base.py
@@ -26,7 +26,7 @@ def default_key_func(key, key_prefix, version):
     the `key_prefix'. KEY_FUNCTION can be used to specify an alternate
     function with custom key making behavior.
     """
-    return ':'.join([key_prefix, str(version), key])
+    return '%s:%s:%s' % (key_prefix, version, key)
 
 
 def get_key_func(key_func):

No tests, because tests create a promise of backwards-compatibility.

And then move the ticket to DDN.


This will make the cache work with any type of key that's convertible to a unicode string (unicode_literals is turned on in this module).

It isn't 100% backwards compatible: Django used to call smart_bytes, and with this change it will implicitly call six.text_type. But I'm strongly against using the smart_* or force_* here. These functions do much more work than necessary. They are only suitable for use on not-so-well-specified input, or in extreme cases (like the 500 view) where we want *something* representing *any* object (even an instance of UnicodeDecodeError).

comment:8 Changed 4 years ago by Aymeric Augustin <aymeric.augustin@…>

In 6c69de80bdcd2744bc64cb933c2d863dd5e74a33:

Tweaked cache key creation to avoid strict typing.

This is a provisional change. See #19221 for details.

comment:9 Changed 4 years ago by Aymeric Augustin

Severity: Release blockerNormal
Triage Stage: AcceptedDesign decision needed

comment:10 Changed 4 years ago by Aymeric Augustin <aymeric.augustin@…>

In 3db2aeec988f7bc9ad8a0844809a5bd4d7211669:

[1.5.x] Tweaked cache key creation to avoid strict typing.

This is a provisional change. See #19221 for details.

Backport of 6c69de8 from master.

comment:11 Changed 4 years ago by Aymeric Augustin

Summary: Cache keys can't be integersCheck that cache keys are string
Triage Stage: Design decision neededAccepted

Since no one had a better idea, let's do what I explained above.

comment:12 Changed 4 years ago by Carl Meyer

I agree that implicitly converting 42 and "42" to be the same cache key is bad, and we should migrate away from it with an appropriate deprecation path. (I think the patch to do so should take some care to "assume string and only do more work if that assumption fails" rather than "check pre-emptively", in order to preserve all possible speed in the common case.)

comment:13 Changed 4 weeks ago by Tim Graham

Cc: Carl Meyer added

Carl, I thought this might be a good task for a new-ish contributor but I'm not sure what the implementation of "assume string and only do more work if that assumption fails" might look like. It looks like you're suggesting not to use isinstance but I'm not sure what the alternative is.

comment:14 Changed 7 days ago by Dan Stephenson

Cc: dan@… added
Owner: changed from nobody to Dan Stephenson
Status: newassigned

comment:15 Changed 7 days ago by Dan Stephenson

Can i propose prepending the name of the keys object type, to fix the issue of keys 42 and "42" pointing at same cache data as it does currently.

return '%s:%s:%s_%s' % (key_prefix, version, type(key).__name__, key)

The challenge for this change would be in how we handle pre-existing cache keys after Django is upgraded.

  • We could add some code into the has_key function temporarily, so that in scenarios where it returns None, there is an additional check against the legacy format, if something is found, we would update the cache key (delete the old one), and return it.

or

  • Add an advisory to the release notes that the internal cache naming has changed, and to prevent orphaned keys taking up space, its recommended to flush existing caches alongside the version upgrade.

comments welcome!

Last edited 7 days ago by Dan Stephenson (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top