Code

Opened 18 months ago

Last modified 14 months ago

#19221 new Bug

Check that cache keys are string

Reported by: mhsparks Owned by: nobody
Component: Core (Cache system) Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In current master cache keys can't be integers whilst they can in 1.4. Attempts to do this result in:

  File "/home/ismdj/src/django/django/core/cache/backends/dummy.py", line 15, in get
    key = self.make_key(key, version=version)
  File "/home/ismdj/src/django/django/core/cache/backends/base.py", line 80, in make_key
    new_key = self.key_func(key, self.key_prefix, version)
  File "/home/ismdj/src/django/django/core/cache/backends/base.py", line 26, in default_key_func
    return ':'.join([key_prefix, str(version), key])
TypeError: sequence item 2: expected string or Unicode, int found

The issue seems to have been introduced with this commit:

https://github.com/django/django/commit/45baaabafb6cf911afad9ec63c86753b284f7269

Attachments (0)

Change History (12)

comment:1 Changed 18 months ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Severity changed from Normal to Release blocker
  • Triage Stage changed from Unreviewed to Accepted

Since this is an apparent regression in behavior, I'm marking this as a release blocker.

Whether this is an actual bug will depend on whether we've ever contractually guaranteed that a cache key can be any type. I'm not necessarily convinced that an integer *should* be allowed to be a cache key (other than the fact that historically, it's apparently been allowed). So - the resolution here might be documenting the type guarantee, rather than fixing the 'bug'.

comment:2 Changed 18 months ago by aaugustin

This is another case of Django's source historically abusing smart_bytes, which makes integers work where they shouldn't. The docs don't promise anything and the cache keys are clearly intended to be strings.

I recently had the same problem in HttpResponse. I ended up implementing explicit support for integers because it was tested, but it's probably going to be deprecated in the long run.

In this case, I believe that cache.get(42) and cache.get('42') shouldn't be the same -- this isn't PHP! In Django 1.4, they are. I'm +0 on classifying this as a bugfix, and maybe documenting it as a backwards incompatible change.

comment:3 Changed 18 months ago by aaugustin

Anssi agrees with just documenting the change (on IRC).

comment:4 Changed 18 months ago by aaugustin

Upon further thought, we could take this opportunity to:

  • document that cache keys *must* be strings,
  • document how unicode and byte strings behave: at least with the memcached backend, a unicode string and the corresponding utf-8 bytestring map to the same cache key; it could be considered a bug that two non-equal Python objects map to the same cache key;
  • (maybe) normalize that behavior across cache backends.

Overall this would result in a simpler and more correct implementation — without smart_str or smart_bytes that often do more work than is desirable.

comment:5 Changed 18 months ago by aaugustin

One more thing: with the current code, cache keys have to be str on Python 3, not bytes. This is a rather annoying limitation.

comment:6 Changed 18 months ago by ptone

If we can easily coerce to a str at the right place, I don't see why we can't do this for users. Since we have a finite known set of components for the key - we should be able to do this with more robust string formatting instead of join.

comment:7 Changed 17 months ago by aaugustin

Since this is a release blocker for 1.5, we have to do something quickly; but I'm not ready to commit to supporting integers as cache keys in Django right now.

I'm not specifically against using integers as cache keys. I'm against different Python values mapping to the same cache key, and that's what happens currently with integers (as well as any non-string type):

>>> a, b = 42, "42"
>>> a == b
False
>>> cache.get(a)
>>> cache.set(b, "foobar")
>>> cache.get(a)
"foobar"

As a compromise, in the short term, I suggest this small change:

--- a/django/core/cache/backends/base.py
+++ b/django/core/cache/backends/base.py
@@ -26,7 +26,7 @@ def default_key_func(key, key_prefix, version):
     the `key_prefix'. KEY_FUNCTION can be used to specify an alternate
     function with custom key making behavior.
     """
-    return ':'.join([key_prefix, str(version), key])
+    return '%s:%s:%s' % (key_prefix, version, key)
 
 
 def get_key_func(key_func):

No tests, because tests create a promise of backwards-compatibility.

And then move the ticket to DDN.


This will make the cache work with any type of key that's convertible to a unicode string (unicode_literals is turned on in this module).

It isn't 100% backwards compatible: Django used to call smart_bytes, and with this change it will implicitly call six.text_type. But I'm strongly against using the smart_* or force_* here. These functions do much more work than necessary. They are only suitable for use on not-so-well-specified input, or in extreme cases (like the 500 view) where we want *something* representing *any* object (even an instance of UnicodeDecodeError).

comment:8 Changed 17 months ago by Aymeric Augustin <aymeric.augustin@…>

In 6c69de80bdcd2744bc64cb933c2d863dd5e74a33:

Tweaked cache key creation to avoid strict typing.

This is a provisional change. See #19221 for details.

comment:9 Changed 17 months ago by aaugustin

  • Severity changed from Release blocker to Normal
  • Triage Stage changed from Accepted to Design decision needed

comment:10 Changed 17 months ago by Aymeric Augustin <aymeric.augustin@…>

In 3db2aeec988f7bc9ad8a0844809a5bd4d7211669:

[1.5.x] Tweaked cache key creation to avoid strict typing.

This is a provisional change. See #19221 for details.

Backport of 6c69de8 from master.

comment:11 Changed 14 months ago by aaugustin

  • Summary changed from Cache keys can't be integers to Check that cache keys are string
  • Triage Stage changed from Design decision needed to Accepted

Since no one had a better idea, let's do what I explained above.

comment:12 Changed 14 months ago by carljm

I agree that implicitly converting 42 and "42" to be the same cache key is bad, and we should migrate away from it with an appropriate deprecation path. (I think the patch to do so should take some care to "assume string and only do more work if that assumption fails" rather than "check pre-emptively", in order to preserve all possible speed in the common case.)

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as new
The owner will be changed from nobody to anonymous. Next status will be 'assigned'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.