Opened 13 years ago

Closed 13 days ago

Last modified 8 days ago

#19221 closed Bug (fixed)

Cache keys can't be integers

Reported by: Mark Hughes Owned by: Abhimanyu Singh Negi
Component: Core (Cache system) Version: dev
Severity: Normal Keywords:
Cc: Carl Meyer, dan@…, Ülgen Sarıkavak Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In current master cache keys can't be integers whilst they can in 1.4. Attempts to do this result in:

  File "/home/ismdj/src/django/django/core/cache/backends/dummy.py", line 15, in get
    key = self.make_key(key, version=version)
  File "/home/ismdj/src/django/django/core/cache/backends/base.py", line 80, in make_key
    new_key = self.key_func(key, self.key_prefix, version)
  File "/home/ismdj/src/django/django/core/cache/backends/base.py", line 26, in default_key_func
    return ':'.join([key_prefix, str(version), key])
TypeError: sequence item 2: expected string or Unicode, int found

The issue seems to have been introduced with this commit:

https://github.com/django/django/commit/45baaabafb6cf911afad9ec63c86753b284f7269

Change History (29)

comment:1 by Russell Keith-Magee, 13 years ago

Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

Since this is an apparent regression in behavior, I'm marking this as a release blocker.

Whether this is an actual bug will depend on whether we've ever contractually guaranteed that a cache key can be any type. I'm not necessarily convinced that an integer *should* be allowed to be a cache key (other than the fact that historically, it's apparently been allowed). So - the resolution here might be documenting the type guarantee, rather than fixing the 'bug'.

comment:2 by Aymeric Augustin, 13 years ago

This is another case of Django's source historically abusing smart_bytes, which makes integers work where they shouldn't. The docs don't promise anything and the cache keys are clearly intended to be strings.

I recently had the same problem in HttpResponse. I ended up implementing explicit support for integers because it was tested, but it's probably going to be deprecated in the long run.

In this case, I believe that cache.get(42) and cache.get('42') shouldn't be the same -- this isn't PHP! In Django 1.4, they are. I'm +0 on classifying this as a bugfix, and maybe documenting it as a backwards incompatible change.

comment:3 by Aymeric Augustin, 13 years ago

Anssi agrees with just documenting the change (on IRC).

comment:4 by Aymeric Augustin, 13 years ago

Upon further thought, we could take this opportunity to:

  • document that cache keys *must* be strings,
  • document how unicode and byte strings behave: at least with the memcached backend, a unicode string and the corresponding utf-8 bytestring map to the same cache key; it could be considered a bug that two non-equal Python objects map to the same cache key;
  • (maybe) normalize that behavior across cache backends.

Overall this would result in a simpler and more correct implementation — without smart_str or smart_bytes that often do more work than is desirable.

comment:5 by Aymeric Augustin, 13 years ago

One more thing: with the current code, cache keys have to be str on Python 3, not bytes. This is a rather annoying limitation.

comment:6 by Preston Holmes, 13 years ago

If we can easily coerce to a str at the right place, I don't see why we can't do this for users. Since we have a finite known set of components for the key - we should be able to do this with more robust string formatting instead of join.

comment:7 by Aymeric Augustin, 13 years ago

Since this is a release blocker for 1.5, we have to do something quickly; but I'm not ready to commit to supporting integers as cache keys in Django right now.

I'm not specifically against using integers as cache keys. I'm against different Python values mapping to the same cache key, and that's what happens currently with integers (as well as any non-string type):

>>> a, b = 42, "42"
>>> a == b
False
>>> cache.get(a)
>>> cache.set(b, "foobar")
>>> cache.get(a)
"foobar"

As a compromise, in the short term, I suggest this small change:

--- a/django/core/cache/backends/base.py
+++ b/django/core/cache/backends/base.py
@@ -26,7 +26,7 @@ def default_key_func(key, key_prefix, version):
     the `key_prefix'. KEY_FUNCTION can be used to specify an alternate
     function with custom key making behavior.
     """
-    return ':'.join([key_prefix, str(version), key])
+    return '%s:%s:%s' % (key_prefix, version, key)
 
 
 def get_key_func(key_func):

No tests, because tests create a promise of backwards-compatibility.

And then move the ticket to DDN.


This will make the cache work with any type of key that's convertible to a unicode string (unicode_literals is turned on in this module).

It isn't 100% backwards compatible: Django used to call smart_bytes, and with this change it will implicitly call six.text_type. But I'm strongly against using the smart_* or force_* here. These functions do much more work than necessary. They are only suitable for use on not-so-well-specified input, or in extreme cases (like the 500 view) where we want *something* representing *any* object (even an instance of UnicodeDecodeError).

comment:8 by Aymeric Augustin <aymeric.augustin@…>, 13 years ago

In 6c69de80bdcd2744bc64cb933c2d863dd5e74a33:

Tweaked cache key creation to avoid strict typing.

This is a provisional change. See #19221 for details.

comment:9 by Aymeric Augustin, 13 years ago

Severity: Release blockerNormal
Triage Stage: AcceptedDesign decision needed

comment:10 by Aymeric Augustin <aymeric.augustin@…>, 13 years ago

In 3db2aeec988f7bc9ad8a0844809a5bd4d7211669:

[1.5.x] Tweaked cache key creation to avoid strict typing.

This is a provisional change. See #19221 for details.

Backport of 6c69de8 from master.

comment:11 by Aymeric Augustin, 13 years ago

Summary: Cache keys can't be integersCheck that cache keys are string
Triage Stage: Design decision neededAccepted

Since no one had a better idea, let's do what I explained above.

comment:12 by Carl Meyer, 13 years ago

I agree that implicitly converting 42 and "42" to be the same cache key is bad, and we should migrate away from it with an appropriate deprecation path. (I think the patch to do so should take some care to "assume string and only do more work if that assumption fails" rather than "check pre-emptively", in order to preserve all possible speed in the common case.)

comment:13 by Tim Graham, 9 years ago

Cc: Carl Meyer added

Carl, I thought this might be a good task for a new-ish contributor but I'm not sure what the implementation of "assume string and only do more work if that assumption fails" might look like. It looks like you're suggesting not to use isinstance but I'm not sure what the alternative is.

comment:14 by Dan Stephenson, 9 years ago

Cc: dan@… added
Owner: changed from nobody to Dan Stephenson
Status: newassigned

comment:15 by Dan Stephenson, 9 years ago

Can i propose prepending the name of the keys object type, to fix the issue of keys 42 and "42" pointing at same cache data as it does currently.

return '%s:%s:%s_%s' % (key_prefix, version, type(key).__name__, key)

The challenge for this change would be in how we handle pre-existing cache keys after Django is upgraded.

  • We could add some code into the has_key function temporarily, so that in scenarios where it returns None, there is an additional check against the legacy format, if something is found, we would update the cache key (delete the old one), and return it.

or

  • Add an advisory to the release notes that the internal cache naming has changed, and to prevent orphaned keys taking up space, its recommended to flush existing caches alongside the version upgrade.

comments welcome!

Last edited 9 years ago by Dan Stephenson (previous) (diff)

comment:16 by Balazs Endresz, 6 years ago

The default_key_func is still the old version in the docs, which fails with any non-string arguments: https://github.com/django/django/blob/stable/3.0.x/docs/ref/settings.txt#L171

Btw, not just the key but the key_prefix can be non-string too. What tripped me up was this in the test settings: CACHES['default']['KEY_PREFIX'] = random() when used with the KEY_PREFIX code snippet from the docs.

comment:17 by Mariusz Felisiak, 4 years ago

Owner: Dan Stephenson removed
Status: assignednew

comment:18 by Ülgen Sarıkavak, 2 years ago

Cc: Ülgen Sarıkavak added

comment:19 by DKPHUONG, 2 years ago

Owner: set to DKPHUONG
Status: newassigned

comment:20 by DKPHUONG, 2 years ago

Owner: DKPHUONG removed
Status: assignednew

comment:21 by Abhimanyu Singh Negi, 3 weeks ago

Owner: set to Abhimanyu Singh Negi
Status: newassigned

I'd like to work on this issue assigning to myself

Last edited 3 weeks ago by Abhimanyu Singh Negi (previous) (diff)

comment:22 by Abhimanyu Singh Negi, 3 weeks ago

Has patch: set

I’ve submitted a fix with regression tests:
https://github.com/django/django/pull/20629

comment:23 by Mariusz Felisiak, 3 weeks ago

Patch needs improvement: set

comment:24 by Abhimanyu Singh Negi, 3 weeks ago

I’ve submitted an new PR implementing a deprecation-based migration path:
https://github.com/django/django/pull/20641

in reply to:  23 comment:25 by Abhimanyu Singh Negi, 3 weeks ago

Replying to Mariusz Felisiak:

Updated PR based on earlier feedback. All tests pass and ready for review.
https://github.com/django/django/pull/20641

comment:26 by Abhimanyu Singh Negi, 2 weeks ago

updated pr to use standard django depriciation procedure as suggested by reviewer (medmunds), ready for next review (https://github.com/django/django/pull/20641)

in reply to:  16 comment:27 by Mike Edmunds, 13 days ago

Patch needs improvement: unset
Resolution: fixed
Status: assignedclosed
Summary: Check that cache keys are stringCache keys can't be integers

I am closing this ticket as "fixed." The original problem was resolved by the "provisional" changes in comment:8, which in the intervening 13 years1 have become Django's de facto standard.

In practice, Django does allow non-string cache keys (and key prefixes), and its default KEY_FUNCTION will coerce those to strings if possible or error if not. It's maybe not ideal that 42 and "42" become the same cache item, but that's how it's worked for over a decade (without additional bug reports that I could find).

There are likely projects using bytes or other non-str-coercible key types with custom key functions. Any plan to deprecate non-str keys would need to account for those uses. (If there's still a desire to restrict keys to strings, a Django Forum discussion or new-features proposal is probably the best way to proceed.)

The docs inconsistency Balazs Endresz pointed out in comment:16 (in the settings reference vs caches topic) is still there in the 6.0 docs. I've opened a "Refs #19221" PR #20695 to clean that up.


1 Also in the past 13 years: Django has dropped support for Python 2.7 (so str and bytes have stabilized), Python has added f-strings (making implicit str coercion commonplace—without Python turning into PHP 😀), Django's process for design decisions has evolved multiple times, and this ticket has become an attractive nuisance for prospective first-time contributors.

comment:28 by Mike Edmunds, 8 days ago

From discussion on GitHub, if this is still a problem we could try to address it in the cache docs:

  • Note that the default key function has the (potentially surprising) behavior of treating 42 and "42" as equivalent cache keys.
  • Suggest using a custom KEY_FUNCTION if that default behavior causes problems, and show Dan Stephenson's approach from comment:15 (that includes the key type) as an example.

(If there's interest in that change, it might be easiest to handle it as a new Documentation ticket that references this one.)

comment:29 by GitHub <noreply@…>, 8 days ago

In 4513a6c:

Refs #19221 -- Fixed outdated KEY_FUNCTION definition in docs/ref/settings.txt.

Replaced outdated version of default_key_func in settings reference
with pointer to current version in cache topic. Rewrote description to
match parameter order and behavior of default implementation.

Co-authored-by: nessita <124304+nessita@…>

Note: See TracTickets for help on using tickets.
Back to Top