Opened 12 years ago
Last modified 8 months ago
#19221 new Bug
Check that cache keys are string
Reported by: | Mark Hughes | Owned by: | |
---|---|---|---|
Component: | Core (Cache system) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Carl Meyer, dan@…, Ülgen Sarıkavak | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In current master cache keys can't be integers whilst they can in 1.4. Attempts to do this result in:
File "/home/ismdj/src/django/django/core/cache/backends/dummy.py", line 15, in get key = self.make_key(key, version=version) File "/home/ismdj/src/django/django/core/cache/backends/base.py", line 80, in make_key new_key = self.key_func(key, self.key_prefix, version) File "/home/ismdj/src/django/django/core/cache/backends/base.py", line 26, in default_key_func return ':'.join([key_prefix, str(version), key]) TypeError: sequence item 2: expected string or Unicode, int found
The issue seems to have been introduced with this commit:
https://github.com/django/django/commit/45baaabafb6cf911afad9ec63c86753b284f7269
Change History (20)
comment:1 by , 12 years ago
Severity: | Normal → Release blocker |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 12 years ago
This is another case of Django's source historically abusing smart_bytes
, which makes integers work where they shouldn't. The docs don't promise anything and the cache keys are clearly intended to be strings.
I recently had the same problem in HttpResponse
. I ended up implementing explicit support for integers because it was tested, but it's probably going to be deprecated in the long run.
In this case, I believe that cache.get(42)
and cache.get('42')
shouldn't be the same -- this isn't PHP! In Django 1.4, they are. I'm +0 on classifying this as a bugfix, and maybe documenting it as a backwards incompatible change.
comment:4 by , 12 years ago
Upon further thought, we could take this opportunity to:
- document that cache keys *must* be strings,
- document how unicode and byte strings behave: at least with the memcached backend, a unicode string and the corresponding utf-8 bytestring map to the same cache key; it could be considered a bug that two non-equal Python objects map to the same cache key;
- (maybe) normalize that behavior across cache backends.
Overall this would result in a simpler and more correct implementation — without smart_str
or smart_bytes
that often do more work than is desirable.
comment:5 by , 12 years ago
One more thing: with the current code, cache keys have to be str
on Python 3, not bytes
. This is a rather annoying limitation.
comment:6 by , 12 years ago
If we can easily coerce to a str at the right place, I don't see why we can't do this for users. Since we have a finite known set of components for the key - we should be able to do this with more robust string formatting instead of join.
comment:7 by , 12 years ago
Since this is a release blocker for 1.5, we have to do something quickly; but I'm not ready to commit to supporting integers as cache keys in Django right now.
I'm not specifically against using integers as cache keys. I'm against different Python values mapping to the same cache key, and that's what happens currently with integers (as well as any non-string type):
>>> a, b = 42, "42" >>> a == b False >>> cache.get(a) >>> cache.set(b, "foobar") >>> cache.get(a) "foobar"
As a compromise, in the short term, I suggest this small change:
--- a/django/core/cache/backends/base.py +++ b/django/core/cache/backends/base.py @@ -26,7 +26,7 @@ def default_key_func(key, key_prefix, version): the `key_prefix'. KEY_FUNCTION can be used to specify an alternate function with custom key making behavior. """ - return ':'.join([key_prefix, str(version), key]) + return '%s:%s:%s' % (key_prefix, version, key) def get_key_func(key_func):
No tests, because tests create a promise of backwards-compatibility.
And then move the ticket to DDN.
This will make the cache work with any type of key that's convertible to a unicode string (unicode_literals
is turned on in this module).
It isn't 100% backwards compatible: Django used to call smart_bytes
, and with this change it will implicitly call six.text_type
. But I'm strongly against using the smart_*
or force_*
here. These functions do much more work than necessary. They are only suitable for use on not-so-well-specified input, or in extreme cases (like the 500 view) where we want *something* representing *any* object (even an instance of UnicodeDecodeError
).
comment:9 by , 12 years ago
Severity: | Release blocker → Normal |
---|---|
Triage Stage: | Accepted → Design decision needed |
comment:11 by , 12 years ago
Summary: | Cache keys can't be integers → Check that cache keys are string |
---|---|
Triage Stage: | Design decision needed → Accepted |
Since no one had a better idea, let's do what I explained above.
comment:12 by , 12 years ago
I agree that implicitly converting 42
and "42"
to be the same cache key is bad, and we should migrate away from it with an appropriate deprecation path. (I think the patch to do so should take some care to "assume string and only do more work if that assumption fails" rather than "check pre-emptively", in order to preserve all possible speed in the common case.)
comment:13 by , 8 years ago
Cc: | added |
---|
Carl, I thought this might be a good task for a new-ish contributor but I'm not sure what the implementation of "assume string and only do more work if that assumption fails" might look like. It looks like you're suggesting not to use isinstance
but I'm not sure what the alternative is.
comment:14 by , 8 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:15 by , 8 years ago
Can i propose prepending the name of the keys object type, to fix the issue of 42
and "42"
pointing at same data as it does currently.
return '%s:%s:%s_%s' % (key_prefix, version, type(key).__name__, key)
The challenge for this change would be in how we handle pre-existing cache keys after Django is upgraded.
- We could add some code into the has_key function temporarily, so that in scenarios where it returns None, there is an additional check against the legacy format, if something is found, we would update the cache key (delete the old one), and return it.
or
- Add an advisory to the release notes that the internal cache naming has changed, and to prevent orphaned keys taking up space, its recommended to flush existing caches alongside the version upgrade.
comments welcome!
comment:16 by , 5 years ago
The default_key_func
is still the old version in the docs, which fails with any non-string arguments: https://github.com/django/django/blob/stable/3.0.x/docs/ref/settings.txt#L171
Btw, not just the key
but the key_prefix
can be non-string too. What tripped me up was this in the test settings: CACHES['default']['KEY_PREFIX'] = random()
when used with the KEY_PREFIX
code snippet from the docs.
comment:17 by , 3 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:18 by , 8 months ago
Cc: | added |
---|
comment:19 by , 8 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:20 by , 8 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
Since this is an apparent regression in behavior, I'm marking this as a release blocker.
Whether this is an actual bug will depend on whether we've ever contractually guaranteed that a cache key can be any type. I'm not necessarily convinced that an integer *should* be allowed to be a cache key (other than the fact that historically, it's apparently been allowed). So - the resolution here might be documenting the type guarantee, rather than fixing the 'bug'.