Code

Opened 6 years ago

Closed 6 years ago

Last modified 3 years ago

#7460 closed (fixed)

cache tag calculates invalid key for memcached

Reported by: trbs Owned by: jacob
Component: Core (Cache system) Version: master
Severity: Keywords: memcached, space, problem, with, in, key
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

The cache node from the cache tag calculates it's key from the fragment_name appended by all the vary-on keys in the context joined together by ':'

 cache_key = u':'.join([self.fragment_name] + \
            [force_unicode(resolve_variable(var, context)) for var in self.vary_on])

This however can lead to a situation where one uses an object name for one of the vary-on fields which returns a string with a space in it.
For example:

{% load cache %}
{% cache 7200 sidebar blogname category %}
...
{% endcache %}

Where category is the data model object for a category. As the same peace of template (DRY) is also used on the frontpage and other pages one cannot be sure that category is a valid object so you cannot use 'category.slug' (which will likely be the default answer to this ticket) Cause this will raise an None has no attribute 'slug' error.

But since category yields a string value there's can be a space in there which memcached does not like for it's keys.

[ERROR@1213562932.703199] mcm_validate_key_func():3443: memcache(4) protocol error: isspace(3) returned true for character in key

In this example the key will become: "sidebar:myweblog:Some Category" where the space between some and category will be the problem yielding the memcache error above.

The attached patch modifies the cache templatetag to guaranty that the key does not contain spaces. This seemed to be the best place, as the cache tag is used from the a template and not from code.

I've also thought about maybe handling this in the cache backend for memcached, which at first hand seemed another valid place as that also converts any non-ascii key to ascii ignoring all other characters. But i don't like a backend to start changing things for the user which can lead to unexpected results.

(Granted this already is the case in the memcached backend because of .encode('ascii', 'ignore') which will break an ignorant users caching when he using non-ascii characters in the key string. For example, when using localized texts as key strings. But that's another matter :) )

Attachments (5)

cache_replace_space_with_underscore.diff (575 bytes) - added by trbs 6 years ago.
django_memcache_backend_7460.diff (1.4 KB) - added by trbs 6 years ago.
memcachedkeys.diff (1.7 KB) - added by julianb 6 years ago.
Apply percent-encoding to keys
cachetag.diff (1.2 KB) - added by julianb 6 years ago.
django_templatetags_cache-key-quote.diff (992 bytes) - added by trbs 6 years ago.

Download all attachments as: .zip

Change History (25)

Changed 6 years ago by trbs

comment:1 Changed 6 years ago by jacob

  • milestone set to 1.0
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 follow-up: Changed 6 years ago by magneto

given the nature of the unicode DB fields and other crazy non-memcachable key chars, wouldn't 'slugify' be a bit more robust here?

(i guess the only draw back is there may be a slight chance for one slugged key for multiple input keys)

comment:3 in reply to: ↑ 2 Changed 6 years ago by trbs

Replying to magneto:

given the nature of the unicode DB fields and other crazy non-memcachable key chars, wouldn't 'slugify' be a bit more robust here?

(i guess the only draw back is there may be a slight chance for one slugged key for multiple input keys)

slugify would impose to much of a structure for the cache-key imo.
cause the key should be case-sensitive, allow for underscores, alphanumerics and special characters.
for instance, an email address should be a valid cache key.

comment:4 Changed 6 years ago by trbs

  • Patch needs improvement set

in Changeset 5718 some changes to the memcache backend where made to 'Fix some problems with Unicode usage and caching'
but it looks to me that the 'add' method of memcached was 'forgotten' in this commit.

where other methods use smart_str and/or value.encode, 'add' still does the imo wrong thing with key.encode and creates a very big potential problem for unicode users.

i also changed my view on where the patch should be, my first patch was in the wrong place. other cache backends
may very well have no problem at all with spaces inside cache keys. for instance a simple dict-based backend would allow spaces without problems.

so the replace of spaces to underscores should be inside the memcache.py backend, this way there's no difference between the backends.

Changed 6 years ago by trbs

comment:5 Changed 6 years ago by trbs

uploaded new patch which fixes both spaces in keys and cache.add for memcached backend

comment:6 Changed 6 years ago by mtredinnick

This approach has the usual bug with naive escaping of this nature: if I insert a key for foo_bar, the value will be retrieved when foo bar (containing) a space is asked for. And vice-versa. Basically, all the spaced versions are colliding with similar strings containing underscores.

comment:7 Changed 6 years ago by julianb

We could percent-encode all keys. How about that? You could throw everything in there, spaces, unicode, and it gets encoded to something that memcached can handle.

Changed 6 years ago by julianb

Apply percent-encoding to keys

comment:8 Changed 6 years ago by mtredinnick

The more I look at it, the more I dislike this approach. Everybody using memcache would be paying an extra processing penalty for one edge case. We've never allowed spaces in keys for memcache and there's really no reason to do so now. The thing generating the bad key should be responsible for generating a valid key instead. Fix it at the source, rather than penalising everybody for one thing misbehaving (since we control the misbehaving one thing).

comment:9 Changed 6 years ago by julianb

  • Triage Stage changed from Accepted to Design decision needed

I agree with you Malcolm, maybe encoding the keys for everyone is overkill.

I think it's a design decision now if we want to encode the keys for the cache tag only or never ever touch them, like it was the policy before (is this stated in the docs?). In the latter case, this ticket should be wontfix.

comment:10 Changed 6 years ago by trbs

sorry that two slightly separate issue's got rounded up in this one ticket.

the smart_str was placed there to fix issues with unicode, if we remove all encoding then memcached would be as good as useless for unicode sites. and you can hardly say that having a unicode key is 'bad', then again if having a simple space in a key is bad then i guess unicode can be considered bad as well :)

i also agree that we should not put in overhead for a relatively small corner case. but imho i feel that the source is not generating a bad-key, it's just that memcached has this severe limitation.

so maybe we could still 'fix' the spaces case in the cache_tag itself, cause i think it should be proper there to use spaces. and since it already does a lot of processing on what's going to be the key it will not hurt in terms of performance.

then in memcache only fix the 'add' method to use the same smart_str as the other methods so it's consistent.

comment:11 Changed 6 years ago by mtredinnick

  • Triage Stage changed from Design decision needed to Accepted

The add method for memcache should be a separate ticket. This ticket is about the cache tag generating an invalid key.

As for the cache tag, the solution is simple and I really don't understand need for all the discussion. When generating a cache key, make sure it doesn't include spaces, so that it can be inserted into memcache and make sure it doesn't collide with any other unique key. In other words, generate a valid key when constructing the cache key. There's no edge case or need for design decision (the ticket was already accepted. Triagers should not change that) or anything. There's a simple, unambiguous bug that should be fixed.

Unicode, spaces or anything else aren't broken or bad and we aren't saying don't use them. It's a simple fact that the memcache API doesn't take spaces in the keys, so opinions on that are irrelevant to the issue. The fact that we are generating a key with spaces makes this a bug in whatever is generating that key, since the requirement is "no spaces" for memcache. So let's just fix the actual bug and move on: generate a spaceless, unique key when creating the key for the cache tag. Do it in such a way that other, similar tags might be able to use the same code if/when they are written.

Changed 6 years ago by julianb

comment:12 Changed 6 years ago by julianb

This version now has a build_cache_key function handling the key generation for cache tags. It generates a spaceless, unique key. You cannot insert a different string and get the same key.

comment:13 Changed 6 years ago by trbs

i've created a new ticket for the 'add' methods inconsistency here #8410

julian's patch looks good

comment:14 Changed 6 years ago by ubernostrum

  • milestone changed from 1.0 to post-1.0

I'm going to punt this to post-1.0; as Malcolm pointed out, the fact that memcached has rules about keys means that you should be careful choosing your keys, so any behind-the-scenes wizardry to save you from yourself can wait.

comment:15 Changed 6 years ago by jacob

  • milestone changed from post-1.0 to 1.0

I don't think we can assume the template designers understand the subtleties of memcached, so I think making the cache tag Do The Right Thing is something we should do. So un-punting.

I'd like to see a patch without the single-use build_cache_key function -- there's no reason not to inline it. Also, you should use something like urlencode to generate the key; that'll make sure to deal with bad chars other than spaces.

comment:16 Changed 6 years ago by ubernostrum

  • Owner changed from nobody to jacob

Changed 6 years ago by trbs

comment:17 Changed 6 years ago by trbs

added new patch with a minimal change to cachetag so it uses urlquote to encode vary-on's to something every caching backend should be able to handle.

comment:18 Changed 6 years ago by mtredinnick

Five different patches and not a single one has a test to try and demonstrate that the problem is fixed. :-(

comment:19 Changed 6 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [8533]) Fixed #7460 -- Made the "cache" template tag always generate keys that can be
used with the memcache backend (which has the strongest restriction on keys).
Based on a patch from trbs.

comment:20 Changed 3 years ago by jacob

  • milestone 1.0 deleted

Milestone 1.0 deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.