Opened 4 years ago

Last modified 4 years ago

#17613 new New feature

Add validation method for cache keys

Reported by: vzima Owned by: nobody
Component: Core (Cache system) Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

We encountered a problem that we can not check keys for cache before we try to use them. This is quite in need in case you are using identifiers that not entirely under your control e.g. URLs or you have more permissive checks than cache requires, e.g. length <= 255.

If you want to use memcache in such cases you need to use quite ugly code to catch exceptions from invalid keys:

from django.core import cache
from somewhere import slugify, expensive_function

try:
    from memcache import Client
    MemcachedKeyError = Client.MemcachedKeyError
except ImportError:
    # If memcached is not present create dummy exception
    class MemcachedKeyError(Exception):
        pass

def cached_expensive_function(url)
    key = slugify(url)
    try:
        result = cache.cache.get(key)
    except MemcachedKeyError:
        return expensive_function(url)

    if result is None:
        result = expensive_function(url)
        cache.cache.set(key, result)
    return result

A nice code for such cases would be:

from django.core import cache
from somewhere import slugify, expensive_function

def cached_expensive_function(url)
    key = slugify(url)

    if not cache.cache.check_key(key)
        return expensive_function(url)

    result = cache.cache.get(key)
    if result is None:
        result = expensive_function(url)
        cache.cache.set(key, result)
    return result

I also append basic patch that adds such method.

Attachments (1)

cache_key_validation.patch (2.6 KB) - added by vzima 4 years ago.

Download all attachments as: .zip

Change History (5)

Changed 4 years ago by vzima

comment:1 Changed 4 years ago by claudep

  • Has patch set
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

This seems very reasonable to me.
The existing test_invalid_keys might be sufficient to test this code.
source:django/trunk/tests/regressiontests/cache/tests.py#L454

comment:2 Changed 4 years ago by vzima

The main decision in my opinion is whether check_key method should be shared by all cache backends (as in patch), or every backend should have its own.

Having backend-specific validations for cache key on one hand allows maximal usage of that particular cache, on other hand it does not force you to use backend-independent keys if possible.

comment:3 follow-up: Changed 4 years ago by russellm

  • Needs documentation set
  • Needs tests set
  • Patch needs improvement set

@claudep -- if you're adding new functionality, then you need to test the new functionality. Existing tests only cover existing functionality. Since you're adding a new API point, you also need to document that API point.

I can see two problems with this proposed idea and patch.

The first is with the implementation -- you raise an exception, which is then caught as a warning. Why not just raise the warning in the first place?

The second is that it misses one of the design considerations of the cache backend. If you look at the comments for the existing tests for cache key validation, you should be able to see the problem.

At the end of the day, caching -- especially memcache -- is an optimisation that is performed for speed. Therefore, the path to set and extract memcache keys needs to be fast, and can't be burdened by excessive computation. Checking for validity is one of the checks that has been *deliberately* omitted from the memcache backend. If you look at the code for the memcache backend, it doesn't invoke the BaseCache set method -- the key/value that is provided is sent directly to the memcache interface.

The warnings that are raised in BaseCache are purely for compatibility between testing and production. All the non-memcache backends will gladly accept cache keys and key sizes that memcache won't accept; rather than get a nasty surprise when you move code into production, Django's cache backends raise warnings.

So -- the proposed patch won't actually achieve the stated goal, because memcache doesn't invoke the base classes check_key anyway.

If the original problem is that cache key components aren't under your control, I put it to you that the right solution is a better key generation function -- something that will truncate arbitrary input (e.g., URLs) to an appropriate length.

My inclination is to close this as wontfix. However, there might be some value in refactoring check_key() so that users can define their own checking mechanism, instead of being forced into memcache-specific checks.

comment:4 in reply to: ↑ 3 Changed 4 years ago by vzima

Replying to russellm:

The basic problem I tried to solve are cases when you do not have cache keys under your control or your key validity does not match requirements of memcache. There are three basic approaches for that - check it up front and avoid cache, hash the identifier, or try it and catch the exception.

  • I headed for the first solution to add a method you can use to check your key before you try to use it, to minimise impact on current implementation. I still think this is valid requirement as you would have to do that in your code anyway. This solution also allows the possibility to write this function specifically for each backend as nobody can assert that memcache will be the only cache library used on production.
  • You can always hash the identifier and make unique cache key, but in most cases this would be much slower than just check for length. Simple truncation could be fast enough, but it does not guarantee that the key will be unique.
  • Define a single exception that would be raised from any backend if problematic key is used would be probably a best solution. Warnings still can happen in django backends. I am not quite sure though, how much memcache libraries differ in their exceptions.

Decision whether to avoid caching or use another key function is dependent on what is cached. Sometimes it might be better to avoid caching if your key can not be used, I just want to add support for such cases.

The first is with the implementation -- you raise an exception, which is then caught as a warning. Why not just raise the warning in the first place?

Yes, I should have done that.

Django's cache backends raise warnings.

You would have to set warnings filter to "error" so they will be actually raised and memcache backend raises memcache exceptions instead of these warnings, so it will not help you on production anyway.

Note 1: You may encounter problems with key length when you do not except them. Assume you check you identifier for length of 247, with no prefix this will create keys of length 250. But once you increase your cache version from 9 to 10, some of these keys might turn length 251 and memcache starts to raise exceptions. As the cache prefixes and versions are defined in settings, you can never be sure the keys will not get too long.

Note 2: Originally, I found this problem by using invalid keys in cache.get method, which docstring says: "... If the key does not exist, return default...". If you use invalid key, then it does not exist and I expect get to either return default value or raise Type exception as dict does.

Note: See TracTickets for help on using tickets.
Back to Top