Opened 6 years ago

Closed 3 years ago

#29867 closed Bug (fixed)

Allow cache.get_or_set() to cache a None result

Reported by: Phill Tornroth Owned by: Nick Pope
Component: Core (Cache system) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Phill Tornroth)

get_or_set docstring says "If the key does not exist, add the key and set it to the default value." -- but that's not quite what it does. It will perform a set if the key doesn't exist, or if the cached value is None.

I think in order to be doing what it says on the tin it'd need to be:

if self.has_key(key, version=version):
        return self.get(key, version=version)
else:
    if callable(default):
        default = default()
        
    self.add(key, default, timeout=timeout, version=version)
    # Fetch the value again to avoid a race condition if another
    # caller added a value between the first get() and the add()
    # above.
    return self.get(key, default, version=version)

I'd find this useful in cases where None was an expensive result to arrive at. If there's spiritual alignment with the suggestion, I'm happy to prepare and submit a change with tests.

Change History (15)

comment:1 by Tim Graham, 6 years ago

Summary: cache.get_or_set won't cache None resultsAllow cache.get_or_set() to cache a None result
Triage Stage: UnreviewedAccepted

I agree with your analysis. I read through previous related issues (#26792, #28601) and didn't see a good reason for the current behavior. I would question if if default is not None: should be there in your proposal (i.e. why shouldn't None by cached?).

comment:2 by Phill Tornroth, 6 years ago

Description: modified (diff)

in reply to:  1 comment:3 by Phill Tornroth, 6 years ago

Replying to Tim Graham:

I agree with your analysis. I read through previous related issues (#26792, #28601) and didn't see a good reason for the current behavior. I would question if if default is not None: should be there in your proposal (i.e. why shouldn't None by cached?).

You're right, I sketched that out too quickly and shouldn't write code in bug trackers (edited) :) I'll put a PR up with code that I have some evidence is working.

comment:4 by Phill Tornroth, 5 years ago

So it turns out the issue I'm identifying is very intentional behavior. In working on a fix I ran into tests that explicitly defend the behavior.

Commit: https://github.com/django/django/commit/4d60261b2a77460b4c127c3d832518b95e11a0ac
Bug the behavior addresses: https://code.djangoproject.com/ticket/28601

I'm confused by the ticket #28601 though. I agree that the inconsistency of a callable vs literal default was concerning, but I think I disagree with the solution to treat None as a unique value. I'd argue that None ought to be a validly cached value and that in the context of the original ticket the behavior ought to be:

cache.get_or_set('foo', None)  # None
cache.get_or_set('foo', 5)  # None, because we just said so
cache.get('foo')  # None :)

cache.get_or_set('bar', lambda: None)  # None
cache.get_or_set('bar', 5)  # None, because we just said so
cache.get('bar')  # None :)

Want to raise this though in case my take here is controversial. Tim, it looks like you stewarded that fix in so maybe this link jogs your memory? Let me know if with this context you disagree with my proposed solution. If there's still consensus I have a PR just about ready.

comment:5 by Phill Tornroth, 5 years ago

Oh apologies, I just noted that you already read through #26801 in considering this ticket so I'll carry on an put a Pull Request up that modifies the tests which expect this behavior.

comment:6 by Phill Tornroth, 5 years ago

Okay, I put up a pull request here: https://github.com/django/django/pull/10558

This would be a breaking change (though I'd imagine dependent code would be rare, and it's much more likely this would improve existing code that isn't aware it's not benefiting from caching in these cases). Anyone who was depending on re-calling a callable default that returns None would need to change their expectation. I'm struggling to imagine what those cases look like, but in those cases I image they'd be best served by not using get_or_set and instead inspecting the results of has_key or get directly and then do something special when None is the cached value.

comment:7 by Phill Tornroth, 5 years ago

So this change is more complicated than I thought/hoped as it turns out. Not all cache backends have a bespoke implementation of has_key and the base cache implementation for has_key simply does return self.get(key, version=version) is not None.

The cache code depends pretty severely on default=None being the absence of a default, instead of a normal value. Right now the MemCache and PyLibMCCache tests are failing as a result (has_key returns False and cached None is treated as a miss).

Additionally I don't love that introducing has_key into the get_or_set logic will introduce an additional cache round trip that wasn't there before. Ideally the get method wouldn't treat None as the lack of a default and we'd have another way to detect a cache miss (raise KeyDoesNotExist() maybe) -- but that's obviously a large change to introduce that effectively incurs a major version change of the django cache API contracts, and it's plausible that a number of the backend libraries will also not provide enough information for their cache backends to know the difference between a missing key and a None/null cached value.

So choices that occur to me are:

1/ Maintain current status quo. Update the get_or_set documentation to be more accurate about the implementation.
2/ The cache system is modified somewhat extensively to support None as a cacheable concept (essentially base class functionality and contracts are changed to assume they can depend on this behavior across cache backends).
3/ Leave the behavior for None up to the individual cache backends, and consider the current proposed get_or_set implementation for LocMemCache.

I find option 1 fairly compelling, frankly. I was using LocMemCache in a production situation but most uses are likely to be in testing where there's broader community benefit for it working similarly to other caches. Confused users may be more likely to discover caching None isn't likely to work across backends if LocMemCache behaves more similarly to existing backends.

So I find myself un-recommending my own proposal and updating the documentation instead (which I'm happy to put a new PR up for if that's consensus).

in reply to:  6 comment:8 by Nasir Hussain, 5 years ago

Replying to Phill Tornroth:
Hi Phill,

Are you still working on this one?

comment:9 by Ahsan Shafiq, 5 years ago

Owner: changed from nobody to Ahsan Shafiq
Status: newassigned

comment:10 by Ahsan Shafiq, 5 years ago

I've created a PR against the issue.
https://github.com/django/django/pull/11812

The issue is fixed now.

Version 0, edited 5 years ago by Ahsan Shafiq (next)

comment:11 by Jacob Walls, 3 years ago

Has patch: set

comment:12 by Mariusz Felisiak, 3 years ago

Patch needs improvement: set

I think we should move forward with Nick's PR which is to handle None in all affected methods. Marking as "needs improvement" because it's still PoC.

comment:13 by Nick Pope, 3 years ago

Owner: changed from Ahsan Shafiq to Nick Pope
Patch needs improvement: unset
Version: 2.1master

I polished off my PR.

This fixes all operations that make use of .get() without passing a default argument.

comment:14 by Mariusz Felisiak, 3 years ago

Triage Stage: AcceptedReady for checkin

comment:15 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Resolution: fixed
Status: assignedclosed

In bb64b99b:

Fixed #29867 -- Added support for storing None value in caches.

Many of the cache operations make use of the default argument to the
.get() operation to determine whether the key was found in the cache.
The default value of the default argument is None, so this results in
these operations assuming that None is not stored in the cache when it
actually is. Adding a sentinel object solves this issue.

Unfortunately the unmaintained python-memcached library does not support
a default argument to .get(), so the previous behavior is preserved for
the deprecated MemcachedCache backend.

Note: See TracTickets for help on using tickets.
Back to Top