Opened 14 years ago
Last modified 4 years ago
#15894 new Bug
SITE_CACHE does not invalidate in multiprocess environments
Reported by: | German M. Bravo | Owned by: | nobody |
---|---|---|---|
Component: | contrib.sites | Version: | 1.3 |
Severity: | Normal | Keywords: | cache invalidation |
Cc: | Flavio Curella, German M. Bravo, krzysiumed@…, niel | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In a configuration where multiple python processes are created to serve requests, a global variable contrib.sites.models.SITE_CACHE is created for each process. In the scenario where each and all processes have used (and cached) any given Site object; if a client requests the update of the object (either by saving or deleting the object), only the SITE_CACHE in the process which served the client request gets invalidated. All other processes maintain the cached version of the object in their own SITE_CACHE dictionaries and further calls to get_current() by client requests in any of those processes return outdated objects.
Instead of using a global dictionary, I suggest using the django caching system.
Attachments (3)
Change History (25)
comment:1 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 13 years ago
Needs tests: | set |
---|---|
UI/UX: | unset |
comment:3 by , 13 years ago
Cc: | added |
---|
comment:4 by , 13 years ago
Has patch: | set |
---|---|
Needs tests: | unset |
by , 13 years ago
Attachment: | sites.diff added |
---|
comment:5 by , 13 years ago
Added some doc for the patch.
I still suck at reST, so it's probably broken, but at least it's a start.
comment:6 by , 13 years ago
Owner: | changed from | to
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:7 by , 13 years ago
Needs documentation: | set |
---|---|
Patch needs improvement: | set |
Triage Stage: | Ready for checkin → Accepted |
The docs are a start, but I think we probably need a bit more than that, as well as an entry in the changelog. Using sleep()
in the tests is probably not the best way to do it... no matter how fast everything else runs, that test will take at least as long as your sleep statement. Can you use a yield
instead to get the same effect?
comment:8 by , 13 years ago
Cc: | added |
---|
comment:9 by , 13 years ago
Patch needs improvement: | unset |
---|
I'd tried to improve fcurella's patch.
Changes:
- No synchronization made by
time.sleep
. Now it usesmultiprocessing.Value
instance calledphase
. It's ugly but it works. - The first process (now it's called
check_if_domain_changed
) send result of comparing domains. Test functiontest_multiprocess
receive this value. fcurella used a queue for this, now it uses amultiprocessing.Value
instance. It's simpler - we pass only one value so we don't need a queue. On the other hand,domain_changed
argument passed tocheck_if_domain_changed
function is an output argument and it's surprising. - I'd deleted except clause (see starting and joining processes - lines 81-84). I don't know why
Process.start
orProcess.join
may raise an exception but I think we shouldn't hide exceptions. - I'd changed some attributes' names.
I do not think my patch is better or I'm better coder than fcurella. I'm just trying to help.
by , 13 years ago
Attachment: | 15894_v2.diff added |
---|
comment:10 by , 12 years ago
Cc: | added |
---|---|
Owner: | changed from | to
comment:11 by , 12 years ago
Sorry I disappeared, but for some reason I didn't get any notification on this ticket.
@krzysiumed: thanks for the better test.
I just have a question: checking for the site object from any other cache backend that is not LocMemCache is going to be slower than accessing a global variable.
Should we require some setting or other mechanism to enable the new behavior?
comment:12 by , 12 years ago
Cc: | added |
---|
comment:13 by , 12 years ago
The test by @krzysiumed (https://code.djangoproject.com/ticket/15894#comment:9) fails for me when using sqlite3, but passes with postgresql_psycopg2.
While using sqlite, I can verify that current_site = self.get(pk=sid)
in SiteManager
returns the unchanged site when called from the first process (check_if_domain_changed
). Thus, the Site.save()
method properly deletes the cache entry, but the query to re-populate the cache returns an old object.
comment:14 by , 12 years ago
#19334 was a duplicate.
Repeating the comment I made on that ticket: The real problem here is that there is global state -- and while I'll agree that this is a problem, I don't think it's a problem we fix by moving to a different store to hold that global state.
The cache is being used to prevent a single read from a small, indexed table on the database; has anyone done any performance tests to check whether a call to memcache is actually faster? I wouldn't be surprised to find out that the caching layer is actually slower than the query we're trying to optimize.
If global state really is a problem, is just removing the caching layer an option?
comment:15 by , 12 years ago
I've run some quick'n'dirty benchmarks using the following test:
from django.test import TestCase import time from django.contrib.sites.models import Site class SimpleTest(TestCase): def test_sites_permormance(self): for i in range(3): t = time.time() for j in range(99999): Site.objects.get_current() delta_t = time.time() - t print delta_t
with the following results:
Global variable --------------- 0.444395065308 0.44925403595 0.514168024063 LocMemCache ------------- 5.65192317963 5.80550599098 5.73623394966 MemcachedCache (via TCP/IP) --------------------------- 15.932502985 15.8288300037 15.7813811302 No Cache at all --------------- 84.9961898327 86.1811769009 85.7295229435
As I was expecting, using LocMemCache is slower because of the function calls overheads, and Memcached is slower because of latency.
I don't think removing the caching layer here is an option. It might work in projects that don't really use the sites
app, but for more extensive uses (e.g.: CMSes filtering content by site), the performance would degrade way too much.
If I understanding correctly, so far the options are:
a) Give users the option to move this global state to a 'shared storage' of their choice (this will inevitably introduce some new setting), or
b) Implement some kind of 'inter-process' communication between different django processes.
Note that b) is possibly out of the scope for the Django core project, and it might be more appropriate to implement it as a third-party app project.
comment:16 by , 12 years ago
Ok - those performance numbers are helpful. and show that the caches do speed things up.
However, I disagree with your analysis. Yes, a non-cached version of the call is 170 times slower than the global variable, 18 times slower than locmem and 6 times slower than memcache. But in absolute terms, a call to get the current site takes 0.00085s - that's less than a millisecond per usage.
While it's certainly worth considering faster alternatives when they're available, it's also worth considering the absolute values.
comment:18 by , 9 years ago
Replying to claudep:
The proposed patch is on the right track, most of the people here agree that the cache framework should be used for this.
However the global variable should not remain, clearing the cached values for all the sites would not be possible anymore. Clearing the ones that hit the current process would only add more inconsistencies. Also if the current site is not retrieved with Site.object.get_current() before the save operation, the pre/post hooks will not clear the cached data for that site, leaving other processes to use obsolete data.
Should the proposed and closed patch be reviewed/updated or can a new patch be submitted?
comment:19 by , 9 years ago
I raised a proposal on django-developers to replace the Site
model with a setting. I think it would be more useful to pursue that option rather than fixing this ticket.
comment:21 by , 6 years ago
(Replying because we got bitten by this issue recently.)
I've read the proposal at https://groups.google.com/d/topic/django-developers/OGOf1TfSBBs/discussion, and though it seems like a fine solution, there doesn't seem to be much movement currently.
I would argue that, in absence of a valid alternative, a patch would be welcome. I don't think this would be an intrusive change: It could make the multiprocess envs work without making things more difficult for people who are happy with the current setup.
Would there be other reasons for not accepting patches that i'm not aware of?
We'd be happy to invest some time into getting a patch merged in Django. Currently, we're patching it ourselves.
Regards,
Kees
comment:22 by , 4 years ago
Adding some extra context for this - we're also being hit by this in the current context of Docker containers. For High-Availability reasons, we're running multiple replicas. A change only propagates to a single one, and the other replicas need restarting.
I'm considering monkey-patching the global to replace it with a dict-like object that actually calls out to the caching framework instead. Maybe the cache global/singleton should be pointed to by new setting? Something like SITES_CACHE = "django.contrib.sites.models.SITE_CACHE"
.
The discussion on the mailing list referred to by Tim seems to have died without any outcome.
've added a patch, but I have no idea on how to write tests that could simulate two processes. Suggestions are welcome!