Opened 14 years ago

Last modified 4 years ago

#15894 new Bug

SITE_CACHE does not invalidate in multiprocess environments

Reported by: German M. Bravo Owned by: nobody
Component: contrib.sites Version: 1.3
Severity: Normal Keywords: cache invalidation
Cc: Flavio Curella, German M. Bravo, krzysiumed@…, niel Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In a configuration where multiple python processes are created to serve requests, a global variable contrib.sites.models.SITE_CACHE is created for each process. In the scenario where each and all processes have used (and cached) any given Site object; if a client requests the update of the object (either by saving or deleting the object), only the SITE_CACHE in the process which served the client request gets invalidated. All other processes maintain the cached version of the object in their own SITE_CACHE dictionaries and further calls to get_current() by client requests in any of those processes return outdated objects.

Instead of using a global dictionary, I suggest using the django caching system.

Attachments (3)

sites-cache.diff (3.5 KB ) - added by Flavio Curella 13 years ago.
better patch, with tests
sites.diff (4.0 KB ) - added by Flavio Curella 13 years ago.
15894_v2.diff (4.5 KB ) - added by Christopher Medrela 13 years ago.

Download all attachments as: .zip

Change History (25)

comment:1 by Jacob, 14 years ago

Triage Stage: UnreviewedAccepted

comment:2 by Flavio Curella, 13 years ago

Needs tests: set
UI/UX: unset

've added a patch, but I have no idea on how to write tests that could simulate two processes. Suggestions are welcome!

comment:3 by Flavio Curella, 13 years ago

Cc: Flavio Curella added

by Flavio Curella, 13 years ago

Attachment: sites-cache.diff added

better patch, with tests

comment:4 by Flavio Curella, 13 years ago

Has patch: set
Needs tests: unset

by Flavio Curella, 13 years ago

Attachment: sites.diff added

comment:5 by Flavio Curella, 13 years ago

Added some doc for the patch.

I still suck at reST, so it's probably broken, but at least it's a start.

comment:6 by Flavio Curella, 13 years ago

Owner: changed from nobody to Flavio Curella
Triage Stage: AcceptedReady for checkin

comment:7 by Paul McMillan, 13 years ago

Needs documentation: set
Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

The docs are a start, but I think we probably need a bit more than that, as well as an entry in the changelog. Using sleep() in the tests is probably not the best way to do it... no matter how fast everything else runs, that test will take at least as long as your sleep statement. Can you use a yield instead to get the same effect?

comment:8 by German M. Bravo, 13 years ago

Cc: German M. Bravo added

comment:9 by Christopher Medrela, 13 years ago

Patch needs improvement: unset

I'd tried to improve fcurella's patch.

Changes:

  • No synchronization made by time.sleep. Now it uses multiprocessing.Value instance called phase. It's ugly but it works.
  • The first process (now it's called check_if_domain_changed) send result of comparing domains. Test function test_multiprocess receive this value. fcurella used a queue for this, now it uses a multiprocessing.Value instance. It's simpler - we pass only one value so we don't need a queue. On the other hand, domain_changed argument passed to check_if_domain_changed function is an output argument and it's surprising.
  • I'd deleted except clause (see starting and joining processes - lines 81-84). I don't know why Process.start or Process.join may raise an exception but I think we shouldn't hide exceptions.
  • I'd changed some attributes' names.

I do not think my patch is better or I'm better coder than fcurella. I'm just trying to help.

Version 0, edited 13 years ago by Christopher Medrela (next)

by Christopher Medrela, 13 years ago

Attachment: 15894_v2.diff added

comment:10 by Christopher Medrela, 12 years ago

Cc: krzysiumed@… added
Owner: changed from Flavio Curella to nobody

comment:11 by Flavio Curella, 12 years ago

Sorry I disappeared, but for some reason I didn't get any notification on this ticket.

@krzysiumed: thanks for the better test.

I just have a question: checking for the site object from any other cache backend that is not LocMemCache is going to be slower than accessing a global variable.
Should we require some setting or other mechanism to enable the new behavior?

comment:12 by niel, 12 years ago

Cc: niel added

comment:13 by anonymous, 12 years ago

The test by @krzysiumed (https://code.djangoproject.com/ticket/15894#comment:9) fails for me when using sqlite3, but passes with postgresql_psycopg2.

While using sqlite, I can verify that current_site = self.get(pk=sid) in SiteManager returns the unchanged site when called from the first process (check_if_domain_changed). Thus, the Site.save() method properly deletes the cache entry, but the query to re-populate the cache returns an old object.

comment:14 by Russell Keith-Magee, 12 years ago

#19334 was a duplicate.

Repeating the comment I made on that ticket: The real problem here is that there is global state -- and while I'll agree that this is a problem, I don't think it's a problem we fix by moving to a different store to hold that global state.

The cache is being used to prevent a single read from a small, indexed table on the database; has anyone done any performance tests to check whether a call to memcache is actually faster? I wouldn't be surprised to find out that the caching layer is actually slower than the query we're trying to optimize.

If global state really is a problem, is just removing the caching layer an option?

comment:15 by Flavio Curella, 12 years ago

I've run some quick'n'dirty benchmarks using the following test:

from django.test import TestCase
import time
from django.contrib.sites.models import Site


class SimpleTest(TestCase):
    def test_sites_permormance(self):
        for i in range(3):
            t = time.time()
            for j in range(99999):
                Site.objects.get_current()
            delta_t = time.time() - t
            print delta_t

with the following results:

Global variable
---------------
0.444395065308
0.44925403595
0.514168024063

LocMemCache
-------------
5.65192317963
5.80550599098
5.73623394966

MemcachedCache (via TCP/IP)
---------------------------
15.932502985
15.8288300037
15.7813811302

No Cache at all
---------------
84.9961898327
86.1811769009
85.7295229435

As I was expecting, using LocMemCache is slower because of the function calls overheads, and Memcached is slower because of latency.

I don't think removing the caching layer here is an option. It might work in projects that don't really use the sites app, but for more extensive uses (e.g.: CMSes filtering content by site), the performance would degrade way too much.

If I understanding correctly, so far the options are:

a) Give users the option to move this global state to a 'shared storage' of their choice (this will inevitably introduce some new setting), or
b) Implement some kind of 'inter-process' communication between different django processes.

Note that b) is possibly out of the scope for the Django core project, and it might be more appropriate to implement it as a third-party app project.

comment:16 by Russell Keith-Magee, 12 years ago

Ok - those performance numbers are helpful. and show that the caches do speed things up.

However, I disagree with your analysis. Yes, a non-cached version of the call is 170 times slower than the global variable, 18 times slower than locmem and 6 times slower than memcache. But in absolute terms, a call to get the current site takes 0.00085s - that's less than a millisecond per usage.

While it's certainly worth considering faster alternatives when they're available, it's also worth considering the absolute values.

comment:17 by Claude Paroz, 9 years ago

#25783 was a duplicate with a patch.

in reply to:  17 comment:18 by Virtosu Bogdan, 9 years ago

Replying to claudep:

#25783 was a duplicate with a patch.

The proposed patch is on the right track, most of the people here agree that the cache framework should be used for this.

However the global variable should not remain, clearing the cached values for all the sites would not be possible anymore. Clearing the ones that hit the current process would only add more inconsistencies. Also if the current site is not retrieved with Site.object.get_current() before the save operation, the pre/post hooks will not clear the cached data for that site, leaving other processes to use obsolete data.

Should the proposed and closed patch be reviewed/updated or can a new patch be submitted?

comment:19 by Tim Graham, 9 years ago

I raised a proposal on django-developers to replace the Site model with a setting. I think it would be more useful to pursue that option rather than fixing this ticket.

comment:20 by Tim Graham, 7 years ago

#28844 was another duplicate.

comment:21 by Kees Hink, 6 years ago

(Replying because we got bitten by this issue recently.)

I've read the proposal at https://groups.google.com/d/topic/django-developers/OGOf1TfSBBs/discussion, and though it seems like a fine solution, there doesn't seem to be much movement currently.

I would argue that, in absence of a valid alternative, a patch would be welcome. I don't think this would be an intrusive change: It could make the multiprocess envs work without making things more difficult for people who are happy with the current setup.

Would there be other reasons for not accepting patches that i'm not aware of?

We'd be happy to invest some time into getting a patch merged in Django. Currently, we're patching it ourselves.

Regards,

Kees

comment:22 by Sergei Maertens, 4 years ago

Adding some extra context for this - we're also being hit by this in the current context of Docker containers. For High-Availability reasons, we're running multiple replicas. A change only propagates to a single one, and the other replicas need restarting.

I'm considering monkey-patching the global to replace it with a dict-like object that actually calls out to the caching framework instead. Maybe the cache global/singleton should be pointed to by new setting? Something like SITES_CACHE = "django.contrib.sites.models.SITE_CACHE".

The discussion on the mailing list referred to by Tim seems to have died without any outcome.

Note: See TracTickets for help on using tickets.
Back to Top