Code

Opened 3 years ago

Last modified 17 months ago

#15894 new Bug

SITE_CACHE does not invalidate in multiprocess environments

Reported by: Kronuz Owned by: nobody
Component: contrib.sites Version: 1.3
Severity: Normal Keywords: cache invalidation
Cc: fcurella, Kronuz, krzysiumed@…, niel Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In a configuration where multiple python processes are created to serve requests, a global variable contrib.sites.models.SITE_CACHE is created for each process. In the scenario where each and all processes have used (and cached) any given Site object; if a client requests the update of the object (either by saving or deleting the object), only the SITE_CACHE in the process which served the client request gets invalidated. All other processes maintain the cached version of the object in their own SITE_CACHE dictionaries and further calls to get_current() by client requests in any of those processes return outdated objects.

Instead of using a global dictionary, I suggest using the django caching system.

Attachments (3)

sites-cache.diff (3.5 KB) - added by fcurella 3 years ago.
better patch, with tests
sites.diff (4.0 KB) - added by fcurella 3 years ago.
15894_v2.diff (4.5 KB) - added by krzysiumed 2 years ago.

Download all attachments as: .zip

Change History (19)

comment:1 Changed 3 years ago by jacob

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 3 years ago by fcurella

  • Needs tests set
  • UI/UX unset

've added a patch, but I have no idea on how to write tests that could simulate two processes. Suggestions are welcome!

comment:3 Changed 3 years ago by fcurella

  • Cc fcurella added

Changed 3 years ago by fcurella

better patch, with tests

comment:4 Changed 3 years ago by fcurella

  • Has patch set
  • Needs tests unset

Changed 3 years ago by fcurella

comment:5 Changed 3 years ago by fcurella

Added some doc for the patch.

I still suck at reST, so it's probably broken, but at least it's a start.

comment:6 Changed 3 years ago by fcurella

  • Owner changed from nobody to fcurella
  • Triage Stage changed from Accepted to Ready for checkin

comment:7 Changed 3 years ago by PaulM

  • Needs documentation set
  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

The docs are a start, but I think we probably need a bit more than that, as well as an entry in the changelog. Using sleep() in the tests is probably not the best way to do it... no matter how fast everything else runs, that test will take at least as long as your sleep statement. Can you use a yield instead to get the same effect?

comment:8 Changed 2 years ago by Kronuz

  • Cc Kronuz added

comment:9 Changed 2 years ago by krzysiumed

  • Patch needs improvement unset

I'd tried to improve fcurella's patch.

Changes:

  • No synchronization made by time.sleep. Now it uses multiprocessing.Value instance called phase. It's ugly but it works.
  • The first process (now it's called check_if_domain_changed) send result of comparing domains. Test function test_multiprocess receive this value. fcurella used a queue for this, now it uses a multiprocessing.Value instance. It's simpler - we pass only one value so we don't need a queue. On the other hand, domain_changed argument passed to check_if_domain_changed function is an output argument and it's surprising.
  • I'd deleted except clause (see starting and joining processes - lines 81-84). I don't know why Process.start or Process.join may raise an exception but I think we shouldn't hide exceptions.
  • I'd changed some attributes' names.

I do not think my patch is better or I'm better coder than fcurella. I'm just trying to help.

@edit:
I'd forgotten to write that the test fails on my computer because the domain is not changed.

Last edited 2 years ago by krzysiumed (previous) (diff)

Changed 2 years ago by krzysiumed

comment:10 Changed 22 months ago by krzysiumed

  • Cc krzysiumed@… added
  • Owner changed from fcurella to nobody

comment:11 Changed 21 months ago by fcurella

Sorry I disappeared, but for some reason I didn't get any notification on this ticket.

@krzysiumed: thanks for the better test.

I just have a question: checking for the site object from any other cache backend that is not LocMemCache is going to be slower than accessing a global variable.
Should we require some setting or other mechanism to enable the new behavior?

comment:12 Changed 19 months ago by niel

  • Cc niel added

comment:13 Changed 19 months ago by anonymous

The test by @krzysiumed (https://code.djangoproject.com/ticket/15894#comment:9) fails for me when using sqlite3, but passes with postgresql_psycopg2.

While using sqlite, I can verify that current_site = self.get(pk=sid) in SiteManager returns the unchanged site when called from the first process (check_if_domain_changed). Thus, the Site.save() method properly deletes the cache entry, but the query to re-populate the cache returns an old object.

comment:14 Changed 17 months ago by russellm

#19334 was a duplicate.

Repeating the comment I made on that ticket: The real problem here is that there is global state -- and while I'll agree that this is a problem, I don't think it's a problem we fix by moving to a different store to hold that global state.

The cache is being used to prevent a single read from a small, indexed table on the database; has anyone done any performance tests to check whether a call to memcache is actually faster? I wouldn't be surprised to find out that the caching layer is actually slower than the query we're trying to optimize.

If global state really is a problem, is just removing the caching layer an option?

comment:15 Changed 17 months ago by fcurella

I've run some quick'n'dirty benchmarks using the following test:

from django.test import TestCase
import time
from django.contrib.sites.models import Site


class SimpleTest(TestCase):
    def test_sites_permormance(self):
        for i in range(3):
            t = time.time()
            for j in range(99999):
                Site.objects.get_current()
            delta_t = time.time() - t
            print delta_t

with the following results:

Global variable
---------------
0.444395065308
0.44925403595
0.514168024063

LocMemCache
-------------
5.65192317963
5.80550599098
5.73623394966

MemcachedCache (via TCP/IP)
---------------------------
15.932502985
15.8288300037
15.7813811302

No Cache at all
---------------
84.9961898327
86.1811769009
85.7295229435

As I was expecting, using LocMemCache is slower because of the function calls overheads, and Memcached is slower because of latency.

I don't think removing the caching layer here is an option. It might work in projects that don't really use the sites app, but for more extensive uses (e.g.: CMSes filtering content by site), the performance would degrade way too much.

If I understanding correctly, so far the options are:

a) Give users the option to move this global state to a 'shared storage' of their choice (this will inevitably introduce some new setting), or
b) Implement some kind of 'inter-process' communication between different django processes.

Note that b) is possibly out of the scope for the Django core project, and it might be more appropriate to implement it as a third-party app project.

comment:16 Changed 17 months ago by russellm

Ok - those performance numbers are helpful. and show that the caches do speed things up.

However, I disagree with your analysis. Yes, a non-cached version of the call is 170 times slower than the global variable, 18 times slower than locmem and 6 times slower than memcache. But in absolute terms, a call to get the current site takes 0.00085s - that's less than a millisecond per usage.

While it's certainly worth considering faster alternatives when they're available, it's also worth considering the absolute values.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as new
The owner will be changed from nobody to anonymous. Next status will be 'assigned'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.