Opened 10 months ago

Closed 10 months ago

Last modified 7 months ago

#35166 closed New feature (wontfix)

Return python-memcached support

Reported by: Matej Spiller Muys Owned by: nobody
Component: Core (Cache system) Version: 5.0
Severity: Normal Keywords: memcached performance get_many
Cc: Nick Pope Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Matej Spiller Muys)

In the past python-memcached was removed from Django (4.1).
https://docs.djangoproject.com/en/5.0/releases/4.1/

However https://github.com/linsomniac/python-memcached/releases was resurected with 3 released in past month solving both hacks needed in Django:
https://github.com/linsomniac/python-memcached/issues/159
https://github.com/linsomniac/python-memcached/issues/170

Commit that deleted python-memcached:
https://github.com/django/django/commit/05f3a6186efefc9fca2204a745b992501c6fd91f

Why is python-memcached interesting today is because of HUUUGE performance degradation:
https://github.com/pinterest/pymemcache/issues/478

Basically pymemcache get_many is very slow because it send the request to first sever wait for response, then to second server waiting for response.
But python-memcached first sends the requests to all servers and then waits for responses making it number of memcached servers times faster.

Change History (8)

comment:1 by Matej Spiller Muys, 10 months ago

Description: modified (diff)

comment:2 by Matej Spiller Muys, 10 months ago

We are using it as follows with latest (it should be the minimum to remove all the hacks):
python-memcached==1.62

class MemcachedCache(django_memcached.BaseMemcachedCache):
    """An implementation of a cache binding using python-memcached"""

    def __init__(self, server, params):
        # python-memcached ≥ 1.45 returns None for a nonexistent key in
        # incr/decr(), python-memcached < 1.45 raises ValueError.
        import memcache

        super().__init__(server, params, library=memcache, value_not_found_exception=ValueError)
        self._options = {'pickleProtocol': pickle.HIGHEST_PROTOCOL, **self._options}

comment:3 by Mariusz Felisiak, 10 months ago

Cc: Nick Pope added
Component: UncategorizedCore (Cache system)
Resolution: wontfix
Status: newclosed

Thanks for this ticket, however I don't think we would like to re-add python-memcached support.

Basically pymemcache get_many is very slow because it send the request to first sever wait for response, then to second server waiting for response.
But python-memcached first sends the requests to all servers and then waits for responses making it number of memcached servers times faster.

This should be fixed in pymemcache, we cannot add/remove memcached libraries every 2 releases.

You can follow the triaging guidelines with regards to wontfix tickets and take this to DevelopersMailingList, if you don't agree.

comment:4 by Matej Spiller Muys, 10 months ago

The architecture of pymemcache is not allowing such flow without major refactoring since they splited multi server logic into separated logic making it hard to fix it.
That's why the issue was not solved and does not look to be solved in the near future unfourtanetly. I tried refactoring of pymemcache but it is really hard and would take considerable amount of development.

I understand you cannot add/remove memcached libraries every 2 releases but still current implementation of pymemcache is multiple times slower when using multiple servers. It's was a huge regression in Django memcache (we use 4 server so 4x times slower get_many).

The interface of python-memcached interface is now aligned with pymemcache so no longer need for hacks.

Last edited 10 months ago by Matej Spiller Muys (previous) (diff)

comment:5 by Nick Pope, 10 months ago

FYI: I originally added support for pymemcache and suggested the deprecation and removal of python-memcached as it was unmaintained for years.

I confess I was curious recently when I saw that python-memcached had a new release (v1.60) that solved many of the problems we originally had. Unfortunately, it also broke another workaround that Django required which made it inconsistent with other client libraries. I managed to get that fixed earlier this year and it was released as v1.62.

I even knocked up a commit to see what it would look like to reintroduce this backend: https://github.com/ngnpope/django/commit/python-memcached

It's actually very, very little. The question is - is it worth reintroducing given the history of poor maintenance and inconsistency? We wouldn't want to add this again if we're going to have to rip it out again, as Mariusz says... Anyway - I'm on the fence with putting this back in - let's say +0.1!

If you really need to use python-memcached>=1.62 you can write your own backend which is as simple as this:

import pickle

from django.core.cache.backends.memcached import BaseMemcachedCache


class MemcachedCache(BaseMemcachedCache):
    """An implementation of a cache binding using python-memcached."""

    def __init__(self, server, params):
        import memcache
        super().__init__(
            server, params, library=memcache, value_not_found_exception=ValueError
        )
        self._options = {"pickleProtocol": pickle.HIGHEST_PROTOCOL} | self._options

comment:6 by Nick Pope, 10 months ago

Why is python-memcached interesting today is because of HUUUGE performance degradation:
https://github.com/pinterest/pymemcache/issues/478

I forgot point out that this is not making a persuasive argument when the linked issue that you have written contains no performance comparison.

comment:7 by Matej Spiller Muys, 10 months ago

We already have our own backend since Django 4.1 because of performance issues.

I forgot point out that this is not making a persuasive argument when the linked issue that you have written contains no performance comparison.

It is really simple to reproduce/mesaure. We have 4 memcached servers on AWS each having around 1 ms roundtrip.
Now if get_many needs to fetch from all 4, it takes 4 ms for pymemcache (doing 4x loop of send & receive) and 1 ms for python-memcached (4x send then 4x receive ...). I didn't write the measurements at that time because it was clear to the author of pymemcache what the problem is and that is would be solved if refactored the same way as python-memcached logic. Basically you shouldn't wait for response after each send before sending to another server (but hash.py in pymemcache just loops around base.py code making it harder to optimise).

The author of pymemcache is using mcrouter that works the same as python-memcached (first 4x send and then 4x receive). We don't want to use mcrouter, it would help but still adds a bit of extra hop latency or as alternative deploy it as side car with extra deploy complexity.

comment:8 by Jochen Garcke, 7 months ago

Besides the performance aspect, there is also the silenting of errors that python-memcached does, while the alternatives do not.

Also somewhat related to this issue #33092, which also talks about threading issues and errors due to that.

So at this time there is no drop-in replacement for the former default cache-backend that just works for everyone.

In any case, thanks for the short code for the own backend, now the cache errors are silent again.

Note: See TracTickets for help on using tickets.
Back to Top