Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#21351 closed Cleanup/optimization (fixed)

memoize function needs tests and improvements

Reported by: Daniele Procida Owned by: Bouke Haarsma
Component: Utilities Version: master
Severity: Normal Keywords:
Cc: bmispelon@…, Bouke Haarsma Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


django.utils.functional.memoize(func, cache, num_args):

  • lacks tests
  • should raise exceptions when provided with illegal arguments (func must be a function, cache a dict, and num_args a positive integer)
  • should raise an exception if any of the arguments of func, which are used as dictionary keys, is unhashable

Attachments (2)

memoize_problem_21351.diff (1.7 KB) - added by Bouke Haarsma 5 years ago.
memoize num_args implication for get_callable (1.5 KB) - added by Bouke Haarsma 5 years ago.

Download all attachments as: .zip

Change History (21)

comment:1 Changed 5 years ago by Tim Graham

What's the rationale for raising exceptions with illegal arguments? Maybe it's the right thing to do, but I'm skeptical. Here's my research on the topic.

comment:2 Changed 5 years ago by Daniele Procida

I think you're right, it looks over-zealous. Do you think the third point's unnecessary as well?

comment:3 Changed 5 years ago by Daniele Procida

There's actually a worse problem with memoize, that Shai Berger has pointed out: it chokes on keyword arguments...

His example:

>>> from django.utils import functional as F
>>> def f(a,b): return a+b
>>> cache = {}
>>> f = F.memoize(f, cache, 2)
>>> f(3,4)
>>> f(3,b=4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: wrapper() got an unexpected keyword argument 'b'

comment:4 Changed 5 years ago by Baptiste Mispelon

Triage Stage: UnreviewedAccepted

Accepting this on the basis of missing tests alone (I changed memoize to be a no-op and the full test suite runs without problems).

As for the other issues described, maybe we could build memoize on top of functools.lru_cache [1].
It's only available on python 3, but there is a backport available [2].


comment:5 Changed 5 years ago by Baptiste Mispelon

Cc: bmispelon@… added

comment:6 Changed 5 years ago by Shai Berger

I don't know all the places that memoize is used, but I suspect making it more robust will also make it slower, and for a function whose entire reason for existence is performance, that would be a problem.

Currently, memoize() is undocumented, and I think we should consider just making it explicitly private. Users don't really need it in this form (and when they do, they should probably base it on the cache framework).

comment:7 Changed 5 years ago by Bouke Haarsma

Cc: Bouke Haarsma added

Replacing memoize by lru_cache will probably take some work to get right, as memoize allows one to provide a dict for caching. lru_cache hides the cache from the user, only allowing the instrumented cache_clear() for clearing the cache altogether.

Update: after my patch I found that although the cache is now hidden, this 'feature' was not used by Django except for clearing the caches.

Last edited 5 years ago by Bouke Haarsma (previous) (diff)

comment:8 Changed 5 years ago by Bouke Haarsma

As for the usage of memoize:

$ git grep -c "memoize"

comment:9 Changed 5 years ago by Bouke Haarsma

Owner: changed from nobody to Bouke Haarsma
Status: newassigned

I've submitted a PR [1]

However, I have some questions about the deprecation and backport:

  • Can the backport be included, as it is MIT licensed?
  • memoize is an internal API, should it follow the deprecation policy?
  • Is this the correct way to include a backport?

Update: using GitHub search [2], there's quite a few from django.utils.functional import memoize. So we might need to properly deprecate the decorator.


Last edited 5 years ago by Bouke Haarsma (previous) (diff)

comment:10 Changed 5 years ago by Bouke Haarsma

As discussed on IRC, a few benchmarks comparing memoize with lru_cache and lru_cache's backport. Also, another column with the statistics bookkeeping in the backport disabled for extra speed. These results are achieved with lru_cache(maxsize=None), as that doesn't involve locking and is comparable to memoize.

|       | memoize | lru_cache (py3) | backport | no-stats |
| write | 0.970   | 2.373           | 2.714    | 2.413    |
| read  | 0.342   | 0.830           | 1.053    | 0.950    |
Last edited 5 years ago by Bouke Haarsma (previous) (diff)

comment:11 Changed 5 years ago by Aymeric Augustin

The pull request looks good, I left a comment, you can mark the ticket as RFC once you've addressed it.

comment:12 Changed 5 years ago by Bouke Haarsma

Has patch: set
Triage Stage: AcceptedReady for checkin

For ticket completeness bmispelon also left a comment about memoize(..., num_args=1) whilst get_callable takes 2 arguments:

bmispelon 16 hours ago:

The num_args argument was introduced in b6812f2d9d1bea3bdc9c5f9a747c5c71c764fa0f specifically for get_callable (note how it can take two arguments but the num_args is 1).

I suspect there might be something weird going on but I haven't had time to look into it and unfortunately, we cannot ask the original committer of that change anymore :(

Say a function has two parameters and one result, then A,B -> X and A,C -> Y. However as the function's cache strategy only accounts for the first argument, it will always do the same, namely A,? -> X UNLESS in the case of an exception. Now when the exception happened for A,D -> ?, then no cache is written and A,C -> Y. However if A,C -> Y is executed FIRST and cached, then A,? -> Y and thus A,D -> Y -- no exception being raised.

I have added a test to the PR demonstrate this problem.

comment:13 Changed 5 years ago by Baptiste Mispelon

Triage Stage: Ready for checkinAccepted

There's still some issues with the pull request:

  • shai raised concerns about performance. Your benchmark has no units but I assume it's a "smaller is better" type which means that the lru_cache is less performant than the old memoize. Can you address that (and maybe also show the code of your benchmark)?
  • The current pull request is not python3 compatible (because of the use of xrange in one of the tests)
  • I don't think the test you added to show the issue of num_args belongs in this pull request. While I agree that there's probably a bug there, it should either be opened as a separate ticket or just documented since we're deprecating memoize altogether.


Changed 5 years ago by Bouke Haarsma

Attachment: memoize_problem_21351.diff added

memoize num_args implication for get_callable

Changed 5 years ago by Bouke Haarsma

Attachment: added

comment:14 Changed 5 years ago by Bouke Haarsma

I have attached a rewritten speedtest, which should show the performance impact on 1M reads and writes. lru_cache is around 3 times slower on both, which should come as no surprise. memoize has a very simplistic cache key generator as it copies positional arguments into a new list, while lru_cache creates a hash of both positional and keyword arguments. This overhead comes at a reduced performance. Nevertheless looking at the performance difference of a single read (0.0000006s) and write (.0000014s), the impact seems rather limited.

Python: 2.7.5 (default, Sep  5 2013, 23:32:22) [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
|           | memoize  | lru_cache |
| 1M reads  | 0.34065s | 1.07396s  |
| 1M writes | 1.01192s | 2.71383s  |

Python: 3.3.2 (default, Nov  6 2013, 08:59:42) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)]
|           | memoize  | lru_cache |
| 1M reads  | 0.30876s | 0.87845s  |
| 1M writes | 0.99767s | 2.39997s  |

The PR has been updated with the redundant tests removed and PY3 compatibility.

comment:15 Changed 5 years ago by Anssi Kääriäinen

This results in around 6-8% slowdown in url_reverse djangobench benchmark. While not critical at all, avoiding that would be nice. URL reversing is already bottleneck when producing a large json list where each element contains a reversed url.

Similar slowdown is in url_resolve benchmark, too.

comment:16 Changed 5 years ago by Anssi Kääriäinen

I am OK for making this change, using standard library is usually a good idea. The lru_cache definitely handles correctly some cases Django doesn't. If the slowdown is too big for any specific use case, maybe it would be better to use a custom tailored cache instead there.

comment:17 Changed 5 years ago by Baptiste Mispelon <bmispelon@…>

Resolution: fixed
Status: assignedclosed

In 9b7455e918a437c3db91e88dcbf6d9c93fef96f8:

Fixed #21351 -- Replaced memoize with Python's lru_cache.

Replaced the custom, untested memoize with a similar decorator from Python's
3.2 stdlib. Although some minor performance degradation (see ticket), it is
expected that in the long run lru_cache will outperform memoize once it is
implemented in C.

Thanks to EvilDMP for the report and Baptiste Mispelon for the idea of
replacing memoize with lru_cache.

comment:18 Changed 5 years ago by Baptiste Mispelon <bmispelon@…>

In 8ed96464e99c73150ab6eff5df70b6274a871a4a:

Fixed typo in; refs #21351.

comment:19 Changed 4 years ago by Tim Graham <timograham@…>

In 61ad1ea92b1f4df992b0ef1dcc7c781da2413815:

Removed django.utils.functional.memoize per deprecation timeline.

refs #21351.

Note: See TracTickets for help on using tickets.
Back to Top