Django

Code

Ticket #6099 (closed: fixed)

Opened 7 months ago

Last modified 7 months ago

Use hashes for the filenames in "filebased" backend of django.core.cache

Reported by: sherbang Assigned to: nobody
Milestone: Component: Cache system
Version: SVN Keywords: sprintdec01
Cc: Triage Stage: Ready for checkin
Has patch: 1 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 1

Description

This refactor of the filebased cache creates the cache files as hashes broken into subdirectories like "6f/1e/d002ab5595859014ebf0951522d9" as suggested in this django-dev thread.

This also fixes a problem with add() and has_key() with expired cache items. I've included regression tests for both of those problems.

Attachments

filebased_md5.patch (7.1 kB) - added by sherbang on 12/02/07 00:29:11.
md5_no_refcount.patch (6.3 kB) - added by sherbang on 12/03/07 15:38:29.
New patch - removes refcounting stubs

Change History

12/02/07 00:29:11 changed by sherbang

  • attachment filebased_md5.patch added.

12/02/07 19:02:40 changed by Simon G <dev@simon.net.nz>

  • needs_better_patch changed.
  • stage changed from Unreviewed to Ready for checkin.
  • needs_tests changed.
  • needs_docs changed.

12/03/07 14:31:22 changed by jacob

  • needs_better_patch set to 1.
  • stage changed from Ready for checkin to Accepted.

I like what you've done here with the caching of the entry counts, but it's a bit over-engineered -- why not just have self.entry_count be an integer and update it on insert/delete?

And actually, now that I think about it, that's sort of a second issue. So could you redo this patch to just have the hashing (and not the entry count caching) and open a seperate ticket for that?

[Also, Simon, as a point of order: if a patch contains the word "TODO", it can't really be ready for checkin :)]

12/03/07 15:06:56 changed by sherbang

I like what you've done here with the caching of the entry counts, but it's a bit over-engineered -- why not just have self.entry_count be an integer and update it on insert/delete?

The cache is persistent (since it's saved in files), so you need to persist the entry count as-well. Then there's also the issue of multiple processes writing to the cache (changing the count).

On the other hand the count is only used to match to max_entries, so it needs to be close but probably not super precise. Perhaps there's a way to estimate the number of entries without keeping a running count and without going through the expense of walking the whole cache. Then again, maybe I'm just over-estimating the expense of walking the tree.

And actually, now that I think about it, that's sort of a second issue. So could you redo this patch to just have the hashing (and not the entry count caching) and open a seperate ticket for that?

OK

12/03/07 15:38:29 changed by sherbang

  • attachment md5_no_refcount.patch added.

New patch - removes refcounting stubs

12/03/07 23:53:55 changed by jacob

  • stage changed from Accepted to Ready for checkin.

12/04/07 12:03:57 changed by jacob

  • status changed from new to closed.
  • resolution set to fixed.

(In [6887]) Fixed #6099: the filebased cache backend now uses md5 hashes of keys instead of sanitized filenames. For good measure, keys are partitioned into subdirectories using the first few bits of the hash. Thanks, sherbang.


Add/Change #6099 (Use hashes for the filenames in "filebased" backend of django.core.cache)




Change Properties
Action