Opened 16 years ago

Closed 16 years ago

#6099 closed (fixed)

Use hashes for the filenames in "filebased" backend of django.core.cache

Reported by: Brian Johnson Owned by: nobody
Component: Core (Cache system) Version: dev
Severity: Keywords: sprintdec01
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

This refactor of the filebased cache creates the cache files as hashes broken into subdirectories like "6f/1e/d002ab5595859014ebf0951522d9" as suggested in this django-dev thread.

This also fixes a problem with add() and has_key() with expired cache items. I've included regression tests for both of those problems.

Attachments (2)

filebased_md5.patch (7.1 KB ) - added by Brian Johnson 16 years ago.
md5_no_refcount.patch (6.3 KB ) - added by Brian Johnson 16 years ago.
New patch - removes refcounting stubs

Download all attachments as: .zip

Change History (7)

by Brian Johnson, 16 years ago

Attachment: filebased_md5.patch added

comment:1 by Simon G <dev@…>, 16 years ago

Triage Stage: UnreviewedReady for checkin

comment:2 by Jacob, 16 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

I like what you've done here with the caching of the entry counts, but it's a bit over-engineered -- why not just have self.entry_count be an integer and update it on insert/delete?

And actually, now that I think about it, that's sort of a second issue. So could you redo this patch to just have the hashing (and not the entry count caching) and open a seperate ticket for that?

[Also, Simon, as a point of order: if a patch contains the word "TODO", it can't really be ready for checkin :)]

comment:3 by Brian Johnson, 16 years ago

I like what you've done here with the caching of the entry counts, but it's a bit over-engineered -- why not just have self.entry_count be an integer and update it on insert/delete?

The cache is persistent (since it's saved in files), so you need to persist the entry count as-well. Then there's also the issue of multiple processes writing to the cache (changing the count).

On the other hand the count is only used to match to max_entries, so it needs to be close but probably not super precise. Perhaps there's a way to estimate the number of entries without keeping a running count and without going through the expense of walking the whole cache. Then again, maybe I'm just over-estimating the expense of walking the tree.

And actually, now that I think about it, that's sort of a second issue. So could you redo this patch to just have the hashing (and not the entry count caching) and open a seperate ticket for that?

OK

by Brian Johnson, 16 years ago

Attachment: md5_no_refcount.patch added

New patch - removes refcounting stubs

comment:4 by Jacob, 16 years ago

Triage Stage: AcceptedReady for checkin

comment:5 by Jacob, 16 years ago

Resolution: fixed
Status: newclosed

(In [6887]) Fixed #6099: the filebased cache backend now uses md5 hashes of keys instead of sanitized filenames. For good measure, keys are partitioned into subdirectories using the first few bits of the hash. Thanks, sherbang.

Note: See TracTickets for help on using tickets.
Back to Top