Code

Opened 7 years ago

Closed 7 years ago

#6099 closed (fixed)

Use hashes for the filenames in "filebased" backend of django.core.cache

Reported by: sherbang Owned by: nobody
Component: Core (Cache system) Version: master
Severity: Keywords: sprintdec01
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

This refactor of the filebased cache creates the cache files as hashes broken into subdirectories like "6f/1e/d002ab5595859014ebf0951522d9" as suggested in this django-dev thread.

This also fixes a problem with add() and has_key() with expired cache items. I've included regression tests for both of those problems.

Attachments (2)

filebased_md5.patch (7.1 KB) - added by sherbang 7 years ago.
md5_no_refcount.patch (6.3 KB) - added by sherbang 7 years ago.
New patch - removes refcounting stubs

Download all attachments as: .zip

Change History (7)

Changed 7 years ago by sherbang

comment:1 Changed 7 years ago by Simon G <dev@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Ready for checkin

comment:2 Changed 7 years ago by jacob

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

I like what you've done here with the caching of the entry counts, but it's a bit over-engineered -- why not just have self.entry_count be an integer and update it on insert/delete?

And actually, now that I think about it, that's sort of a second issue. So could you redo this patch to just have the hashing (and not the entry count caching) and open a seperate ticket for that?

[Also, Simon, as a point of order: if a patch contains the word "TODO", it can't really be ready for checkin :)]

comment:3 Changed 7 years ago by sherbang

I like what you've done here with the caching of the entry counts, but it's a bit over-engineered -- why not just have self.entry_count be an integer and update it on insert/delete?

The cache is persistent (since it's saved in files), so you need to persist the entry count as-well. Then there's also the issue of multiple processes writing to the cache (changing the count).

On the other hand the count is only used to match to max_entries, so it needs to be close but probably not super precise. Perhaps there's a way to estimate the number of entries without keeping a running count and without going through the expense of walking the whole cache. Then again, maybe I'm just over-estimating the expense of walking the tree.

And actually, now that I think about it, that's sort of a second issue. So could you redo this patch to just have the hashing (and not the entry count caching) and open a seperate ticket for that?

OK

Changed 7 years ago by sherbang

New patch - removes refcounting stubs

comment:4 Changed 7 years ago by jacob

  • Triage Stage changed from Accepted to Ready for checkin

comment:5 Changed 7 years ago by jacob

  • Resolution set to fixed
  • Status changed from new to closed

(In [6887]) Fixed #6099: the filebased cache backend now uses md5 hashes of keys instead of sanitized filenames. For good measure, keys are partitioned into subdirectories using the first few bits of the hash. Thanks, sherbang.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.