﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
17896	Implement file hash computation as a separate method of staticfiles' CachedFilesMixin	mkai	nobody	"I'm currently extending django-cumulus (Rackspace Cloud Files storage backend) and S3BotoStorage (Amazon S3 storage backend) to use hashed file names using the CachedFilesMixin of contrib.staticfiles. That came down to (as intended, I suppose!):

{{{
class MyCachedCloudFilesStorage(CachedFilesMixin, CloudFilesStorage):
    pass

}}}

Now I noticed that both Amazon and Rackspace include file hashes for each file in their API responses. If the response (e.g. a file list) is cached appropriately, the post-processing of the CachedFilesMixin can be sped up dramatically by getting the hash from the cached response instead of computing it from the file contents (which is expensive because the file has to be read over the network).

I propose the attached patch so that backend authors can implement their own method of getting the hash for a file:

{{{
diff --git a/django/contrib/staticfiles/storage.py b/django/contrib/staticfiles/storage.py
index c000f97..e59b987 100644
--- a/django/contrib/staticfiles/storage.py
+++ b/django/contrib/staticfiles/storage.py
@@ -65,6 +65,13 @@ class CachedFilesMixin(object):
                 compiled = re.compile(pattern)
                 self._patterns.setdefault(extension, []).append(compiled)
 
+    def get_file_hash(self, name, content=None):
+        """"""Gets the MD5 hash of the file.""""""
+        md5 = hashlib.md5()
+        for chunk in content.chunks():
+            md5.update(chunk)
+        return md5.hexdigest()[:12]
+
     def hashed_name(self, name, content=None):
         parsed_name = urlsplit(unquote(name))
         clean_name = parsed_name.path.strip()
@@ -79,13 +86,9 @@ class CachedFilesMixin(object):
                 return name
         path, filename = os.path.split(clean_name)
         root, ext = os.path.splitext(filename)
-        # Get the MD5 hash of the file
-        md5 = hashlib.md5()
-        for chunk in content.chunks():
-            md5.update(chunk)
-        md5sum = md5.hexdigest()[:12]
+        file_hash = self.get_file_hash(clean_name, content)
         hashed_name = os.path.join(path, u""%s.%s%s"" %
-                                   (root, md5sum, ext))
+                                   (root, file_hash, ext))
}}}
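
With that hook in place, a backend could look roughly like the following. This is only a minimal sketch with hypothetical names (RemoteHashMixin, remote_hashes) to illustrate the override point; a real backend would fill remote_hashes from the provider's cached API response (e.g. the ETag of each object):

{{{
import hashlib


class RemoteHashMixin(object):
    # Hypothetical cache mapping file name -> provider-reported MD5 hex
    # digest, populated from a cached API response (e.g. a file list).
    remote_hashes = {}

    def get_file_hash(self, name, content=None):
        remote_hash = self.remote_hashes.get(name)
        if remote_hash is not None:
            # Use the hash the provider already computed; no network read.
            return remote_hash[:12]
        # Fall back to hashing the file contents, as CachedFilesMixin does.
        md5 = hashlib.md5()
        for chunk in content.chunks():
            md5.update(chunk)
        return md5.hexdigest()[:12]
}}}

The mixin would then be combined with the storage backend the same way as CachedFilesMixin above, with the remote-hash lookup saving one full file read per post-processed file.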

"	Cleanup/optimization	closed	contrib.staticfiles	dev	Normal	fixed	CachedStaticFilesStorage, hash, md5	Jannis Leidel	Accepted	1	0	0	1	0	0
