Opened 8 years ago
Last modified 8 years ago
#28055 closed Cleanup/optimization
Staticfiles HashedFilesMixin postprocess optimization — at Version 1
Reported by: | Konrad Lisiczyński | Owned by: | nobody |
---|---|---|---|
Component: | contrib.staticfiles | Version: | 1.11 |
Severity: | Normal | Keywords: | staticfiles HashedFilesMixin post_process |
Cc: | David Sanders | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Django 1.11 refactored the way how collectstatic postprocessing is handled to calculate content based hashes and saving those files.
Good point of this refactoring is, than now handling of nested css is done correctly, for example here, in style.css:
@import 'substyles.css'; body { background-color: green; }
when content of substyles.css in changed, the hash of substyles.css will also change, which at the same time will change the content of style.css (updated for example to @import 'substyles.some-hash.css';) and now Django will take into consideration by upgrading hash of style.css as well. This is good, and in previous versions of Django hash of style.css wouldnt be updated.
However, the problem is, that HashedFilesMixin.postprocess is written in such a way, that adjustable_paths, so with the default implementation - css files - will be always saved 4 times! Below snippet taken from django source code explains this:
## Snippet from HashedFilesMixin.post_process # Do a single pass first. Post-process all files once, then repeat for # adjustable files. for name, hashed_name, processed, _ in self._post_process(paths, adjustable_paths, hashed_files): yield name, hashed_name, processed paths = {path: paths[path] for path in adjustable_paths} print('adjusted paths are', paths) for i in range(self.max_post_process_passes): substitutions = False for name, hashed_name, processed, subst in self._post_process(paths, adjustable_paths, hashed_files): ## Snippet from HashedFilesMixin._post_process if name in adjustable_paths: old_hashed_name = hashed_name content = original_file.read().decode(settings.FILE_CHARSET) for extension, patterns in iteritems(self._patterns): if matches_patterns(path, (extension,)): for pattern, template in patterns: converter = self.url_converter(name, hashed_files, template) try: content = pattern.sub(converter, content) except ValueError as exc: yield name, None, exc, False if hashed_file_exists: self.delete(hashed_name) # then save the processed result content_file = ContentFile(force_bytes(content)) # Save intermediate file for reference saved_name = self._save(hashed_name, content_file) hashed_name = self.hashed_name(name, content_file) if self.exists(hashed_name): self.delete(hashed_name) saved_name = self._save(hashed_name, content_file) hashed_name = force_text(self.clean_name(saved_name)) # If the file hash stayed the same, this file didn't change if old_hashed_name == hashed_name: substitutions = False processed = True
After looking at the above code carefully, you will see that all css will be saved 4 times, even when there is no point doing so in the first place, like for example when there is no @import statement in your css. I am also not sure why _save method is called twice in _postprocess.
Generally it would't be maybe extremely big problem for staticstorage which extends local file storage, but for remote storages like AWS this is unacceptable in my opinion. It would be great to change postprocessing algorythm to reduce number of save calls to minimum.