Opened 8 years ago
Closed 8 years ago
#28055 closed Cleanup/optimization (wontfix)
Staticfiles HashedFilesMixin postprocess optimization
Reported by: | Konrad Lisiczyński | Owned by: | Konrad Lisiczyński |
---|---|---|---|
Component: | contrib.staticfiles | Version: | 1.11 |
Severity: | Normal | Keywords: | staticfiles HashedFilesMixin post_process |
Cc: | David Sanders | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Django 1.11 refactored the way how collectstatic postprocessing is handled to calculate content based hashes and saving those files.
Good point of this refactoring is, than now handling of nested css is done correctly, for example here, in style.css:
@import 'substyles.css'; body { background-color: green; }
when content of substyles.css in changed, the hash of substyles.css will also change, which at the same time will change the content of style.css (updated for example to @import 'substyles.some-hash.css';) and now Django will take into consideration by upgrading hash of style.css as well. This is good, and in previous versions of Django hash of style.css wouldnt be updated.
However, the problem is, that HashedFilesMixin.postprocess is written in such a way, that adjustable_paths, so with the default implementation - css files - will be always saved 4 times! Below snippet taken from django source code explains this:
## Snippet from HashedFilesMixin.post_process # Do a single pass first. Post-process all files once, then repeat for # adjustable files. for name, hashed_name, processed, _ in self._post_process(paths, adjustable_paths, hashed_files): yield name, hashed_name, processed paths = {path: paths[path] for path in adjustable_paths} print('adjusted paths are', paths) for i in range(self.max_post_process_passes): substitutions = False for name, hashed_name, processed, subst in self._post_process(paths, adjustable_paths, hashed_files): ## Snippet from HashedFilesMixin._post_process if name in adjustable_paths: old_hashed_name = hashed_name content = original_file.read().decode(settings.FILE_CHARSET) for extension, patterns in iteritems(self._patterns): if matches_patterns(path, (extension,)): for pattern, template in patterns: converter = self.url_converter(name, hashed_files, template) try: content = pattern.sub(converter, content) except ValueError as exc: yield name, None, exc, False if hashed_file_exists: self.delete(hashed_name) # then save the processed result content_file = ContentFile(force_bytes(content)) # Save intermediate file for reference saved_name = self._save(hashed_name, content_file) hashed_name = self.hashed_name(name, content_file) if self.exists(hashed_name): self.delete(hashed_name) saved_name = self._save(hashed_name, content_file) hashed_name = force_text(self.clean_name(saved_name)) # If the file hash stayed the same, this file didn't change if old_hashed_name == hashed_name: substitutions = False processed = True
After looking at the above code carefully, you will see that all css will be saved 4 times, even when there is no point doing so in the first place, like for example when there is no @import statement in your css. I am also not sure why _save method is called twice in _postprocess.
Generally it would't be maybe extremely big problem for staticstorage which extends local file storage, but for remote storages like AWS this is unacceptable in my opinion. It would be great to change postprocessing algorythm to reduce number of save calls to minimum.
Change History (10)
comment:1 by , 8 years ago
Description: | modified (diff) |
---|
comment:2 by , 8 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 8 years ago
Cc: | added |
---|
David, maybe you could provide some guidance or explanations here.
comment:4 by , 8 years ago
There definitely appears to be a duplicate delete and save around line 304, I'm not sure how that slipped through code review. Must have been introduced during some refactoring of the code.
It should be possible to clean up that section and trim the extra delete/save. Might be possible to also drop a save if the hash hasn't changed as well since the hash can be calculated from the ContentFile without saving it to disk. Since the tests now expect the correct hashes (and have a few more cases) as long as tests pass it should be fairly safe. I don't have the time to look at it deeply, however.
comment:5 by , 8 years ago
Triage Stage: | Accepted → Unreviewed |
---|
Thanks for your suggestions, David. I will try to provide a patch today.
comment:6 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:7 by , 8 years ago
I tried to comment line 304: saved_name = self._save(hashed_name, content_file).
Unfortunately it is not so easy, as several tests fail now:
ERROR: test_cache_invalidation (staticfiles_tests.test_storage.TestCollectionCachedStorage) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.4/unittest/case.py", line 58, in testPartExecutor yield File "/usr/lib/python3.4/unittest/case.py", line 577, in run testMethod() File "/home/konrad/PycharmProjects/django/tests/staticfiles_tests/test_storage.py", line 252, in test_cache_invalidation self.assertEqual(self.hashed_file_path(name), hashed_name) File "/home/konrad/PycharmProjects/django/tests/staticfiles_tests/test_storage.py", line 21, in hashed_file_path fullpath = test.render_template(test.static_template_snippet(path)) File "/home/konrad/PycharmProjects/django/tests/staticfiles_tests/cases.py", line 33, in render_template return template.render(Context(**kwargs)).strip() File "/home/konrad/PycharmProjects/django/django/template/base.py", line 176, in render return self._render(context) File "/home/konrad/PycharmProjects/django/django/test/utils.py", line 101, in instrumented_test_render return self.nodelist.render(context) File "/home/konrad/PycharmProjects/django/django/template/base.py", line 944, in render bit = node.render_annotated(context) File "/home/konrad/PycharmProjects/django/django/template/base.py", line 911, in render_annotated return self.render(context) File "/home/konrad/PycharmProjects/django/django/templatetags/static.py", line 106, in render url = self.url(context) File "/home/konrad/PycharmProjects/django/django/templatetags/static.py", line 103, in url return self.handle_simple(path) File "/home/konrad/PycharmProjects/django/django/templatetags/static.py", line 118, in handle_simple return staticfiles_storage.url(path) File "/home/konrad/PycharmProjects/django/django/contrib/staticfiles/storage.py", line 155, in url return self._url(self.stored_name, name, force) File "/home/konrad/PycharmProjects/django/django/contrib/staticfiles/storage.py", line 134, in _url hashed_name = hashed_name_func(*args) File "/home/konrad/PycharmProjects/django/django/contrib/staticfiles/storage.py", line 364, in stored_name self.hashed_name(name, content=None, filename=intermediate_name) File "/home/konrad/PycharmProjects/django/django/contrib/staticfiles/storage.py", line 94, in hashed_name raise ValueError("The file '%s' could not be found with %r." % (filename, self)) ValueError: The file 'cached/styles.bb84a0240107.css' could not be found with <django.contrib.staticfiles.storage.CachedStaticFilesStorage object at 0x7f6b90e00978>. ====================================================================== ERROR: test_path_with_querystring (staticfiles_tests.test_storage.TestCollectionCachedStorage) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.4/unittest/case.py", line 58, in testPartExecutor yield File "/usr/lib/python3.4/unittest/case.py", line 577, in run testMethod() File "/home/konrad/PycharmProjects/django/tests/staticfiles_tests/test_storage.py", line 79, in test_path_with_querystring relpath = self.hashed_file_path("cached/styles.css?spam=eggs") File "/home/konrad/PycharmProjects/django/tests/staticfiles_tests/test_storage.py", line 21, in hashed_file_path fullpath = test.render_template(test.static_template_snippet(path)) File "/home/konrad/PycharmProjects/django/tests/staticfiles_tests/cases.py", line 33, in render_template return template.render(Context(**kwargs)).strip() File "/home/konrad/PycharmProjects/django/django/template/base.py", line 176, in render return self._render(context) File "/home/konrad/PycharmProjects/django/django/test/utils.py", line 101, in instrumented_test_render return self.nodelist.render(context) File "/home/konrad/PycharmProjects/django/django/template/base.py", line 944, in render bit = node.render_annotated(context) File "/home/konrad/PycharmProjects/django/django/template/base.py", line 911, in render_annotated return self.render(context) File "/home/konrad/PycharmProjects/django/django/templatetags/static.py", line 106, in render url = self.url(context) File "/home/konrad/PycharmProjects/django/django/templatetags/static.py", line 103, in url return self.handle_simple(path) File "/home/konrad/PycharmProjects/django/django/templatetags/static.py", line 118, in handle_simple return staticfiles_storage.url(path) File "/home/konrad/PycharmProjects/django/django/contrib/staticfiles/storage.py", line 155, in url return self._url(self.stored_name, name, force) File "/home/konrad/PycharmProjects/django/django/contrib/staticfiles/storage.py", line 134, in _url hashed_name = hashed_name_func(*args) File "/home/konrad/PycharmProjects/django/django/contrib/staticfiles/storage.py", line 364, in stored_name self.hashed_name(name, content=None, filename=intermediate_name) File "/home/konrad/PycharmProjects/django/django/contrib/staticfiles/storage.py", line 94, in hashed_name raise ValueError("The file '%s' could not be found with %r." % (filename, self)) ValueError: The file 'cached/styles.bb84a0240107.css' could not be found with <django.contrib.staticfiles.storage.CachedStaticFilesStorage object at 0x7f6b910014e0>. ====================================================================== FAIL: test_corrupt_intermediate_files (staticfiles_tests.test_storage.TestCollectionCachedStorage) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.4/unittest/case.py", line 58, in testPartExecutor yield File "/usr/lib/python3.4/unittest/case.py", line 577, in run testMethod() File "/home/konrad/PycharmProjects/django/tests/staticfiles_tests/test_storage.py", line 298, in test_corrupt_intermediate_files self.hashed_file_path('cached/styles.css') File "/usr/lib/python3.4/contextlib.py", line 77, in __exit__ self.gen.throw(type, value, traceback) File "/home/konrad/PycharmProjects/django/django/test/testcases.py", line 591, in _assert_raises_message_cm self.assertIn(expected_message, str(cm.exception)) File "/usr/lib/python3.4/unittest/case.py", line 1056, in assertIn self.fail(self._formatMessage(msg, standardMsg)) File "/usr/lib/python3.4/unittest/case.py", line 642, in fail raise self.failureException(msg) AssertionError: "The name 'cached/styles.css' could not be hashed with <django.contrib.staticfiles.storage.CachedStaticFilesStorage object at 0x7f6b90a03fd0>." not found in "The file 'cached /styles.bb84a0240107.css' could not be found with <django.contrib.staticfiles.storage.CachedStaticFilesStorage object at 0x7f6b90a03fd0>."
This implies, that this line is needed and it is not some kind of duplicate after all. Maybe we could contact the author of this code so he or she could provide us some insight why it was done this way and is it possible to optimize it somehow.
comment:8 by , 8 years ago
Maybe we could contact the author of this code so he or she could provide us some insight why it was done this way and is it possible to optimize it somehow.
I am the original author of some of the code (the patch that refactored the code you mentioned), it's just been a while and some details have slipped, and I don't have time to get back into it deeply.
Looking at it a tiny bit more, though, I think line 304 is necessary, and to make this more clear it should be using old_hashed_name
for the _save
call instead of hashed_name
. It is saving the file content with the old hashed name before recalculating the new hash on the (potentially) changed content.
The test failures you saw when removing it are due to the fact that CachedStaticFilesStorage
relies on these intermediate files for proper behavior. The alternative would be for CachedStaticFilesStorage
to recalculate all hashes on a single cache miss, which is very not good. With the intermediate files it can re-calculate the hash for a single file with a few file accesses.
If the concern is using a storage backend like S3, maybe a final upload step could be refactored in? The collectstatic
process is never going to be very efficient if it's writing and reading directly to S3. The post-processing requires lots of file reads and writes, so remote storage isn't great for that.
comment:9 by , 8 years ago
Thank you for elaboration, David. I think using current collectstatic as prestep for saving static files to a remote storage is really good solution. So I guess we could close this ticket?
comment:10 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
If you can propose a patch, optimizations would certainly be welcome.