Opened 7 years ago

Last modified 7 years ago

#28055 closed Cleanup/optimization

Staticfiles HashedFilesMixin postprocess optimization — at Version 1

Reported by: Konrad Lisiczyński Owned by: nobody
Component: contrib.staticfiles Version: 1.11
Severity: Normal Keywords: staticfiles HashedFilesMixin post_process
Cc: David Sanders Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Konrad Lisiczyński)

Django 1.11 refactored the way collectstatic post-processing calculates content-based hashes and saves the hashed files.
A good outcome of this refactoring is that nested CSS is now handled correctly, for example here, in style.css:

@import 'substyles.css';

body {
  background-color: green;
}

when the content of substyles.css changes, the hash of substyles.css changes as well, which in turn changes the content of style.css (the import is rewritten, for example to @import 'substyles.some-hash.css';), and Django now takes this into account by updating the hash of style.css too. This is good; in previous versions of Django the hash of style.css would not have been updated.
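A rough sketch of why the hash of style.css depends on substyles.css (the file contents, variable names and resulting digests below are made up for illustration; HashedFilesMixin derives the hash from roughly the first 12 hex characters of the MD5 of the file content):

import hashlib


def content_hash(content):
    # Same idea as HashedFilesMixin.file_hash: first 12 hex chars of the MD5.
    return hashlib.md5(content).hexdigest()[:12]


substyles = b"body { color: red; }"
substyles_hashed = "substyles.%s.css" % content_hash(substyles)

# style.css is rewritten to reference substyles by its hashed name, so its
# own content -- and therefore its own hash -- changes whenever substyles does.
style = ("@import '%s';\n\nbody {\n  background-color: green;\n}\n"
         % substyles_hashed).encode("utf-8")
style_hashed = "style.%s.css" % content_hash(style)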

However, the problem is that HashedFilesMixin.post_process is written in such a way that adjustable_paths (with the default implementation, CSS files) will always be saved 4 times! The snippets below, taken from the Django source code, show why:

## Snippet from HashedFilesMixin.post_process

# Do a single pass first. Post-process all files once, then repeat for
# adjustable files.
for name, hashed_name, processed, _ in self._post_process(paths, adjustable_paths, hashed_files):
    yield name, hashed_name, processed

paths = {path: paths[path] for path in adjustable_paths}

for i in range(self.max_post_process_passes):
    substitutions = False
    for name, hashed_name, processed, subst in self._post_process(paths, adjustable_paths, hashed_files):
        yield name, hashed_name, processed
        substitutions = substitutions or subst

    if not substitutions:
        break
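For reference, adjustable_paths is built just above the quoted lines (also in the Django 1.11 source); the default self._patterns only matches *.css, which is why CSS files are the ones re-processed on every pass:

adjustable_paths = [
    path for path in paths
    if matches_patterns(path, self._patterns.keys())
]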


## Snippet from HashedFilesMixin._post_process

if name in adjustable_paths:
    old_hashed_name = hashed_name
    content = original_file.read().decode(settings.FILE_CHARSET)
    for extension, patterns in iteritems(self._patterns):
        if matches_patterns(path, (extension,)):
            for pattern, template in patterns:
                converter = self.url_converter(name, hashed_files, template)
                try:
                    content = pattern.sub(converter, content)
                except ValueError as exc:
                    yield name, None, exc, False
    if hashed_file_exists:
        self.delete(hashed_name)
    # then save the processed result
    content_file = ContentFile(force_bytes(content))
    # Save intermediate file for reference
    saved_name = self._save(hashed_name, content_file)
    hashed_name = self.hashed_name(name, content_file)

    if self.exists(hashed_name):
        self.delete(hashed_name)

    saved_name = self._save(hashed_name, content_file)
    hashed_name = force_text(self.clean_name(saved_name))
    # If the file hash stayed the same, this file didn't change
    if old_hashed_name == hashed_name:
        substitutions = False
    processed = True

After looking at the above code carefully, you will see that every CSS file will be saved 4 times, even when there is no point in doing so, for example when there is no @import statement in the CSS at all: each call to _post_process saves every adjustable file twice, and _post_process is invoked at least twice (the initial single pass plus at least one pass of the loop), so even an unchanged file is written 4 times. I am also not sure why the _save method is called twice within _post_process in the first place.
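A minimal way to observe this (a hypothetical helper, not part of Django): subclass ManifestStaticFilesStorage and count the calls to _save. Pointing STATICFILES_STORAGE at the dotted path of such a class and running collectstatic shows 4 writes per CSS file, even for files without any @import.

from collections import Counter

from django.contrib.staticfiles.storage import ManifestStaticFilesStorage


class CountingManifestStaticFilesStorage(ManifestStaticFilesStorage):
    """ManifestStaticFilesStorage that records how often each file is written."""

    def __init__(self, *args, **kwargs):
        super(CountingManifestStaticFilesStorage, self).__init__(*args, **kwargs)
        self.save_counts = Counter()

    def _save(self, name, content):
        # Count every backend write triggered by post-processing.
        self.save_counts[name] += 1
        return super(CountingManifestStaticFilesStorage, self)._save(name, content)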

Generally this might not be a very big problem for a static files storage that extends the local file system storage, but for remote storages such as Amazon S3 it is unacceptable in my opinion, since every redundant save is a network round trip. It would be great to change the post-processing algorithm to reduce the number of save calls to a minimum.

Change History (1)

comment:1 by Konrad Lisiczyński, 7 years ago

Description: modified (diff)