Opened 11 months ago

Closed 3 months ago

#35846 closed Cleanup/optimization (fixed)

Reproducibility of staticfiles manifests

Reported by: Linus Heckemann Owned by: Matthew Stell
Component: contrib.staticfiles Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Paths in staticfiles manifests appear in a nondeterministic order in the resulting JSON file. I assume this would often reflect the order in which files are listed by the operating system, given dict's insertion order preservation, but there are probably many more factors affecting this.

This can sometimes (but may not always -- this depends heavily on filesystem behaviour) be reproduced by running collectstatic in projects using ManifestStaticFilesStorage across different copies of the project source.

Sorting them would result in more comparable results, smaller diffs and (depending on the environment) more efficient deployments.

Change History (12)

comment:1 by Sarah Boyce, 11 months ago

Resolution: needsinfo
Status: newclosed

Hi Linus, would you be able to create a test project with some instructions as to how to reproduce this behavior?
If this is an issue we will need to be able to confirm it's fixed

comment:2 by Andreas Pelme, 11 months ago

It may be hard to show this exact problem since it depends on the OS/file systems used. The problem is the list of paths in the staticfiles manifest.

Somewhat related, see https://github.com/django/django/pull/16411/files#diff-f5c7100e3528e9f6edb98cd5a3d33133bdde6286a89fa472f89980fb07364a8eR486. It sorts the paths before hashing them. If it would not sort the keys, the hash could change. I tried changing sorted() to list() on that line and ran the test suite and there were no test failures. So testing this is a bit tricky to test properly!

comment:3 by Matthias Kestenholz, 8 months ago

Needs tests: set

comment:4 by Matthew Stell, 3 months ago

The root cause of this issue is the ordering of the files returned from the Finders.

For example, the FileSystemSotorageFinder uses os.scandir() to "find" the files.
https://github.com/django/django/blob/efb7f9ced2dcf71294353596a265e3fd67faffeb/django/core/files/storage/filesystem.py#L188

    def listdir(self, path):
        path = self.path(path)
        directories, files = [], []
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir():
                    directories.append(entry.name)
                else:
                    files.append(entry.name)
        return directories, files

The Python docs state that The entries are yielded in arbitrary order https://docs.python.org/3/library/os.html#os.scandir.

The order in which they are yielded is the order in which they are added to the found_files dictionary https://github.com/django/django/blob/cf9da6fadd44cc7654681026d202387022b30d8d/django/contrib/staticfiles/management/commands/collectstatic.py#L124 which is passed through to the save_manifest function via the post_process method.

https://github.com/django/django/blob/1ba5fe19ca221663e6a1e9391dbe726bb2baaf8a/django/contrib/staticfiles/storage.py#L498

    def save_manifest(self):
        self.manifest_hash = self.file_hash(
            None, ContentFile(json.dumps(sorted(self.hashed_files.items())).encode())
        )
        payload = {
            "paths": self.hashed_files,
            "version": self.manifest_version,
            "hash": self.manifest_hash,
        }
        if self.manifest_storage.exists(self.manifest_name):
            self.manifest_storage.delete(self.manifest_name)
        contents = json.dumps(payload).encode()
        self.manifest_storage._save(self.manifest_name, ContentFile(contents))

Therefore the ordering of the items in staticfiles.json is by definition "arbitrary" and therefore cannot be guaranteed to be consistent.

The most thorough test I can think of is to mock os.scandir so that we can force the files to be returned in a different order. However I think this is overkill and instead reordering the storage.staticfiles_storage.hashed_files dictionary might be a more appropriate test?

For example:

    def test_staticfile_content_consistency(self):
        manifest_file_content_orig = storage.staticfiles_storage.read_manifest()
        hashed_files = storage.staticfiles_storage.hashed_files
        # Change the order of the hashed files
        storage.staticfiles_storage.hashed_files = dict(reversed(hashed_files.items()))
        # The manifest file content should not change.
        storage.staticfiles_storage.save_manifest()
        manifest_file_content = storage.staticfiles_storage.read_manifest()
        self.assertEqual(manifest_file_content_orig, manifest_file_content)

What do you think?

Many thanks,
Matthew

Version 0, edited 3 months ago by Matthew Stell (next)

comment:5 by Linus Heckemann, 3 months ago

Yes, that's the analysis we got to as well and how we ended up with https://github.com/django/django/pull/18676 . Your test approach sounds reasonable, though we're now working around it by subclassing ManifestStaticFilesStorage to sort the keys ourselves and I'm not particularly motivated to continue on this myself, so feel free to pick it up.

comment:6 by Matthew Stell, 3 months ago

Resolution: needsinfo
Status: closednew

comment:7 by Matthew Stell, 3 months ago

Owner: set to Matthew Stell
Status: newassigned

comment:8 by Matthew Stell, 3 months ago

Needs tests: unset

comment:9 by Natalia Bidart, 3 months ago

Triage Stage: UnreviewedAccepted
Version: 5.0dev

Thank you everyone for the detailed analysis and the initial PR. I think the issue and fix makes sense, accepting the ticket.

comment:10 by Natalia Bidart, 3 months ago

Patch needs improvement: set

comment:11 by Natalia Bidart, 3 months ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:12 by nessita <124304+nessita@…>, 3 months ago

Resolution: fixed
Status: assignedclosed

In 7feafd79:

Fixed #35846 -- Ensured consistent path ordering in ManifestStaticFilesStorage manifest files.

This change reuses the existing sorting of hashed_files in
ManifestStaticFilesStorage.save_manifest to also store a sorted
paths mapping in the manifest file. This ensures stable manifest
output that does not change unnecessarily.

Note: See TracTickets for help on using tickets.
Back to Top