Opened 11 months ago
Closed 3 months ago
#35846 closed Cleanup/optimization (fixed)
Reproducibility of staticfiles manifests
Reported by: | Linus Heckemann | Owned by: | Matthew Stell |
---|---|---|---|
Component: | contrib.staticfiles | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Paths in staticfiles manifests appear in a nondeterministic order in the resulting JSON file. I assume this would often reflect the order in which files are listed by the operating system, given dict's insertion order preservation, but there are probably many more factors affecting this.
This can sometimes (but may not always -- this depends heavily on filesystem behaviour) be reproduced by running collectstatic in projects using ManifestStaticFilesStorage across different copies of the project source.
Sorting them would result in more comparable results, smaller diffs and (depending on the environment) more efficient deployments.
Change History (12)
comment:1 by , 11 months ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:2 by , 11 months ago
It may be hard to show this exact problem since it depends on the OS/file systems used. The problem is the list of paths in the staticfiles manifest.
Somewhat related, see https://github.com/django/django/pull/16411/files#diff-f5c7100e3528e9f6edb98cd5a3d33133bdde6286a89fa472f89980fb07364a8eR486. It sorts the paths before hashing them. If it would not sort the keys, the hash could change. I tried changing sorted() to list() on that line and ran the test suite and there were no test failures. So testing this is a bit tricky to test properly!
comment:4 by , 3 months ago
The root cause of this issue is the ordering of the files returned from the Finders.
For example, the FileSystemSotorageFinder uses os.scandir() to "find" the files.
https://github.com/django/django/blob/efb7f9ced2dcf71294353596a265e3fd67faffeb/django/core/files/storage/filesystem.py#L188
def listdir(self, path): path = self.path(path) directories, files = [], [] with os.scandir(path) as entries: for entry in entries: if entry.is_dir(): directories.append(entry.name) else: files.append(entry.name) return directories, files
The Python docs state that The entries are yielded in arbitrary order https://docs.python.org/3/library/os.html#os.scandir.
The order in which they are yielded is the order in which they are added to the found_files dictionary https://github.com/django/django/blob/cf9da6fadd44cc7654681026d202387022b30d8d/django/contrib/staticfiles/management/commands/collectstatic.py#L124 which is passed through to the save_manifest function via the post_process method.
def save_manifest(self): self.manifest_hash = self.file_hash( None, ContentFile(json.dumps(sorted(self.hashed_files.items())).encode()) ) payload = { "paths": self.hashed_files, "version": self.manifest_version, "hash": self.manifest_hash, } if self.manifest_storage.exists(self.manifest_name): self.manifest_storage.delete(self.manifest_name) contents = json.dumps(payload).encode() self.manifest_storage._save(self.manifest_name, ContentFile(contents))
Therefore the ordering of the items in staticfiles.json is by definition "arbitrary" and therefore cannot be guaranteed to be consistent.
The most thorough test I can think of is to mock os.scandir so that we can force the files to be returned in a different order. However I think this is overkill and instead reordering the storage.staticfiles_storage.hashed_files dictionary might be a more appropriate test?
For example:
def test_staticfile_content_consistency(self): manifest_file_content_orig = storage.staticfiles_storage.read_manifest() hashed_files = storage.staticfiles_storage.hashed_files # Change the order of the hashed files storage.staticfiles_storage.hashed_files = dict(reversed(hashed_files.items())) # The manifest file content should not change. storage.staticfiles_storage.save_manifest() manifest_file_content = storage.staticfiles_storage.read_manifest() self.assertEqual(manifest_file_content_orig, manifest_file_content)
What do you think?
Many thanks,
Matthew
comment:5 by , 3 months ago
Yes, that's the analysis we got to as well and how we ended up with https://github.com/django/django/pull/18676 . Your test approach sounds reasonable, though we're now working around it by subclassing ManifestStaticFilesStorage to sort the keys ourselves and I'm not particularly motivated to continue on this myself, so feel free to pick it up.
comment:6 by , 3 months ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
comment:7 by , 3 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:8 by , 3 months ago
Needs tests: | unset |
---|
comment:9 by , 3 months ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Version: | 5.0 → dev |
Thank you everyone for the detailed analysis and the initial PR. I think the issue and fix makes sense, accepting the ticket.
comment:10 by , 3 months ago
Patch needs improvement: | set |
---|
comment:11 by , 3 months ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Hi Linus, would you be able to create a test project with some instructions as to how to reproduce this behavior?
If this is an issue we will need to be able to confirm it's fixed