#24452 closed Bug (fixed)
Staticfiles backends using HashedFilesMixin don't update CSS files' hash when referenced media changes
Reported by: | Paul McLanahan | Owned by: | Paul McLanahan |
---|---|---|---|
Component: | contrib.staticfiles | Version: | 1.7 |
Severity: | Normal | Keywords: | |
Cc: | David Sanders, Ed Morley | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Steps to reproduce when using the Cached or Manifest staticfiles storages:
- Have an image (
a.jpg
) and a CSS file that references this image in aurl()
call. - Run
collectstatic
and observe that copies of said files now have hashed names. - Change the content of
a.jpg
(the color was clearly wrong). - Run
collectstatic
again - Observe that
a.jpg
now has a new hashed name version - Obesrve that the CSS file refers to this new version, but the hashed name of the CSS file has not changed.
Because the CSS file's hash name was determined before processing the file's contents, changes in the referenced assets (images, fonts, etc.) will not result in a new hashed CSS file name. Thus when using a CDN and very long cache expire headers the CSS will never update, and the new version of the image will never be seen on the site.
We're currently working around this by making insignificant changes to the CSS whenever we need to make changes to only the content of referenced media, but this is error prone and clearly suboptimal.
This was observed in 1.6 using the CachedStaticFilesStorage
, but I testing in 1.7 as well. A fix could be backported to the 1.6 line should it come before 1.6 is unsupported and a patch release is desired.
Note: This was discovered in mozilla/bedrock (www.mozilla.org). More information may be added to the Mozilla bug about this as well.
Attachments (1)
Change History (28)
comment:1 by , 10 years ago
Description: | modified (diff) |
---|
comment:2 by , 10 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
comment:3 by , 10 years ago
Hi timgram,
I did see that one, but it is dealing with a different issue. I've actually not seen that problem. For me the hashed image names are being correctly updated in the CSS. It's the name of the CSS file itself that needs to update in my case. What I'm suggesting is that a change in a media file contained in a CSS file should result in a difference in the hashed content value of the CSS file itself.
It's possible that the fix for #22353 could be included in the fix for this one or vice versa, but I do think that these problems are distinct.
comment:4 by , 10 years ago
Resolution: | duplicate |
---|---|
Status: | closed → new |
Ahh, yes I think I see the distinction now if I'm reading the other ticket correctly. The other ticket says "abc.css still refers to the old hashed name of xyz.png".
In any case, they're pretty related. (I haven't confirmed either myself.)
comment:5 by , 10 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:6 by , 10 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I'm going to take this if there are no objections. I've got a patch that's working, but I need a good test. Will hopefully have a PR open soon.
comment:7 by , 10 years ago
It turns out I was a bit premature with my patch. I'm no longer convinced this can be reasonably fixed considering the current architecture. The problem is that the system needs to be able to calculate the hashed file name at times other than when collectstatic
is running, and even during a run sometimes when the content of the modified file isn't available. For example: the CachedStaticFilesBackend
will sometimes encounter a cache miss, so it must figure out what the hashed name should be. The only data it has for this is the content of the original file since it doesn't know how the file was modified and therefore can't calculate the name if it's based on the modified contents.
The other situation I found is when a CSS file uses @import
to bring in another CSS file. Depending on the order in which the CSS files are processed it may not have the calculated name of the imported file in the cache yet, so it would need to calculate what the name will be, and it only has the original file contents with which to do that.
This is solvable, but I fear it may take more of an overhaul of the current architecture than is called for based on the severity of this bug. I'm going to keep working on it, but this may block fixing this one.
comment:8 by , 10 years ago
Interesting. So it sounds like we need to be sure to calculate the hashes of the dependencies before (or also when) calculating the hash of the file with the reference. Hang in there :).
comment:9 by , 10 years ago
Yeah. The real issue is figuring out how to allow it to calculate the same hash for the modified contents of the file at app run time and not just at collectstatic time. I could do it if I was only dealing with the manifest storage since that should never need to recalculate the hash, just look it up. But the caching backend may get a miss and need to calculate it again from the content, and I think it'd be too slow to hash all the dependent files, replace the file names, and hash the CSS at runtime. I can't think of a persistent place to put this info.
How bad would it be to fix this only for ManifestStaticFilesStorage
along with some documentation explaining the problem and why the manifest fixes it?
comment:10 by , 9 years ago
James Aylett on stage at DUTH says that there are probably no hashers in the world that get this right. He says it's pretty rare to only change the logo (image) and not the css. He says we should document this as a limitation.
comment:11 by , 9 years ago
I'd agree that it's a hard problem, and likely one that few if any projects get right. I will however say that we run into this a good bit and can't imagine that we're alone. We've been solving it by including a special comment that our css minifier knows to leave in so that we can cache bust the CSS file when we update images w/o touching the CSS. But this is obviously error prone as it's manual and requires an exhaustive search for references to the modified files.
comment:12 by , 9 years ago
Pretty sure I recall that django-compressor was/is able to get this right. Don't recall details, though, haven't used it in a while.
by , 9 years ago
Attachment: | staticfiles-adjustable-paths-tests.patch added |
---|
Tests for accurate hash filenames
comment:13 by , 9 years ago
I've opened a WIP PR for this issue. The tests I've attached are from that PR, and with the exception of test_import_loop
are general test coverage of the issue (fail on master).
comment:14 by , 9 years ago
Has patch: | set |
---|
comment:15 by , 8 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:17 by , 8 years ago
Cc: | added |
---|
David, is it expected for staticfiles collection to leave behind the intermediate hashed files from each round? For example, using CachedStaticFilesStorage
and collecting static files for the admin creates admin/css/base.5af66c1b1797.css
and base.6b517d0d5813.css
in STATIC_ROOT
. I noticed this while investigating #27929.
comment:18 by , 8 years ago
This is expected, yes. It's required to avoid having CachedStaticFilesStorage
rehash the entirety of static files to ensure that the hash is correct, during a cache miss. Without the intermediate files it'd have to rehash everything and do multiple passes to ensure it is getting the correct file (or weirdness would occur), where as with the intermediate files it can simply walk the intermediates for that specific file which should only be a couple files. There's more info on the PR, but here's a snippet explaining it a bit more:
When a cache miss occurs, the original filename is hashed as it currently is done today. However, since that may not be the final correct hash, the file pointed to by the hashed filename is requested and hashed. If the hash matches the name, then the final hash has been reached and things continue. If the hash does not match, then the process is repeated, requesting the file with the latest hash.
So it's a required less than ideal situation due to CachedStaticFilesStorage
. That storage should really go the way of the dodo due to it's inefficiencies compared to ManifestStaticFilesStorage
which doesn't suffer from that problem since the manifest lists the final filename and any intermediates could be dropped.
follow-up: 20 comment:19 by , 8 years ago
I'm wondering if there should be some release notes and documentation about this considering that it sort of looks like a bug to the untrained eye. Should there really be multiple versions of some admin CSS files that differ only in their hash (base.css, fonts.css, widgets.css)? Only forms.css has different hashed files with difference contents. If a project is uploading static files to some remote storage, the increased disk usage and upload time due to the increased number of files could be a nuisance.
About "the manifest lists the final filename and any intermediates could be dropped." -- is a ticket needed for this enhancement?
follow-up: 24 comment:20 by , 8 years ago
Replying to Tim Graham:
About "the manifest lists the final filename and any intermediates could be dropped." -- is a ticket needed for this enhancement?
I think any ticket should be drop CachedStaticFilesStorage
entirely. Dropping intermediates for ManifestStaticFilesStorage
only would likely cause more confusion as one would not be able to switch between the two storage types without side effects.
I'm wondering if there should be some release notes and documentation about this considering that it sort of looks like a bug to the untrained eye. Should there really be multiple versions of some admin CSS files that differ only in their hash (base.css, fonts.css, widgets.css)? Only forms.css has different hashed files with difference contents. If a project is uploading static files to some remote storage, the increased disk usage and upload time due to the increased number of files could be a nuisance.
Nothing wrong with more documentation, but if CachedStaticFilesStorage
is dropped it wouldn't be necessary since the intermediates could be dropped (and originals too as an optional benefit).
follow-ups: 22 23 comment:21 by , 8 years ago
The soonest we could remove CachedStaticFilesStorage
with a normal deprecation is Django 3.0. Meanwhile, we have to live with all the extra static files? That doesn't seem ideal but I don't have a good sense on how much of an issue this could be in practice. Could you write to the DevelopersMailingList to propose the deprecation and explain the current situation?
comment:22 by , 8 years ago
Replying to Tim Graham:
Meanwhile, we have to live with all the extra static files?
Living with broken static files that referred to incorrect files was a bit more of an inconvenience than some extra files lying around, but it managed to hang around for quite a while. There's already 2N static files since the original non-hashed file is always kept, with the extra intermediate files it's 2N + M and in most cases M is going to be a dozen or two compared to N being likely a couple hundred or more.
Could you write to the DevelopersMailingList to propose the deprecation and explain the current situation?
I really can't, I don't have the time to shepherd the issue. I can chime in from time to time though. I will add that it'd be nice to consider a manifest option where the manifest was inserted into the codebase like locale
translations rather than relying on the manifest file being stored with the static files.
comment:23 by , 7 years ago
Replying to Tim Graham:
Could you write to the DevelopersMailingList to propose the deprecation and explain the current situation?
I've posted:
https://groups.google.com/forum/#!topic/django-developers/fmfQvuHBStk
follow-up: 25 comment:24 by , 7 years ago
Cc: | added |
---|
So I've filed #28606 for making CachedStaticFilesStorage
be deprecated in Django 2.1, which means it could then be removed in Django 3.0.
Replying to David Sanders:
Dropping intermediates for
ManifestStaticFilesStorage
only would likely cause more confusion as one would not be able to switch between the two storage types without side effects.
I'm not sure I quite understand what the side-effects of switching between the two in this case would be? Even if there are side effects, given CachedStaticFilesStorage
is going to be removed (and already has issues), perhaps the side effects shouldn't be a blocking to fixing the intermediate file issue for ManifestStaticFilesStorage
in the meantime?
If anyone has any insight about this, please add to the newly filed #28604 - many thanks!
comment:25 by , 7 years ago
Replying to Ed Morley:
Replying to David Sanders:
Dropping intermediates for
ManifestStaticFilesStorage
only would likely cause more confusion as one would not be able to switch between the two storage types without side effects.
I'm not sure I quite understand what the side-effects of switching between the two in this case would be?
Switching from ManifestStaticFilesStorage
to CachedStaticFilesStorage
will not work correctly because of the lack of intermediate files. It would work for some files that don't have intermediates, and fail for others which do, if the cache got cleared. IIRC, running the server without having run collectstatic
will cause the initial lookup for any static files to be a cache miss when using CachedStaticFilesStorage
, and it will attempt to hash them and populate the cache at run time. With intermediate files missing this will only partially work, as mentioned previously.
So the side-effects include partial access to static files, and a classic case of "it works on my machine" if ManifestStaticFilesStorage
was used in development or testing and production uses CachedStaticFilesStorage
instead, unless collectstatic
is explicitly run during a deployment.
comment:26 by , 7 years ago
Thank you for the clarification.
Just to check I follow - for the breakage to be seen all of the following has to be true:
- Using
CachedStaticFilesStorage
in one environment andManifestStaticFilesStorage
in another - Even though a different storage backend is used for both, somehow believing that it's appropriate for the files to be shared between them (eg local files copied around)
- Collectstatic not be run on the new environment before deployment, even though it uses different settings
Whilst I believe Django should definitely try and protect against suboptimal user choices, I'm not convinced this scenario is something that should be encouraged or supported (at least so long as it's noted in the documentation and release notes).
comment:27 by , 7 years ago
Confirmed that changing storage without re-running collectstatic
is not something we want to support.
Duplicate of #22353 I think.