Opened 8 years ago

Closed 6 years ago

Last modified 5 years ago

#24452 closed Bug (fixed)

Staticfiles backends using HashedFilesMixin don't update CSS files' hash when referenced media changes

Reported by: Paul McLanahan Owned by: Paul McLanahan
Component: contrib.staticfiles Version: 1.7
Severity: Normal Keywords:
Cc: David Sanders, Ed Morley Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Paul McLanahan)

Steps to reproduce when using the Cached or Manifest staticfiles storages:

  1. Have an image (a.jpg) and a CSS file that references this image in a url() call.
  2. Run collectstatic and observe that copies of said files now have hashed names.
  3. Change the content of a.jpg (the color was clearly wrong).
  4. Run collectstatic again
  5. Observe that a.jpg now has a new hashed name version
  6. Obesrve that the CSS file refers to this new version, but the hashed name of the CSS file has not changed.

Because the CSS file's hash name was determined before processing the file's contents, changes in the referenced assets (images, fonts, etc.) will not result in a new hashed CSS file name. Thus when using a CDN and very long cache expire headers the CSS will never update, and the new version of the image will never be seen on the site.

We're currently working around this by making insignificant changes to the CSS whenever we need to make changes to only the content of referenced media, but this is error prone and clearly suboptimal.

This was observed in 1.6 using the CachedStaticFilesStorage, but I testing in 1.7 as well. A fix could be backported to the 1.6 line should it come before 1.6 is unsupported and a patch release is desired.

Note: This was discovered in mozilla/bedrock (www.mozilla.org). More information may be added to the Mozilla bug about this as well.

Attachments (1)

staticfiles-adjustable-paths-tests.patch (12.3 KB) - added by David Sanders 6 years ago.
Tests for accurate hash filenames

Download all attachments as: .zip

Change History (28)

comment:1 Changed 8 years ago by Paul McLanahan

Description: modified (diff)

comment:2 Changed 8 years ago by Tim Graham

Resolution: duplicate
Status: newclosed

Duplicate of #22353 I think.

comment:3 Changed 8 years ago by Paul McLanahan

Hi timgram,

I did see that one, but it is dealing with a different issue. I've actually not seen that problem. For me the hashed image names are being correctly updated in the CSS. It's the name of the CSS file itself that needs to update in my case. What I'm suggesting is that a change in a media file contained in a CSS file should result in a difference in the hashed content value of the CSS file itself.

It's possible that the fix for #22353 could be included in the fix for this one or vice versa, but I do think that these problems are distinct.

comment:4 Changed 8 years ago by Collin Anderson

Resolution: duplicate
Status: closednew

Ahh, yes I think I see the distinction now if I'm reading the other ticket correctly. The other ticket says "abc.css still refers to the old hashed name of xyz.png".

In any case, they're pretty related. (I haven't confirmed either myself.)

comment:5 Changed 8 years ago by Tim Graham

Triage Stage: UnreviewedAccepted

comment:6 Changed 8 years ago by Paul McLanahan

Owner: changed from nobody to Paul McLanahan
Status: newassigned

I'm going to take this if there are no objections. I've got a patch that's working, but I need a good test. Will hopefully have a PR open soon.

comment:7 Changed 8 years ago by Paul McLanahan

It turns out I was a bit premature with my patch. I'm no longer convinced this can be reasonably fixed considering the current architecture. The problem is that the system needs to be able to calculate the hashed file name at times other than when collectstatic is running, and even during a run sometimes when the content of the modified file isn't available. For example: the CachedStaticFilesBackend will sometimes encounter a cache miss, so it must figure out what the hashed name should be. The only data it has for this is the content of the original file since it doesn't know how the file was modified and therefore can't calculate the name if it's based on the modified contents.

The other situation I found is when a CSS file uses @import to bring in another CSS file. Depending on the order in which the CSS files are processed it may not have the calculated name of the imported file in the cache yet, so it would need to calculate what the name will be, and it only has the original file contents with which to do that.

This is solvable, but I fear it may take more of an overhaul of the current architecture than is called for based on the severity of this bug. I'm going to keep working on it, but this may block fixing this one.

comment:8 Changed 8 years ago by Collin Anderson

Interesting. So it sounds like we need to be sure to calculate the hashes of the dependencies before (or also when) calculating the hash of the file with the reference. Hang in there :).

comment:9 Changed 8 years ago by Paul McLanahan

Yeah. The real issue is figuring out how to allow it to calculate the same hash for the modified contents of the file at app run time and not just at collectstatic time. I could do it if I was only dealing with the manifest storage since that should never need to recalculate the hash, just look it up. But the caching backend may get a miss and need to calculate it again from the content, and I think it'd be too slow to hash all the dependent files, replace the file names, and hash the CSS at runtime. I can't think of a persistent place to put this info.

How bad would it be to fix this only for ManifestStaticFilesStorage along with some documentation explaining the problem and why the manifest fixes it?

comment:10 Changed 7 years ago by Collin Anderson

James Aylett on stage at DUTH says that there are probably no hashers in the world that get this right. He says it's pretty rare to only change the logo (image) and not the css. He says we should document this as a limitation.

comment:11 Changed 7 years ago by Paul McLanahan

I'd agree that it's a hard problem, and likely one that few if any projects get right. I will however say that we run into this a good bit and can't imagine that we're alone. We've been solving it by including a special comment that our css minifier knows to leave in so that we can cache bust the CSS file when we update images w/o touching the CSS. But this is obviously error prone as it's manual and requires an exhaustive search for references to the modified files.

comment:12 Changed 7 years ago by Carl Meyer

Pretty sure I recall that django-compressor was/is able to get this right. Don't recall details, though, haven't used it in a while.

Changed 6 years ago by David Sanders

Tests for accurate hash filenames

comment:13 Changed 6 years ago by David Sanders

I've opened a WIP PR for this issue. The tests I've attached are from that PR, and with the exception of test_import_loop are general test coverage of the issue (fail on master).

comment:14 Changed 6 years ago by Tim Graham

Has patch: set

comment:15 Changed 6 years ago by Tim Graham

Triage Stage: AcceptedReady for checkin

comment:16 Changed 6 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In 53bffe8:

Fixed #24452 -- Fixed HashedFilesMixin correctness with nested paths.

comment:17 Changed 6 years ago by Tim Graham

Cc: David Sanders added

David, is it expected for staticfiles collection to leave behind the intermediate hashed files from each round? For example, using CachedStaticFilesStorage and collecting static files for the admin creates admin/css/base.5af66c1b1797.css and base.6b517d0d5813.css in STATIC_ROOT. I noticed this while investigating #27929.

comment:18 Changed 6 years ago by David Sanders

This is expected, yes. It's required to avoid having CachedStaticFilesStorage rehash the entirety of static files to ensure that the hash is correct, during a cache miss. Without the intermediate files it'd have to rehash everything and do multiple passes to ensure it is getting the correct file (or weirdness would occur), where as with the intermediate files it can simply walk the intermediates for that specific file which should only be a couple files. There's more info on the PR, but here's a snippet explaining it a bit more:

When a cache miss occurs, the original filename is hashed as it currently is done today. However, since that may not be the final correct hash, the file pointed to by the hashed filename is requested and hashed. If the hash matches the name, then the final hash has been reached and things continue. If the hash does not match, then the process is repeated, requesting the file with the latest hash.

So it's a required less than ideal situation due to CachedStaticFilesStorage. That storage should really go the way of the dodo due to it's inefficiencies compared to ManifestStaticFilesStorage which doesn't suffer from that problem since the manifest lists the final filename and any intermediates could be dropped.

comment:19 Changed 6 years ago by Tim Graham

I'm wondering if there should be some release notes and documentation about this considering that it sort of looks like a bug to the untrained eye. Should there really be multiple versions of some admin CSS files that differ only in their hash (base.css, fonts.css, widgets.css)? Only forms.css has different hashed files with difference contents. If a project is uploading static files to some remote storage, the increased disk usage and upload time due to the increased number of files could be a nuisance.

About "the manifest lists the final filename and any intermediates could be dropped." -- is a ticket needed for this enhancement?

comment:20 in reply to:  19 ; Changed 6 years ago by David Sanders

Replying to Tim Graham:

About "the manifest lists the final filename and any intermediates could be dropped." -- is a ticket needed for this enhancement?

I think any ticket should be drop CachedStaticFilesStorage entirely. Dropping intermediates for ManifestStaticFilesStorage only would likely cause more confusion as one would not be able to switch between the two storage types without side effects.

I'm wondering if there should be some release notes and documentation about this considering that it sort of looks like a bug to the untrained eye. Should there really be multiple versions of some admin CSS files that differ only in their hash (base.css, fonts.css, widgets.css)? Only forms.css has different hashed files with difference contents. If a project is uploading static files to some remote storage, the increased disk usage and upload time due to the increased number of files could be a nuisance.

Nothing wrong with more documentation, but if CachedStaticFilesStorage is dropped it wouldn't be necessary since the intermediates could be dropped (and originals too as an optional benefit).

comment:21 Changed 6 years ago by Tim Graham

The soonest we could remove CachedStaticFilesStorage with a normal deprecation is Django 3.0. Meanwhile, we have to live with all the extra static files? That doesn't seem ideal but I don't have a good sense on how much of an issue this could be in practice. Could you write to the DevelopersMailingList to propose the deprecation and explain the current situation?

comment:22 in reply to:  21 Changed 6 years ago by David Sanders

Replying to Tim Graham:

Meanwhile, we have to live with all the extra static files?

Living with broken static files that referred to incorrect files was a bit more of an inconvenience than some extra files lying around, but it managed to hang around for quite a while. There's already 2N static files since the original non-hashed file is always kept, with the extra intermediate files it's 2N + M and in most cases M is going to be a dozen or two compared to N being likely a couple hundred or more.

Could you write to the DevelopersMailingList to propose the deprecation and explain the current situation?

I really can't, I don't have the time to shepherd the issue. I can chime in from time to time though. I will add that it'd be nice to consider a manifest option where the manifest was inserted into the codebase like locale translations rather than relying on the manifest file being stored with the static files.

comment:23 in reply to:  21 Changed 5 years ago by Ed Morley

Replying to Tim Graham:

Could you write to the DevelopersMailingList to propose the deprecation and explain the current situation?

I've posted:
https://groups.google.com/forum/#!topic/django-developers/fmfQvuHBStk

comment:24 in reply to:  20 ; Changed 5 years ago by Ed Morley

Cc: Ed Morley added

So I've filed #28606 for making CachedStaticFilesStorage be deprecated in Django 2.1, which means it could then be removed in Django 3.0.

Replying to David Sanders:

Dropping intermediates for ManifestStaticFilesStorage only would likely cause more confusion as one would not be able to switch between the two storage types without side effects.

I'm not sure I quite understand what the side-effects of switching between the two in this case would be? Even if there are side effects, given CachedStaticFilesStorage is going to be removed (and already has issues), perhaps the side effects shouldn't be a blocking to fixing the intermediate file issue for ManifestStaticFilesStorage in the meantime?

If anyone has any insight about this, please add to the newly filed #28604 - many thanks!

comment:25 in reply to:  24 Changed 5 years ago by David Sanders

Replying to Ed Morley:

Replying to David Sanders:

Dropping intermediates for ManifestStaticFilesStorage only would likely cause more confusion as one would not be able to switch between the two storage types without side effects.

I'm not sure I quite understand what the side-effects of switching between the two in this case would be?

Switching from ManifestStaticFilesStorage to CachedStaticFilesStorage will not work correctly because of the lack of intermediate files. It would work for some files that don't have intermediates, and fail for others which do, if the cache got cleared. IIRC, running the server without having run collectstatic will cause the initial lookup for any static files to be a cache miss when using CachedStaticFilesStorage, and it will attempt to hash them and populate the cache at run time. With intermediate files missing this will only partially work, as mentioned previously.

So the side-effects include partial access to static files, and a classic case of "it works on my machine" if ManifestStaticFilesStorage was used in development or testing and production uses CachedStaticFilesStorage instead, unless collectstatic is explicitly run during a deployment.

comment:26 Changed 5 years ago by Ed Morley

Thank you for the clarification.

Just to check I follow - for the breakage to be seen all of the following has to be true:

  • Using CachedStaticFilesStorage in one environment and ManifestStaticFilesStorage in another
  • Even though a different storage backend is used for both, somehow believing that it's appropriate for the files to be shared between them (eg local files copied around)
  • Collectstatic not be run on the new environment before deployment, even though it uses different settings

Whilst I believe Django should definitely try and protect against suboptimal user choices, I'm not convinced this scenario is something that should be encouraged or supported (at least so long as it's noted in the documentation and release notes).

comment:27 Changed 5 years ago by Claude Paroz

Confirmed that changing storage without re-running collectstatic is not something we want to support.

Note: See TracTickets for help on using tickets.
Back to Top