Code

Opened 3 years ago

Closed 3 years ago

#16161 closed Bug (fixed)

If extra static files are added into STATIC_ROOT, they are transformed to symlinks in the next run of ``collectstatic -l``

Reported by: gnotaras Owned by: aaugustin
Component: contrib.staticfiles Version: 1.3
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by aaugustin)

Here are the related settings:

MEDIA_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), 'media', '')).replace('\\','/')
MEDIA_URL = '/media/'
STATIC_ROOT = os.path.join(MEDIA_ROOT, 'site')
STATIC_URL = MEDIA_URL + 'site/'
ADMIN_MEDIA_PREFIX = STATIC_URL + 'admin/'
STATICFILES_DIRS = ()
STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
#    'django.contrib.staticfiles.finders.DefaultStorageFinder',
    'compressor.finders.CompressorFinder',
)

How to re-produce:

  1. Run : python manage.py collectstatic -l --noinput
  2. Put some extra static files in STATIC_ROOT
  3. Re-run python manage.py collectstatic -l --noinput
  4. The extra files have been replaced by symlinks pointing to themselves.

Attachments (0)

Change History (22)

comment:1 Changed 3 years ago by gnotaras

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Summary changed from If extra static files are added into STATIC_ROOT are transformed to symlinks in the next run of ``collectstatic -l`` to If extra static files are added into STATIC_ROOT, they are transformed to symlinks in the next run of ``collectstatic -l``

comment:2 follow-ups: Changed 3 years ago by aaugustin

  • Component changed from contrib.staticfiles to Documentation
  • Triage Stage changed from Unreviewed to Accepted
  • Type changed from Bug to Cleanup/optimization

You shouldn't put some extra static files in STATIC_ROOT.

I expected to find a big warning in the docs: "Don't put your static files in STATIC_ROOT", but I didn't. We should add it.

comment:3 Changed 3 years ago by aaugustin

  • Summary changed from If extra static files are added into STATIC_ROOT, they are transformed to symlinks in the next run of ``collectstatic -l`` to Warn that extra static files shouldn't be added directly into STATIC_ROOT

comment:4 in reply to: ↑ 2 ; follow-up: Changed 3 years ago by gnotaras

Replying to aaugustin:

You shouldn't put some extra static files in STATIC_ROOT.

So, what you say is that the transformation of a file to a symlink to itself is expected behavior from a software. What if I didn't have a backup of that file? I think this issue requires more clarification before changing the subject and the ticket type.

Why would putting extra files in STATIC_ROOT manually be a problem? These extra files are still files required for proper display of the web site, like a custom logo or a custom image map and the place they belong to is STATIC_ROOT.

Suppose that STATIC_URL points to a separate domain dedicated to serving web site media. Why shouldn't I be able to put the logo or a custom CSS file in there? Where would be the right place to put the custom logo?

The change of the summary line is unacceptable as it practically buries the reported issue. Please change it back to the original.

comment:5 in reply to: ↑ 4 Changed 3 years ago by gnotaras

STATICFILES_DIRS can still be utilized so as to avoid adding the extra files manually, but the original bug still remains.

comment:6 in reply to: ↑ 2 ; follow-up: Changed 3 years ago by ramiro

  • Component changed from Documentation to contrib.staticfiles
  • Summary changed from Warn that extra static files shouldn't be added directly into STATIC_ROOT to If extra static files are added into STATIC_ROOT, they are transformed to symlinks in the next run of ``collectstatic -l``
  • Type changed from Cleanup/optimization to Bug

Replying to aaugustin:

You shouldn't put some extra static files in STATIC_ROOT.

I expected to find a big warning in the docs: "Don't put your static files in STATIC_ROOT", but I didn't. We should add it.

It is just there, in the STATIC_ROOT setting documentation:

Warning

This should be an (initially empty) destination directory for collecting your static files from their permanent locations into one directory for ease of deployment; it is not a place to store your static files permanently. You should do that in directories that will be found by staticfiles's finders, which by default, are 'static/' app sub-directories and any directories you include in STATICFILES_DIRS).

Replying to gnotaras:

Why would putting extra files in STATIC_ROOT manually be a problem? These extra files are still files required for proper display of the web site, like a custom logo or a custom image map and the place they belong to is STATIC_ROOT.

No, as per above they belong to a directory listed in STATICFILES_DIRS

But I agree replacing fiels with links to self isn't right.

comment:7 in reply to: ↑ 6 Changed 3 years ago by gnotaras

Why would putting extra files in STATIC_ROOT manually be a problem? These extra files are still files required for proper display of the web site, like a custom logo or a custom image map and the place they belong to is STATIC_ROOT.

No, as per above they belong to a directory listed in STATICFILES_DIRS

Hello. There are times that static files can be automatically generated. An example case is the dynamic concatenation of CSS and JS files. The resulting compressed files normally have to be stored within STATIC_ROOT. Storing them elsewhere does not make sense. I think that the requirement of an initially empty STATIC_ROOT should not exist. Django should update any files within STATIC_ROOT only if a file with the same name is found by the static finders.

comment:8 follow-up: Changed 3 years ago by aaugustin

STATIC_ROOT is a build target. It's managed by the build system. We're not supposed to alter it.

collectstatic says that it will overwrite everything inside:

You have requested to collect static files at the destination
location as specified in your settings file.

This will overwrite existing files.
Are you sure you want to do this?

Type 'yes' to continue, or 'no' to cancel: 

If it must be altered, that is a step of the deployment processs that must happen after collectstatic. For your example, JS / CSS minification, see django-compressor: compress runs after collectstatic.


Still, it's true that the current behavior of collectstatic -l when someone adds files to STATIC_ROOT does not make sense.

I think the most predictable behavior is to delete any file in STATIC_ROOT that is not found in one of STATICFILES_DIRS.

If we don't do this, developers may believe that they can add stuff to STATIC_ROOT ("Hey dude! It just works!"), and that creates a data-loss problem. Suppose I add manually a "logo.png" file in STATIC_ROOT on the production server, and later on, someone else creates "logo.png" in one of STATICFILES_DIRS. During the next collectstatic, the version from STATIC_ROOT will be overwritten silently and lost!

Short version : if it breaks (ie. a file is overwritten or deleted), it'd better break now and loudly than at some point in the future and silently.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 3 years ago by gnotaras

Replying to aaugustin:

collectstatic says that it will overwrite everything inside:

Thanks for your detailed reply. Overwriting or forcing the creation of new symlinks for the those files that are found by the finders in the app directories or in the STATICFILES_DIRS, is the expected behavior (at least for me).

The cleanup of STATIC_ROOT, removing any files or symlinks that are not found by the finders, could be useful, but it does not take place right now. There could be a command line switch for collectstatic that adds cleanup functionality.

Still, it's true that the current behavior of collectstatic -l when someone adds files to STATIC_ROOT does not make sense.

I think the most predictable behavior is to delete any file in STATIC_ROOT that is not found in one of STATICFILES_DIRS.

If we don't do this, developers may believe that they can add stuff to STATIC_ROOT ("Hey dude! It just works!"), and that creates a data-loss problem. Suppose I add manually a "logo.png" file in STATIC_ROOT on the production server, and later on, someone else creates "logo.png" in one of STATICFILES_DIRS. During the next collectstatic, the version from STATIC_ROOT will be overwritten silently and lost!

Short version : if it breaks (ie. a file is overwritten or deleted), it'd better break now and loudly than at some point in the future and silently.

You are right about this, but I would prefer deletions inside STATIC_ROOT to take place only if the user used a command line switch and not by default.

---

All in all, I'd like to clarify that I did not mean to complain about collectstatic. The way it currently works is fantastic. The only issue I encountered occurs only if the -l switch is used, in which case any files inside STATIC_ROOT that are not found by the static-finders are replaced by symlinks-to-self. If -l is not used, then everything works fine. I am sorry if I did not clarify it from the beginning.

comment:10 in reply to: ↑ 9 Changed 3 years ago by aaugustin

Replying to gnotaras:

All in all, I'd like to clarify that I did not mean to complain about collectstatic.

Sorry if my comments sounded too harsh. We're all trying to make Django as useful as possible.


The way it currently works is fantastic. The only issue I encountered occurs only if the -l switch is used, in which case any files inside STATIC_ROOT that are not found by the static-finders are replaced by symlinks-to-self. If -l is not used, then everything works fine. I am sorry if I did not clarify it from the beginning.

Yes, I understand this.

I think we have to choose between the following three options:

  • 1) just fix the problem with the "-l" switch; it's an improvement, but we still have an underdefined behavior when STATIC_ROOT contains extra files (either added manually to STATIC_ROOT, or removed from STATICFILES_DIRS), and we're exposed to the data-loss problem I explained;
  • 2) normalize the behavior of collectstatic by removing any extra files; but that's backwards incompatible for people that rely on the current, undocumented behavior;
  • 3) same as 2), but with a command-line option to control whether collectstatic will clean up unused files; the default value of this option is debatable (backward compatibility vs. predictability).

comment:11 Changed 3 years ago by lukeplant

I think we should get collectstatic to properly clean out all existing files. There is an additional reason for this:

Existing files, possibly from a previous version of the project, can cause bugs. For example, suppose there is a javascript file that is loaded by some pages and alters some behaviour. Suppose the javascript file is deliberately removed, to remove that behaviour, but not all of the templates it is used in are corrected.

In development, the developer will test the pages, and they will have changed their behaviour as required, because even when the file is requested by the browser, the static files view will not find the js file, and not return it. However, in production, since STATIC_ROOT is not cleaned, the old file is still there, and the old behaviour will occur.

So, we should fix it aaugustin suggested.

However, since we've never explicitly said that the process will delete existing files, we should: 1) allow a command line switch to *disable* it (it should be on by default) 2) put a note in the release notes about the new default behaviour.

comment:12 Changed 3 years ago by aaugustin

  • Description modified (diff)
  • Owner changed from nobody to aaugustin

Thanks Luke for making the decision. I'll work on a patch.

comment:13 follow-up: Changed 3 years ago by jezdez

Honestly, I don't think proactively deleting files should be either an option or even enabled by default. If you put files in STATIC_ROOT you're on your own, if you want to use staticfiles correctly, simply don't put files there. I don't think we should handhold users here.

Fixing the actually reported symlinking issue seems like a good idea though.

comment:14 in reply to: ↑ 13 ; follow-up: Changed 3 years ago by kmtracey

Replying to jezdez:

Honestly, I don't think proactively deleting files should be either an option or even enabled by default. If you put files in STATIC_ROOT you're on your own, if you want to use staticfiles correctly, simply don't put files there. I don't think we should handhold users here.

But the problem Luke is noting doesn't involve the user adding any files to STATIC_ROOT. It's a problem that will arise when a source static file is removed from the origin trees: it won't, on re-running collect static, get removed from the production static tree, even though it no longer exists.

comment:15 in reply to: ↑ 14 Changed 3 years ago by jezdez

Replying to kmtracey:

Replying to jezdez:

Honestly, I don't think proactively deleting files should be either an option or even enabled by default. If you put files in STATIC_ROOT you're on your own, if you want to use staticfiles correctly, simply don't put files there. I don't think we should handhold users here.

But the problem Luke is noting doesn't involve the user adding any files to STATIC_ROOT. It's a problem that will arise when a source static file is removed from the origin trees: it won't, on re-running collect static, get removed from the production static tree, even though it no longer exists.

Isn't that the same problem hiding behind a different workflow? If you know that files changed in one of the static locations, simply blow away STATIC_ROOT and you're good. In other words, I'm hesitant adding "intelligence" to collectstatic. That said, I understand this needs a elaborate documentation to make sure this is known in the first place.

comment:16 follow-up: Changed 3 years ago by kmtracey

I don't see deleting (or renaming) a file in source static trees to be at all the same thing as having added a file to the the target tree. The first is something that is a natural part of development, the 2nd is a should-not-do (unless it's done by another automated step that is always run after collectstatic). Requiring devs to remember that "oh, I deleted/renamed a static file, I need to do something special for deploy to work properly next time" is a recipe for disaster. I would have expected collectstatic to start out by deleting the old target tree, so as to ensure that what it produces correctly mirrors what is now in the source trees, with no stale leftover files. What's the rationale for NOT doing this? It seems to me it's what is needed in order to correctly support deleting/renaming source static files during development. Yes, it will blow away any files that have been somehow added into the static target tree after collectstatic finished the last time, but I see that as a should-not-do anyway, so I don't really understand why breaking things if that has been done is a bad thing.

comment:17 Changed 3 years ago by gnotaras

The bottom line is whether you want to give us the option to delete the previous static files deployment in STATIC_ROOT through the collectstatic command, and, if yes, whether you want collectstatic to perform this fatal operation by default or not.

After thinking about it for a while, either solution makes sense. However, as a user, I'd like to have the choice to regenerate STATIC_ROOT from scratch, but I wouldn't want the collectstatic command to perform any deletions by default.

comment:18 in reply to: ↑ 16 Changed 3 years ago by jezdez

  • UI/UX unset

Replying to kmtracey:

I don't see deleting (or renaming) a file in source static trees to be at all the same thing as having added a file to the the target tree. The first is something that is a natural part of development, the 2nd is a should-not-do (unless it's done by another automated step that is always run after collectstatic). Requiring devs to remember that "oh, I deleted/renamed a static file, I need to do something special for deploy to work properly next time" is a recipe for disaster. I would have expected collectstatic to start out by deleting the old target tree, so as to ensure that what it produces correctly mirrors what is now in the source trees, with no stale leftover files. What's the rationale for NOT doing this?

The rationale is simple: not having to copy/link all static files again on each run. It can take quite a while to run collectstatic on a medium to big Django site.

It seems to me it's what is needed in order to correctly support deleting/renaming source static files during development. Yes, it will blow away any files that have been somehow added into the static target tree after collectstatic finished the last time, but I see that as a should-not-do anyway, so I don't really understand why breaking things if that has been done is a bad thing.

collectstatic tries to be smart about adding files to STATIC_ROOT by comparing the modification_time of the files it tries to copy/link, to be quicker, but I don't see a good way to check that a file is to be removed. E.g. how could we differ a file that is located in STATIC_ROOT but can't be found in one of the locations staticfiles collects files from?

comment:19 Changed 3 years ago by jezdez

Having thought more about the issue Luke was mentioning, I don't believe I agree that his is an actual problem. In fact I might want old files (such as pre-compressed stylesheet files) to stay in STATIC_ROOT to make sure users with cached pages or templates referring to those files don't break. In any case, I believe that the simplest way of "fixing" the issue Luke mentioned is to simply to delete STATIC_ROOT manually, because Django can't really forsee whether the files should stay around or not.

That actually makes me wonder if the simplest solution would be:

  • add a --clear option that would try to delete the content of STATIC_ROOT if the user wishes, with the downside of having to wait much longer for collecstatic to run
  • fix the linking support to actually check if the to be symlinked file doesn't actually have the same name as the link

comment:20 Changed 3 years ago by jezdez

Another item for the TODO list Jacob mentioned on IRC:

  • change the collectstatic warning message to say something like "This will overwrite existing files in /home/jacob/myproj/static - are you sure"? (i.e. actually show STATIC_ROOT there)

Since collectstatic uses the storage backend specified as STATICFILES_STORAGE, STATIC_ROOT might not be used at all. In that case checking if the storage backend is an instance of FileSystemStorage does the trick. IOW, only show STATIC_ROOT if there is a good chance that the storage actually uses it.

comment:21 Changed 3 years ago by jezdez

I've good news, I wasn't able to reproduce the weird behavior of running collect static -l --noinput with a fresh test project, but pinpointed the actual problem to the compressor.finders.CompressorFinder that the OP also used. Since that's also one of the apps I maintain I'd appreciate if the OP would open a ticket there, too.

That said, this doesn't take away the need for the --clear option or the better warning.

comment:22 Changed 3 years ago by jezdez

  • Resolution set to fixed
  • Status changed from new to closed

In [16509]:

Fixed #16161 -- Added --clear option to collectstatic management command to be able to explicitly clear the files stored in the destination storage before collecting.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.