Opened 11 years ago

Last modified 15 months ago

#21080 new Bug

collectstatic post-processing fails for references inside comments

Reported by: shreyas@… Owned by:
Component: contrib.staticfiles Version: dev
Severity: Normal Keywords:
Cc: Joshua Smith, Manel Clos, Petr Přikryl, powderflask Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

"python manage.py collectstatic" is attempting to parse references inside css comments and generating errors during post-processing. I am using:
STATICFILES_STORAGE = 'django.contrib.staticfiles.storage.CachedStaticFilesStorage'

eg.
The following snippet of code in test.css:

.gfg-collapse-closed {
  /* background-image : url('arrow_close.gif'); */
}

produces the following error:

ValueError: The file 'stylesheets/arrow_close.gif' could not be found with <django.contrib.staticfiles.storage.CachedStaticFilesStorage object at 0x1078a3910>.
collectstatic

Ideally, collectstatic should respect CSS comments and should not attempt to parse/reference files in lines that are commented out.

If the fix is too complex, a simple workaround might be to include a --ignore-error flag that would allow the application to continue post-processing even when it sees errors

Attachments (1)

21080-pyparsing.patch (3.3 KB ) - added by Claude Paroz 22 months ago.

Download all attachments as: .zip

Change History (32)

comment:1 by Kevin Christopher Henry, 11 years ago

Triage Stage: UnreviewedAccepted

CachedFilesMixin is using simple regular expressions to match url and @import statements, and so is picking up the ones inside comments. It would also have a problem if the text was inside a string.

If it's possible to rewrite the regex to cover all the corner cases, that would be the easy fix. But I doubt it. The robust way would be to parse the css files properly, but that would mean rewriting a lot of code...

comment:2 by anonymous, 11 years ago

Could you recommend a path forward? Can we at least have it skip failed imports with a warning rather than break?

-Shreyas

comment:3 by Pablo SEMINARIO, 11 years ago

I'm agree with the comment of @marfire, the correct way to do this is with a CSS parser, because there are many cases to be treated (multi-line comments, nested comments, broken comments, etc.)

Adding an option to skip failed imports was reported in ticket:19650 and it was marked as a duplicated of ticket:18958.

If you really need ignore these errors, you can create a subclass of CachedFilesMixin on your own project and add a try...except statement to handle these kind of exceptions.

comment:4 by German Larrain, 11 years ago

This bit me today. "Dumb" workaround: replace that url in your comments with u r l.

comment:5 by Ben Spaulding, 11 years ago

Just as an additional datapoint and another thing to consider if it comes to parsing the CSS, I recently started working on a project where the designers used some CSS expressions for an IE-specific stylesheet. The JScript used in one expression puts a particularly crazy bit like .replace('url("','').replace('")',''). As marfire said, that url bit in a string raises an exception. Fortunately I can overcome this for my project with a simple sublcass to make a very small change to the regular expression used. But that is one other thing to consider if it is decided that the CachedStaticFilesStorage needs to be more robust.

comment:6 by Udo Schochtert, 7 years ago

hi, apparently this issue is still open in django 1.11.

my deploy to heroku failed with the following error log:

...
    $ python manage.py collectstatic --noinput
       Post-processing 'jquery-ui-dist/jquery-ui.theme.css' failed!
       Traceback (most recent call last):
         File "manage.py", line 10, in <module>
           execute_from_command_line(sys.argv)
         File "/app/.heroku/python/lib/python3.6/site-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
           utility.execute()
         File "/app/.heroku/python/lib/python3.6/site-packages/django/core/management/__init__.py", line 356, in execute
           self.fetch_command(subcommand).run_from_argv(self.argv)
         File "/app/.heroku/python/lib/python3.6/site-packages/django/core/management/base.py", line 283, in run_from_argv
           self.execute(*args, **cmd_options)
         File "/app/.heroku/python/lib/python3.6/site-packages/django/core/management/base.py", line 330, in execute
           output = self.handle(*args, **options)
         File "/app/.heroku/python/lib/python3.6/site-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 199, in handle
           collected = self.collect()
         File "/app/.heroku/python/lib/python3.6/site-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 145, in collect
           raise processed
       whitenoise.storage.MissingFileError: The file 'jquery-ui-dist/"images/ui-icons_555555_256x240.png"' could not be found with <whitenoise.storage.CompressedManifestStaticFilesStorage object at 0x7ff880f551d0>.
       The CSS file 'jquery-ui-dist/jquery-ui.theme.css' references a file which could not be found:
         jquery-ui-dist/"images/ui-icons_555555_256x240.png"
       Please check the URL references in this CSS file, particularly any
       relative paths which might be pointing to the wrong location.
 !     Error while running '$ python manage.py collectstatic --noinput'.
       See traceback above for details.
       You may need to update application code to resolve this error.
       Or, you can disable collectstatic for this application:
          $ heroku config:set DISABLE_COLLECTSTATIC=1
       https://devcenter.heroku.com/articles/django-assets
 !     Push rejected, failed to compile Python app.
 !     Push failed

the "collectstatic" command fails here:
... The file 'jquery-ui-dist/"images/ui-icons_555555_256x240.png"' could not be found with ...
because it tries to parse js comments.

The file path should be

  • jquery-ui-dist/images/ui-icons_555555_256x240.png

and not

  • jquery-ui-dist/"images/ui-icons_555555_256x240.png"

other people on stackoverflow run inmto this issue as well:

would be great if this issue could be fixed. thanks ;)

comment:7 by Udo Schochtert, 7 years ago

Version: 1.51.11

comment:8 by Joshua Smith, 7 years ago

Cc: Joshua Smith added
Version: 1.112.0

comment:9 by Joshua Smith, 7 years ago

Version: 2.0master

comment:10 by powderflask, 7 years ago

Re: comment:6 is a slightly different issue and MUCH easier to patch.

This issue is caused by generators (like http://jqueryui.com/themeroller) that urlencode stuff in a comment block.
E.g.

/*! jQuery UI - v1.12.1 - 2016-09-14
  * http://jqueryui.com
  * Includes: core.css, accordion.css, ...
  * To view and modify this theme, visit http://jqueryui.com/themeroller/?...&iconsHover=url(%22images%2Fui-icons_555555_256x240.png%22)...
  * Copyright jQuery Foundation and other contributors; Licensed MIT */

The regular expression in storage.HashedFilesMixin matches url(...) in the comment, but it does NOT unquote the urlencoded quotation mark: %22
The last step of converter() unquotes the transformed url: return template % unquote(transformed_url), converting the urlencoded quotation marks back, and thus messing up the filename, as can be seen in the posted stack trace:

    The file 'jquery-ui-dist/"images/ui-icons_555555_256x240.png"' could not be found...

(Notice the excess quotation marks in the path.)

Patch:

storage.py line 61
- r"""(url\(['"]{0,1}\s*(.*?)["']{0,1}\))""",
+ r"""(url\((?:['"]|%22|%27){0,1}\s*(.*?)(?:['"]|%22|%27){0,1}\))""",

This matches url encoded single- or double-quotes and fixed several such issues with 3rd party packages that included generated CSS files like this.

IF a general fix for ignoring comments in CSS files is implemented, that would resolve this issue too.
But IF NOT, this is a simple fix that prevents collectstatic from crashing on common 3rd party packages.

Happy to submit a patch for this if this seems worthwhile.


comment:11 by Claude Paroz, 7 years ago

As long as we are using regular expressions, excluding comment contents seems out of question. So your patch might be worthfile in the short term.
A proper resolution of this ticket would be to use a JS parser to get links. This will be much more invasive though.

in reply to:  10 comment:12 by powderflask, 7 years ago

Replying to powderflask:
If you need to resolve this issue in the interim, override ManifestStaticFilesStorage (or CachedStaticFilesStorage) and re-define the class patterns property:

from django.contrib.staticfiles import storage

class PatchedManifestStaticFilesStorage(storage.ManifestStaticFilesStorage):
    """
    Override the replacement patterns to match URL-encoded quotations.
    """
    patterns = (
        ("*.css", (
            r"""(url\((?:['"]|%22|%27){0,1}\s*(.*?)(?:['"]|%22|%27){0,1}\))""",
            (r"""(@import\s*["']\s*(.*?)["'])""", """@import url("%s")"""),
        )),
    )

comment:13 by Nathan Gaberel, 6 years ago

Owner: changed from nobody to Nathan Gaberel
Status: newassigned

comment:14 by Nathan Gaberel, 6 years ago

Has patch: set

PR

I put something together that detects block comments in css (using a regexp) and ignores those urls that overlap with a comment.
It has the benefit of not requiring additional dependencies to parse css.

I think it's ready but I'm probably missing some test cases, feedback welcome!

comment:15 by Mariusz Felisiak, 6 years ago

Patch needs improvement: set

comment:16 by Wolph, 5 years ago

Just in case anyone is looking for a _really_ simple hotfix, you can simply strip the quotes from the name in hashed_name().

Specifically, somewhere at the top of this method: https://github.com/django/django/blob/efc3e32d6d7fb9bb41be73b80c8607b653c1fbd6/django/contrib/staticfiles/storage.py#L79-L111

Or... put this monkeypatch somewhere in your code (settings file for example):

from django.contrib.staticfiles import storage
import functools

original_hashed_name = storage.HashedFilesMixin.hashed_name

@functools.wraps(original_hashed_name)
def hashed_name(self, name, *args, **kwargs):
    return original_hashed_name(self, name.strip('"'), *args, **kwargs)

storage.HashedFilesMixin.hashed_name = hashed_name

comment:17 by Jacob Walls, 4 years ago

Owner: changed from Nathan Gaberel to Tomáš Zigo
Patch needs improvement: unset

Resetting flags for more recent, unreviewed PR.

PR

comment:18 by Mariusz Felisiak, 4 years ago

Needs tests: set
Patch needs improvement: set

comment:19 by Markus Bertheau, 4 years ago

FWIW, the current approach of parsing CSS with regexes also doesn't work with SVG data inlined as data: URLs when they reference for example <linearGradient id='someid' ...>..</..> <use fill='url(%23someid)' ...></use>. This is changed to something like fill='url("#someid#someid")'.

Two shortcomings here are:

  • double quotes are inserted that weren't there before, breaking the quoting of the data:-URL that the svg data is in
  • the doubling of the id fragment.

Concerning the second problem: I didn't look too far into it, but that seems to happen because the font-face branch is triggered https://github.com/django/django/blob/96f55ccf798c7592a1203f798a4dffaf173a9263/django/contrib/staticfiles/storage.py#L131-L142

It's also at least curious that after line 131 final_url already has any fragment hash marks escaped (# -> %23), so urlsplit in line 137 doesn't catch it as a fragment.

The relevant extract of our CSS looks like this:

.sc_modal .content.branded-bg {

background-image: url("data:image/svg+xml,%3csvg width='375' [...] %3cuse fill='url(%23c)' [...]");

}

Version 0, edited 4 years ago by Markus Bertheau (next)

comment:20 by GitHub <noreply@…>, 22 months ago

In bae053d4:

Refs #21080, Refs #34322 -- Added warning to ManifestStaticFilesStorage docs about paths in comments.

comment:21 by Mariusz Felisiak <felisiak.mariusz@…>, 22 months ago

In e1c74bf4:

[4.2.x] Refs #21080, Refs #34322 -- Added warning to ManifestStaticFilesStorage docs about paths in comments.

Backport of bae053d497ba8a8de7e4f725973924bfb1885fd2 from main.

comment:22 by Mariusz Felisiak <felisiak.mariusz@…>, 22 months ago

In 60be590:

[4.1.x] Refs #21080, Refs #34322 -- Added warning to ManifestStaticFilesStorage docs about paths in comments.

Backport of bae053d497ba8a8de7e4f725973924bfb1885fd2 from main.

comment:23 by Mariusz Felisiak, 22 months ago

#34322 was a duplicate for ES module imports.

comment:24 by Claude Paroz, 22 months ago

If someone has time to make a little study on how we could use a real parser (e.g. lark or pyparsing) to strip comments from CSS and JS, would be awesome! I'm sure a general parser could be used in other parts of Django as well.

comment:25 by Claude Paroz, 22 months ago

I made a very quick and dirty experiment inside HashedFilesMixin._post_process using the pyparsing capability to remove comments from files (pyparsing.cpp_style_comment.suppress()). Running the Django staticfiles tests with or without that comments removal showed a performance penalty of ~330% (8.6 secs instead of 2.6 secs). A first optimization to only strip comments when an error occurs reduces the penalty to ~170% (strongly dependent on files at hand, of course).

It is expected that using a parser instead of regexes will affect performance more or less badly. What performance penalty are we ready to pay for fixing this?

by Claude Paroz, 22 months ago

Attachment: 21080-pyparsing.patch added

comment:26 by Sebastiaan Zeeff, 22 months ago

I noticed that the behavior got worse in Django 4.2.0b1 compared to pre-4.2 versions.

Let's say you have a javascript file with these contents (minimized exampled):

function return_text(text_one, text_two) {
  return `${text_one} ${text_two}`;
}


class MyClass {
  render(){
    return return_text("import App from './App.vue'", 'import { Component } from "@org/components/loader";');
  }
}

The "import" statements here are part of strings, not actually executable JavaScript code. In the original snippet, they're part of a component that shows information on how to use the component library and, as such, contains strings with imports as part of the documentation.

In Django 3.2.x, 4.0.x, 4.1.x, running collectstatic works fine.

However, in the new beta, you get an error:

(django-42-collectstatic-bug) D:\playground\django-42-collectstatic-bug>python manage.py collectstatic --clear --no-input -v 0
Post-processing 'example.js' failed!

Traceback (most recent call last):
  File "D:\playground\django-42-collectstatic-bug\manage.py", line 22, in <module>
    main()
  File "D:\playground\django-42-collectstatic-bug\manage.py", line 18, in main
    execute_from_command_line(sys.argv)
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\core\management\__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\core\management\__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\core\management\base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\core\management\base.py", line 458, in execute
    output = self.handle(*args, **options)
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\contrib\staticfiles\management\commands\collectstatic.py", line 209, in handle
    collected = self.collect()
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\contrib\staticfiles\management\commands\collectstatic.py", line 154, in collect
    raise processed
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\contrib\staticfiles\storage.py", line 364, in _post_process
    content = pattern.sub(converter, content)
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\contrib\staticfiles\storage.py", line 241, in converter
    hashed_url = self._url(
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\contrib\staticfiles\storage.py", line 174, in _url
    hashed_name = hashed_name_func(*args)
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\contrib\staticfiles\storage.py", line 414, in _stored_name
    cache_name = self.clean_name(self.hashed_name(name))
  File "D:\virtualenvs\django-42-collectstatic-bug\lib\site-packages\django\contrib\staticfiles\storage.py", line 135, in hashed_name
    raise ValueError(
ValueError: The file 'App.vue'", 'import { Component } from "@org/components/loader' could not be found with <django.contrib.staticfiles.storage.ManifestStaticFilesStorage object at 0x00000229D19679D0>.

Unfortunately, this a third-party dependency that we don't control. This means that we'd have to disable the manifest-backed static file storage completely in order to continue using this dependency (that we can't really get rid of), unless I've overlooked an option to disable import errors for specific files.

comment:27 by Jake Bell, 21 months ago

I'd also like to note that this bug also errors out when processing the CSS from Bootstrap 5, as it gets hung up on the SVG data URIs:

ValueError: The file '\"data:image/svg+xml,<svg xmlns='http:/www.w3.org/2000/svg' viewBox='0 0 16 16'><path fill='none' stroke='' could not be found with <django.contrib.staticfiles.storage.ManifestStaticFilesStorage object at 0x106955d00>.

This seems like a pretty significant issue to me, given the popularity of Bootstrap. Simply parsing out the comments wouldn't be enough to fix the above issue. It seems like what is needed is either a proper CSS parser (like in the linked thread about JS files--perhaps https://tinycss.readthedocs.io/?) or a better regex that can exclude url(data:) items.

comment:28 by Mariusz Felisiak, 19 months ago

Owner: Tomáš Zigo removed
Status: assignednew

comment:29 by Manel Clos, 19 months ago

Cc: Manel Clos added

comment:30 by Petr Přikryl, 18 months ago

Cc: Petr Přikryl added

comment:31 by powderflask, 15 months ago

Cc: powderflask added
Note: See TracTickets for help on using tickets.
Back to Top