Opened 4 weeks ago
Last modified 2 weeks ago
#36969 assigned Bug
collectstatic post-processing with support_js_module_import_aggregation breaks on certain javascript files
| Reported by: | blighj | Owned by: | blighj |
|---|---|---|---|
| Component: | contrib.staticfiles | Version: | 6.0 |
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
The regular expressions to find import/export statements in javascript can work in common cases but still have plenty of failure modes. A lot of the failures were in comments, which were getting tracked in #21080. But there will always remain others, the complexity space of javascript is more than the regexs can handle. Any code with import like statements in strings will break it for sure.
- https://code.djangoproject.com/ticket/21080#comment:26
- https://code.djangoproject.com/ticket/35371
- https://code.djangoproject.com/ticket/34322#comment:17
To my mind this is a bug, where potentially the only viable solution is a lexer/parser as proposed by Adam in the thrid ticket linked. Which could merit a DEP for a bug!
Or it might just need some sort of update to the docs to articulate Carlton's point in this comment
https://code.djangoproject.com/ticket/32319#comment:21
It's not clear the [regex approach] covers all usages, but it looks OK for common cases.
I think we need to be prepared to say where we're not going to try more complex solutions.
But let's see... 😬
In either case I think it is usefull to have one ticket that tracs all these regex failures in one place, rather than having multiple tickets that are sort of duplicates of this 'regex not working' issue..
There is the package django-manifeststaticfiles-enhanced which provides the lexer solution.
Change History (16)
comment:1 by , 4 weeks ago
| Needs tests: | set |
|---|---|
| Triage Stage: | Unreviewed → Accepted |
comment:2 by , 4 weeks ago
Hello,
Can I take a look at this issue? I'll start right away by adding to the test-suite the failures from #21080. Once the commit for the failing regexes has been provided, I'll see how to integrate the lexer from django-manifeststaticfiles-enhanced, even though discussion is still needed to agree upon adding such complexity to the codebase.
Update: After reading a little bit more the other tickets comments, I think adding a JS lexer would likely be rejected. So I'll limit my work to gather the failures into tests and wait for directions from more senior contributors that know better.
comment:3 by , 4 weeks ago
What Django is trying to do with post_process is equivalent to a linker/JS packer, ie rewriting references/relocating. So yes, putting that in Django source code will be rejected as it can only grow. Why not use esbuild instead of doing it with the regexes? See for example: https://esbuild.github.io/api/#conditions.
comment:4 by , 4 weeks ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
comment:5 by , 4 weeks ago
I'm not sure if you are proposing adding esbuild to a standard pip install django, but in case you are I doubt that will be a runner. As it stands the project is heavily against external python libs, nevermind libs in other languages.
I wouldn't jump in on both feet with a PR for this, there was mention already of requring a DEP for a lexer based approach. So I'd wait until there was at least more discussion on this issue before going to a PR.
comment:6 by , 4 weeks ago
@blighj, You are right. Third party dependencies is a big no as this dependency will not be aligned with compatibility, stability and security concerns of Django. On the other hand, adding a lexer is not a great idea as it will often break and reimplement part of a JS bundler. So the only thing to do on my side is to put new tests that fail in django/tests/staticfiles_tests and move on to another issue to let you, seasoned contributors, discuss the topic further. Thank you for your answer BTW.
comment:7 by , 4 weeks ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
comment:8 by , 4 weeks ago
I added some failing cases to test_storage.py. They fail at the setUp() stage though. https://github.com/v01dXYZ/django/commit/8f2f253d5fe8361c6c0ec95ce33c8881a2cfb6ca
I didn't dare to open a PR as I don't know if it would appreciated to open a PR only to have logs and nothing meant to get merged.
comment:9 by , 4 weeks ago
| Component: | contrib.staticfiles → Documentation |
|---|---|
| Needs tests: | unset |
| Type: | Bug → Cleanup/optimization |
Thanks for helping us collect these in one place.
Or it might just need some sort of update to the docs to articulate Carlton's point in this comment
https://code.djangoproject.com/ticket/32319#comment:21
I think some cases will fall on either side of the line. That's why I'm leaving the tickets you linked open for now, as there could be an incremental, well-scoped fix that would be acceptable?
I'll frame this ticket as a documentation update for now. There, we could mention your package as a place to turn when these edge cases bite.
Then we would either close this ticket as "Fixed", or as "needsnewfeatureprocess" pending this comment:
To my mind this is a bug, where potentially the only viable solution is a lexer/parser as proposed by Adam in the thrid ticket linked. Which could merit a DEP for a bug!
Anyone who wants to gather consensus for a DEP or sound out folks on the new-features repo won't need to wait for the docs tweak.
comment:10 by , 4 weeks ago
| Description: | modified (diff) |
|---|
Removed #32849 as it is not really a case of the regex's failing on supported statements.
comment:11 by , 4 weeks ago
I see no one is assigned to this ticket and would like to take a stab at the documentation update mentioned in comment:9. I'll add a note clarifying that the regex approach covers common cases but has known limitations, and mention django-manifeststaticfiles-enhanced as an alternative for edge cases
comment:12 by , 4 weeks ago
It is unusual to mention third party packages in the docs. I'd recommend against that part of any update.
comment:13 by , 3 weeks ago
| Component: | Documentation → contrib.staticfiles |
|---|---|
| Owner: | set to |
| Status: | new → assigned |
| Type: | Cleanup/optimization → Bug |
I've done some futher exploring, and two of the three cases, (and a lot of the issues I've come across in my own experience), are covered by the import in quotes problems. The following regex would fix that, so potentially we could use that and not have to update the docs at all. What do you thing Jacob?
I've tested the regex's against the benchmark projects I use and it is doing the job. During that testing I did find an issue on the xstate js lib, which wasn't directly caused by the change, but in discoverying a fix for it you get a fix for 35371 too. So we can move this pretty far along with just regexs, way further than I first suspected.
I've moved the ticket back to bug and assigned myself the owner for now. But I'll hold off on a PR until I get your input.
_css_ignored_re = _lazy_re_compile(
r"/\*.*?\*/" # block comment
r"|'(?:[^'\\]|\\.)*'" # single-quoted string
r'|"(?:[^"\\]|\\.)*"', # double-quoted string
re.DOTALL,
)
_js_ignored_re = _lazy_re_compile(
r"/\*.*?\*/" # block comment
r"|//[^\n]*" # line comment
r"|'(?:[^'\\]|\\.)*'" # single-quoted string
r'|"(?:[^"\\]|\\.)*"' # double-quoted string
r"|`(?:[^`\\]|\\.)*`", # template literal
re.DOTALL,
)
comment:14 by , 3 weeks ago
| Triage Stage: | Accepted → Unreviewed |
|---|
comment:15 by , 2 weeks ago
| Triage Stage: | Unreviewed → Accepted |
|---|
Sounds promising, happy to see it! Will you include a link to your benchmarK?
comment:16 by , 2 weeks ago
| Has patch: | set |
|---|
The list of sites in the benchmark is on this issue https://github.com/blighj/django-manifeststaticfiles-enhanced/issues/34, but the actual benchmark code is not published anywhere. What I did was crawl sites from built with django and built with wagtail, find ones that had hashing in asset names and then find their staticfiles manifest and curl all the assets to my local machine. That leaves me with a folder with a bare django project and 50 subfolders of different assets, then the benchmark script changes the STATICFILES_DIRS and STORAGES setting to test the different backends against the different projects timing three runs for each different configuration.
Yeah, I 100% agree with that. I'm going to mark the triage status as accepted just because the answer to the issue shouldn't just be do nothing. I personally won't assign this to myself, but I will suggest to whoever wants to do it to look into merging the functionality from the aforementioned package, django-mainfeststaticfiles-enhanced! Good luck, everyone!