Opened 4 weeks ago

Last modified 2 weeks ago

#36969 assigned Bug

collectstatic post-processing with support_js_module_import_aggregation breaks on certain javascript files

Reported by: blighj Owned by: blighj
Component: contrib.staticfiles Version: 6.0
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by blighj)

The regular expressions to find import/export statements in javascript can work in common cases but still have plenty of failure modes. A lot of the failures were in comments, which were getting tracked in #21080. But there will always remain others, the complexity space of javascript is more than the regexs can handle. Any code with import like statements in strings will break it for sure.

To my mind this is a bug, where potentially the only viable solution is a lexer/parser as proposed by Adam in the thrid ticket linked. Which could merit a DEP for a bug!

Or it might just need some sort of update to the docs to articulate Carlton's point in this comment
https://code.djangoproject.com/ticket/32319#comment:21

It's not clear the [regex approach] covers all usages, but it looks OK for common cases.
I think we need to be prepared to say where we're not going to try more complex solutions.
But let's see... 😬

In either case I think it is usefull to have one ticket that tracs all these regex failures in one place, rather than having multiple tickets that are sort of duplicates of this 'regex not working' issue..

There is the package django-manifeststaticfiles-enhanced which provides the lexer solution.

Change History (16)

comment:1 by Elias Stinson, 4 weeks ago

Needs tests: set
Triage Stage: UnreviewedAccepted

Yeah, I 100% agree with that. I'm going to mark the triage status as accepted just because the answer to the issue shouldn't just be do nothing. I personally won't assign this to myself, but I will suggest to whoever wants to do it to look into merging the functionality from the aforementioned package, django-mainfeststaticfiles-enhanced! Good luck, everyone!

comment:2 by kurtmoll, 4 weeks ago

Hello,

Can I take a look at this issue? I'll start right away by adding to the test-suite the failures from #21080. After reading a little bit more the other tickets comments, I think adding a JS lexer would likely be rejected (too much hassle as it would break if new syntax constructs are added to ECMA). So I'll limit my work to gather the failures into tests and wait for directions from more senior contributors that know better.

Last edited 4 weeks ago by kurtmoll (previous) (diff)

comment:3 by kurtmoll, 4 weeks ago

What Django is trying to do with post_process is equivalent to a linker/JS packer, ie rewriting references/relocating. So yes, putting that in Django source code will be rejected as it can only grow. Why not use esbuild instead of doing it with the regexes? See for example: https://esbuild.github.io/api/#conditions.

comment:4 by kurtmoll, 4 weeks ago

Owner: set to kurtmoll
Status: newassigned

comment:5 by blighj, 4 weeks ago

I'm not sure if you are proposing adding esbuild to a standard pip install django, but in case you are I doubt that will be a runner. As it stands the project is heavily against external python libs, nevermind libs in other languages.

I wouldn't jump in on both feet with a PR for this, there was mention already of requring a DEP for a lexer based approach. So I'd wait until there was at least more discussion on this issue before going to a PR.

comment:6 by kurtmoll, 4 weeks ago

@blighj, You are right. Third party dependencies is a big no as this dependency will not be aligned with compatibility, stability and security concerns of Django. On the other hand, adding a lexer is not a great idea as it will often break and reimplement part of a JS bundler. So the only thing to do on my side is to put new tests that fail in django/tests/staticfiles_tests and move on to another issue to let you, seasoned contributors, discuss the topic further. Thank you for your answer BTW.

comment:7 by kurtmoll, 4 weeks ago

Owner: kurtmoll removed
Status: assignednew

comment:8 by kurtmoll, 4 weeks ago

I added some failing cases to test_storage.py. They fail at the setUp() stage though. https://github.com/v01dXYZ/django/commit/8f2f253d5fe8361c6c0ec95ce33c8881a2cfb6ca

I didn't dare to open a PR as I don't know if it would appreciated to open a PR only to have logs and nothing meant to get merged.

comment:9 by Jacob Walls, 4 weeks ago

Component: contrib.staticfilesDocumentation
Needs tests: unset
Type: BugCleanup/optimization

Thanks for helping us collect these in one place.

Or it might just need some sort of update to the docs to articulate Carlton's point in this comment

https://code.djangoproject.com/ticket/32319#comment:21

I think some cases will fall on either side of the line. That's why I'm leaving the tickets you linked open for now, as there could be an incremental, well-scoped fix that would be acceptable?

I'll frame this ticket as a documentation update for now. There, we could mention your package as a place to turn when these edge cases bite.

Then we would either close this ticket as "Fixed", or as "needsnewfeatureprocess" pending this comment:

To my mind this is a bug, where potentially the only viable solution is a lexer/parser as proposed by Adam in the thrid ticket linked. Which could merit a DEP for a bug!

Anyone who wants to gather consensus for a DEP or sound out folks on the new-features repo won't need to wait for the docs tweak.

comment:10 by blighj, 4 weeks ago

Description: modified (diff)

Removed #32849 as it is not really a case of the regex's failing on supported statements.

comment:11 by Muhammad Muzammil, 4 weeks ago

I see no one is assigned to this ticket and would like to take a stab at the documentation update mentioned in comment:9. I'll add a note clarifying that the regex approach covers common cases but has known limitations, and mention django-manifeststaticfiles-enhanced as an alternative for edge cases

comment:12 by blighj, 3 weeks ago

It is unusual to mention third party packages in the docs. I'd recommend against that part of any update.

comment:13 by blighj, 3 weeks ago

Component: Documentationcontrib.staticfiles
Owner: set to blighj
Status: newassigned
Type: Cleanup/optimizationBug

I've done some futher exploring, and two of the three cases, (and a lot of the issues I've come across in my own experience), are covered by the import in quotes problems. The following regex would fix that, so potentially we could use that and not have to update the docs at all. What do you thing Jacob?

I've tested the regex's against the benchmark projects I use and it is doing the job. During that testing I did find an issue on the xstate js lib, which wasn't directly caused by the change, but in discoverying a fix for it you get a fix for 35371 too. So we can move this pretty far along with just regexs, way further than I first suspected.

I've moved the ticket back to bug and assigned myself the owner for now. But I'll hold off on a PR until I get your input.

_css_ignored_re = _lazy_re_compile(
    r"/\*.*?\*/"  # block comment
    r"|'(?:[^'\\]|\\.)*'"  # single-quoted string
    r'|"(?:[^"\\]|\\.)*"',  # double-quoted string
    re.DOTALL,
)
_js_ignored_re = _lazy_re_compile(
    r"/\*.*?\*/"  # block comment
    r"|//[^\n]*"  # line comment
    r"|'(?:[^'\\]|\\.)*'"  # single-quoted string
    r'|"(?:[^"\\]|\\.)*"'  # double-quoted string
    r"|`(?:[^`\\]|\\.)*`",  # template literal
    re.DOTALL,
)

comment:14 by Jacob Walls, 3 weeks ago

Triage Stage: AcceptedUnreviewed

comment:15 by Jacob Walls, 2 weeks ago

Triage Stage: UnreviewedAccepted

Sounds promising, happy to see it! Will you include a link to your benchmarK?

comment:16 by blighj, 2 weeks ago

Has patch: set

The list of sites in the benchmark is on this issue https://github.com/blighj/django-manifeststaticfiles-enhanced/issues/34, but the actual benchmark code is not published anywhere. What I did was crawl sites from built with django and built with wagtail, find ones that had hashing in asset names and then find their staticfiles manifest and curl all the assets to my local machine. That leaves me with a folder with a bare django project and 50 subfolders of different assets, then the benchmark script changes the STATICFILES_DIRS and STORAGES setting to test the different backends against the different projects timing three runs for each different configuration.

Note: See TracTickets for help on using tickets.
Back to Top