Opened 3 weeks ago
Closed 2 weeks ago
#36737 closed Bug (fixed)
Escape C1 control sequence in `escapejs`
| Reported by: | Thibaut Decombe | Owned by: | farthestmage |
|---|---|---|---|
| Component: | Template system | Version: | 5.2 |
| Severity: | Normal | Keywords: | |
| Cc: | Thibaut Decombe | Triage Stage: | Ready for checkin |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | yes | UI/UX: | no |
Description
The current implementation of the escapejshttps://github.com/django/django/blob/5c60763561c67924eff1069e1516b60a59d068d5/django/utils/html.py#L79-L80 escapes only C0 control characters (unicode values ranging from 0 to 31)
However, there are other control characters in the 127-159 range, the C1 control characters.
See https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C1_controls
Should we escape these too ?
The rust helper char.is_control https://doc.rust-lang.org/src/core/char/methods.rs.html#952 consider both these ranges and we were considering using it in django_rusty_templates
I'll be happy to provide a PR if it make sense
Change History (8)
comment:1 by , 3 weeks ago
comment:2 by , 3 weeks ago
Hey I am also new to contributing the thought process it seems that a simple loop can save the job
comment:3 by , 3 weeks ago
Hey,
I was not assigned this task but majorly I believe I have solved the issue and issued a pull request on github could pls review the code
thanks
comment:4 by , 3 weeks ago
Hello,
A doubt were you able to produce or implement a way to test it out
comment:5 by , 3 weeks ago
| Needs tests: | set |
|---|---|
| Owner: | set to |
| Patch needs improvement: | set |
| Status: | new → assigned |
| Triage Stage: | Unreviewed → Accepted |
| Type: | Cleanup/optimization → Bug |
Okay, I think this is right.
The HTML parsing standard describes the parsing error control-character-in-input-stream like this:
This error occurs if the input stream contains a control code point that is not ASCII whitespace or U+0000 NULL. Such code points are parsed as-is and usually, where parsing rules don't apply any additional restrictions, make their way into the DOM.
control is defined here:
A control is a C0 control or a code point in the range U+007F DELETE to U+009F APPLICATION PROGRAM COMMAND, inclusive.
The C1 control range is U+0080 – U+009F, so the additional characters that need escaping are C1 control characters plus U+007F DELETE (which is apparently sometimes grouped with C0 controls).
See also HTML spec
PR doesn't escape U+007F DELETE and needs tests.
comment:6 by , 3 weeks ago
Hey,
While creating Test cases I run runtests.py it works perfectly but while review fails returns this error AssertionError: '\\u009F' != '\x9f'.I have manipulated \u009f with taking it in a literal sense different error arise while running runtests.py while the current commit creates error only while reviewing not while executing runtests.py
help would be appreciated
thanks
PS: Just a new contributor learning
comment:7 by , 2 weeks ago
| Needs tests: | unset |
|---|---|
| Patch needs improvement: | unset |
| Triage Stage: | Accepted → Ready for checkin |
While I am new to contributing to Django, it looks like a straightforward job
the proposal does seem legit to me
escapejs already escapes C0 control characters, extending it to support C1 range would be better for consistency across control chars. in diff. languages
if there is no backward compatibility issue
and the ticket gets accepted I'd love to chip in