Opened 10 months ago

Closed 9 months ago

#34709 closed Bug (fixed)

charset should be ignored for the application/x-www-form-urlencoded content type.

Reported by: Mariusz Felisiak Owned by: Mariusz Felisiak
Component: HTTP handling Version: 4.2
Severity: Normal Keywords:
Cc: Markus Holtermann, Simon Charette, Adam Johnson, Shamil Abdulaev, Shai Berger Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Mariusz Felisiak)

charset parameter is used in application/x-www-urlencoded content type. However, per the current spec (check out RFC 1866) the content type application/x-www-form-urlencoded does not have a charset and should be treated as UTF-8.

Thanks Eki Xu for the report.

Change History (20)

comment:1 by Mariusz Felisiak, 10 months ago

See related tickets #5076 and #14035.

Last edited 10 months ago by Mariusz Felisiak (previous) (diff)

comment:2 by Mariusz Felisiak, 10 months ago

Description: modified (diff)

comment:3 by Mariusz Felisiak, 10 months ago

The current behavior, even if not correct, is documented and tested. Should we use a deprecation path?

comment:4 by Claude Paroz, 10 months ago

I think it would be difficult to provide a sensible deprecation path (visible by devs, I mean), do you have a plan in mind?

in reply to:  4 ; comment:5 by Mariusz Felisiak, 10 months ago

Replying to Claude Paroz:

I think it would be difficult to provide a sensible deprecation path (visible by devs, I mean), do you have a plan in mind?

We could raise a warning when self._encoding is not utf-8, that it will be ignored in Django 6.0. I'm just not sure it's worth doing.

comment:6 by Shamil Abdulaev, 10 months ago

Cc: Shamil Abdulaev added

Greetings to you! I would like to tackle this bug and solve it!)

comment:7 by Mariusz Felisiak, 10 months ago

Shamil, this ticket is already assign to me. We're discussing an acceptable approach.

in reply to:  5 ; comment:8 by Claude Paroz, 10 months ago

Replying to Mariusz Felisiak:

We could raise a warning when self._encoding is not utf-8, that it will be ignored in Django 6.0. I'm just not sure it's worth doing.

The warning will probably only be ever raised on production servers, so hardly visible in practice.

Did you explore what will happen in practice if an incoming request is encoded in a different encoding and we try to decode it with utf-8? Server error (500)? Bad request(400)?

in reply to:  8 comment:9 by Mariusz Felisiak, 10 months ago

Replying to Claude Paroz:

Did you explore what will happen in practice if an incoming request is encoded in a different encoding and we try to decode it with utf-8? Server error (500)? Bad request(400)?

I was only able to achieve a badly decoded string, no crash 🤔

comment:10 by Claude Paroz, 10 months ago

Indeed, by default parse_sql is calling decode() with errors='replace', which will produce � (U+FFFD, the official REPLACEMENT CHARACTER) for invalid UTF-8 sequences. I'm still not convinced it will be a real improvement over the current situation…

in reply to:  10 ; comment:11 by Mariusz Felisiak, 10 months ago

Replying to Claude Paroz:

Indeed, by default parse_sql is calling decode() with errors='replace', which will produce � (U+FFFD, the official REPLACEMENT CHARACTER) for invalid UTF-8 sequences. I'm still not convinced it will be a real improvement over the current situation…

We could raise 400 instead of silently ignoring a custom charset.

in reply to:  11 comment:12 by Claude Paroz, 10 months ago

Replying to Mariusz Felisiak:

We could raise 400 instead of silently ignoring a custom charset.

Sure, I'd prefer that, failing loudly, instead of silently getting wrongly-encoded input.

comment:13 by Adam Johnson, 9 months ago

A 400 response sounds sensible to me too.

comment:14 by Mariusz Felisiak, 9 months ago

Has patch: set

comment:15 by Claude Paroz, 9 months ago

While starting to review the patch, and looking for more recent considerations than the old 1866 RFC, I read https://url.spec.whatwg.org/#application/x-www-form-urlencoded which is worth a read. Quoting a note:

A legacy server-oriented implementation might have to support encodings other than UTF-8 as well as have special logic for tuples of which the name is _charset. Such logic is not described here as only UTF-8 is conforming.

I don't necessarily re-question our previous discussions/decisions, however we might be prepared to receive some complaints as it may be that non-conforming agents start to produce BadRequest errors. Difficult to say before going to production!

comment:16 by Natalia Bidart, 9 months ago

Triage Stage: AcceptedReady for checkin

in reply to:  15 ; comment:17 by Mariusz Felisiak, 9 months ago

Cc: Shai Berger added

Replying to Claude Paroz:

While starting to review the patch, and looking for more recent considerations than the old 1866 RFC, I read https://url.spec.whatwg.org/#application/x-www-form-urlencoded which is worth a read. Quoting a note:

A legacy server-oriented implementation might have to support encodings other than UTF-8 as well as have special logic for tuples of which the name is _charset. Such logic is not described here as only UTF-8 is conforming.

I don't necessarily re-question our previous discussions/decisions, however we might be prepared to receive some complaints as it may be that non-conforming agents start to produce BadRequest errors. Difficult to say before going to production!

Unfortunately, I don't see a way to support this with a loud crash at the same time.

in reply to:  17 comment:18 by Mariusz Felisiak, 9 months ago

I found some issues/PRs to remove charset for the application/x-www-form-urlencoded content type in other libraries. Even explicitly passing charset=utf-8 caused issues. As far as I'm aware, we can move it forward:

comment:19 by Claude Paroz, 9 months ago

Sure, let's go ahead!

comment:20 by GitHub <noreply@…>, 9 months ago

Resolution: fixed
Status: assignedclosed

In 11920e7:

Fixed #34709 -- Raised BadRequest for non-UTF-8 requests with the application/x-www-form-urlencoded content type.

Thanks Eki Xu for the report.

Note: See TracTickets for help on using tickets.
Back to Top