#32191 closed Bug (fixed)
Not RFC 6265 compliant cookies in contrib.messages.
Reported by: | Nico Giefing | Owned by: | Craig Smith |
---|---|---|---|
Component: | contrib.messages | Version: | 3.1 |
Severity: | Normal | Keywords: | Cookie malformed |
Cc: | Collin Anderson, Florian Apolloner | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Hi
A Customer of mine is using a WAF which is handling Cookies as it is described tin the RFC: https://tools.ietf.org/html/rfc6265
The issue now is that Django is trying to use an escape-character in cookie-Values which is not supported in the RFC
an example of such a cookie:
messages=\"123\\\"NOTRECEIVED\""
Please consider to get this fixed so there can be a protection of this system.
Regards,
Nico
Change History (32)
comment:1 by , 4 years ago
Description: | modified (diff) |
---|
comment:2 by , 4 years ago
Cc: | added |
---|---|
Component: | Forms → HTTP handling |
Resolution: | → needsinfo |
Status: | new → closed |
comment:3 by , 4 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
As described in the RFC, in a cookie value a backslash "\" is not to be used.
The issue is that a " will be seen as the end of the cookie value on every device in the middle (WAF) and therefore not the full cookie value will get through to the webserver.
Please let me know if you need more guidance.
comment:4 by , 4 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Django uses Python's http.cookies.SimpleCookie class, and as far as I'm aware we don't escape any characters in Django. It looks that it has already been reported in Python bug tracker, see https://bugs.python.org/issue19670.
I don't think that we should switch to BaseCookie
. You can start a discussion on DevelopersMailingList if you don't agree.
comment:5 by , 4 years ago
Resolution: | wontfix |
---|---|
Status: | closed → new |
Given that you CC'ed me I am going to reopen this :) The spec is rather clear on what is allowed and we should try (when we generate cookies) to comply. This should at least apply to cookies Django core/contrib apps (d.c.messages) generates. I am not sure if we want to put encoding handling into our general cookie handling; I'd probably leave that to the enduser if they want to store weird data there :D
comment:6 by , 4 years ago
An initial approach can look like this:
diff --git a/django/contrib/messages/storage/cookie.py b/django/contrib/messages/storage/cookie.py index b51e292aa0..c75ed75940 100644 --- a/django/contrib/messages/storage/cookie.py +++ b/django/contrib/messages/storage/cookie.py @@ -1,5 +1,7 @@ import json +from urllib.parse import quote, unquote + from django.conf import settings from django.contrib.messages.storage.base import BaseStorage, Message from django.core import signing @@ -8,6 +10,9 @@ from django.utils.crypto import constant_time_compare, salted_hmac from django.utils.safestring import SafeData, mark_safe +SAFE_COOKIE_CHARS = "!#$%&'()*+/:<=>?@[]^`{|}" + + class MessageEncoder(json.JSONEncoder): """ Compactly serialize instances of the ``Message`` class as JSON. @@ -24,6 +29,9 @@ class MessageEncoder(json.JSONEncoder): return message return super().default(obj) + def encode(self, obj): + return quote(super().encode(obj), safe=SAFE_COOKIE_CHARS) + class MessageDecoder(json.JSONDecoder): """ @@ -43,7 +51,7 @@ class MessageDecoder(json.JSONDecoder): return obj def decode(self, s, **kwargs): - decoded = super().decode(s, **kwargs) + decoded = super().decode(unquote(s), **kwargs) return self.process_messages(decoded) diff --git a/tests/messages_tests/test_cookie.py b/tests/messages_tests/test_cookie.py index 5d5fb42d67..af7f36f8d0 100644 --- a/tests/messages_tests/test_cookie.py +++ b/tests/messages_tests/test_cookie.py @@ -182,3 +182,12 @@ class CookieTests(BaseTests, SimpleTestCase): self.assertEqual(decoded, messages) storage_default = self.get_storage() self.assertNotEqual(encoded, storage_default._encode(messages)) + + def test_rfc_6265_encoding(self): + storage = self.storage_class(self.get_request()) + msg_1 = Message(constants.INFO, 'Test me') + msg_2 = Message(constants.INFO, 'Test me \\again') + example_messages = [msg_1, msg_2] + encoded = storage._encode(example_messages) + expected = '[[%22__json_message%22%2C0%2C20%2C%22Test%20me%22]%2C[%22__json_message%22%2C0%2C20%2C%22Test%20me%20%5C%5Cagain%22]]' + self.assertEqual(encoded.split(':')[0], expected)
That said, this makes messages quite a bit longer :/
comment:7 by , 4 years ago
Component: | HTTP handling → contrib.messages |
---|---|
Summary: | Not RFC Compliant Cookie handling → Not RFC 6265 compliant cookies in contrib.messages. |
Triage Stage: | Unreviewed → Accepted |
Agreed, we should fix cookies values from contrib.messages
. I misunderstood the ticket description. Thanks Florian.
comment:8 by , 4 years ago
Well technically it applies to the whole request.cookies
interface; but I don't see any feasible fix there. It would a) be highly backwards incompatible and b) cookies might require coordination with the frontend JS -- so it is better if the user can decide what to encode and how. I mean in the best case one doesn't store much in cookies either way, they do not perform well with concurrent requests for instance…
comment:9 by , 4 years ago
Hi, I would like to work on this ticket. I'm happy to get started on it this week, and would appreciate guidance on what tests we can do for backwards compatibility and the increase in message size (especially given the 4KB size limit for header fields set by Django).
comment:10 by , 4 years ago
Hi Craig, as for the message size I think it would be great if we could do something similar to signing.dumps (ie run zlib over the data). This will probably require us to add support for that in the Signer class in the first place; but it might be a nice addition.
comment:11 by , 4 years ago
Hi Florian, that's a great idea. Doing so also makes sure that the characters used in the cookie are acceptable for RFC 6265. I'll start work on it today.
comment:12 by , 4 years ago
I have implemented the initial approach according to the comment above. One thing to note is that when the cookie is accessed through the value attribute, it will need an unquote
, for cases like req.cookies['messages'].value
.
This could be attempted with a property to manage access to the value
attribute to apply the unquote in there, so that any client code that uses that method of access can remain unchanged.
The compression has proved to be a bit tricky, due to changing between bytes and strings, but the approach that I am taking is to factor out the compression code from signing.dumps
and loads
into signing.compress
and decompress
.
It was encouraging to see swapping the quote
unquote
for singing.dumps
signing.loads
works out of the box, if we want all our message cookies timestamped, which I assume that we don't. But anyway, that works. I'll be back on the case later this week. Thanks for your tips, PR forthcoming.
comment:13 by , 4 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:15 by , 4 years ago
Hi All,
I figure I should probably give some historical context about cookie quoting for the record. It mostly matches the analysis so far on the ticket:
When I refactored request.COOKIE
(parse_cookie()
) back in 2016, I was surprised that the cookie RFCs don't accurately describe how browsers work. I had to test browers myself to figure stuff out. The RFCs gave no guidance as to how to encode cookies, so there's really no official standard for how to encode/decode, and in that respect leaving it up to the application itself to do the encoding could make sense from a Django perspective. I did also notice that using % url quoting was what a lot of frameworks were using for encoding, so that is becoming more and more of an ad-hoc standard, despite not being in the RFC. (Percent encoding makes it really easy to encode/decode in Javascript.).
In 2016 I decided not to touch this encoding issue because I wanted to minimize backwards-compatible issues, so I kept using Python's http.cookies._unquote()
(slash encoding) for backwards compatibility. (Again, browsers don't automatically unquote anything in cookies, so it's not a browser-compatibility thing)
I think long-term it would be great to stop encoding cookies using http.cookies. _quote()
, as I think it's really just a python-specific way of quoting cookies, and as this issue states, it ends up creating invalid cookies. (invalid according the RFC at least. Browsers don't care and they work fine with many of what the RFC considers invalid characters.)
Part of me thinks it would be nice if Django continued to handle cookie encoding automatically and switch to % encoding, but I think that's going to lead to double encoding/decoding in some cases with the current api. Maybe we could have a response.set_cookie('name', 'value', encode=True/False) to control whether Django applies % encoding or not? We'd probably also need to have request.cookies.get('name', decode=True/False).
Anyway, I think the safest and most minimal thing to do is to modify the messages framework to % url quote cookies as PR 13690 suggests. I don't have a clear picture for a more general framework-wide fix, but I at least wanted to write my story down somewhere in case it's helpful.
comment:16 by , 4 years ago
Hi Colin, thanks for your story. It's good to know more of the background. Adding a boolean flag is sensible even in the Messages
module, since if we change the encoding it would interfere with any existing client code that uses the stored message value. I guess the default argument should be encode=False
.
comment:17 by , 4 years ago
Anyway, I think the safest and most minimal thing to do is to modify the messages framework to % url quote cookies as PR 13690 suggests. I don't have a clear picture for a more general framework-wide fix, but I at least wanted to write my story down somewhere in case it's helpful.
I kinda agree, but encoding like I suggested introduces quite an overhead. In that sense I think it would make sense to (if we are changing it anyways) ensure that we can generate even shorter messages. This is the reason why I suggested to move compress
into the signer base classes. This way the messaging framework could benefit from shorter messages while automatically ensuring compatibility with the RFC (the created base64 only contains valid chars).
follow-up: 19 comment:18 by , 4 years ago
Hi All, I've updated the PR with the compress_b64encode
and b64decode_decompress
methods extracted into the django.core.signing
module. I could use some help with the names, should they be compress
and decompress
or mention the b64 encoding as in the current PR?
Also, the encoding from string to bytestring and back again needs some thinking. If the compress/decompress methods are going to be used by client code (as indeed they might be to extract raw messages from req.cookies['messages'].value
) then getting decompress
to return a string instead of a bytestring would be better. And compress
should be modified to accept a string. That will probably be my next step when I get back to this during the week.
comment:19 by , 4 years ago
Replying to Craig Smith:
Hi All, I've updated the PR with the
compress_b64encode
andb64decode_decompress
methods extracted into thedjango.core.signing
module. I could use some help with the names, should they becompress
anddecompress
or mention the b64 encoding as in the current PR?
Personally I would have made those a feature of the signer class itself. This way we do not have to replicate the code everywhere.
Also, the encoding from string to bytestring and back again needs some thinking. If the compress/decompress methods are going to be used by client code (as indeed they might be to extract raw messages from
req.cookies['messages'].value
) then gettingdecompress
to return a string instead of a bytestring would be better. Andcompress
should be modified to accept a string. That will probably be my next step when I get back to this during the week.
As a general rule we should use bytes only on interfaces where it is needed and stay with strings as long as possible I think.
comment:20 by , 4 years ago
Thanks for clarifying those points Florian, I will make some adjustments.
comment:21 by , 4 years ago
Hi All, I'm having a terrible time of it due to encoding back and forth, and the fact that different encodings are used from place to place - utf-8, latin-1 and ascii (in session file storage). It feels like if I take a break from it and start over next week I could get somewhere - in the meantime, I am reading up on Unicode... If anyone else would like to jump in please do, if not, I'll get back into it next week.
The original suggestion to use quote
unquote
works and can be found in this commit. I might submit a separate PR just for that one, in case we can go with that.
comment:22 by , 4 years ago
Hi All,
I have started over. This the new PR.
Still some more work to do on it though.
I have opted to use latin-1 to encode internal to the new functions as we use latin-1 elsewhere. I was surprised when using utf-8 that a character was unrecognised when calling decompress_b64(request.cookies['messages'].value)
in the messages_tests.test_mixins.test_set_messages_success
test. But using latin-1 works. UTF-8 would be better, IMHO because we could store messages in many more languages, but those changes would need to be a bit more widespread and possibly have a bigger impact. But as the latin-1 character set is contained in utf-8, all we'd need to do is change our latin-1 encodings to utf-8 - is that something we should do here, or maybe bring it up on the mailing list?
Going forward, I will add tests, in particular to confirm RFC6265 compliant message cookies, and I will attempt integrating the new functions as methods of the signer base class.
Another question: signing.dumps
currently takes a compress=False
keyword arg, should this be passed through to the sign
method? Or should the sign
method compress and base64 encode by default?
Thanks for reading.
comment:23 by , 4 years ago
Hi Craig, I've looked through it and thought a bit about it:
I have opted to use latin-1 to encode internal to the new functions as we use latin-1 elsewhere. I was surprised when using utf-8 that a character was unrecognised
That sounds like a bug to fix. latin-1 encoding will simply not work in the general case (Try calling compress_b64
with u'€'). We run messages through MessageEncoder
which will result in a JSON string which is always encodable to utf-8
(especially since ensure_ascii
is true).
Going forward, I will add tests, in particular to confirm RFC6265 compliant message cookies
We also will need tests and a backwards compatibility for old existing messages.
and I will attempt integrating the new functions as methods of the signer base class.
I'd hold off on that now till we figured out the str/bytes/encoding issues -- we might have fond another hornet nest here :)
comment:24 by , 4 years ago
Hi Florian, thanks for your feedback. I see what you mean about the Euro symbol and unicode characters in general. To get around this I added a charset
parameter to the compress/decompress
methods. So we can use utf-8 by default, but latin-1 where required.
We need to use latin-1 in the signing.dumps/loads
functions, since the signing.JSONSerializer
uses it. Swapping it out leads us into that hornet's nest.
The two tests that require latin-1 are messages_tests.test_mixins.test_set_messages_success
and messages_tests.test_cookie.test_cookie_settings
. This happens because of one or two calls to storage.add
and storage.update
. As an experiment I replaced all occurrences of latin-1 and iso-8859-1 with utf-8 to see if those tests would pass - they wouldn't. So it's from a dependency outside of django, most likely to do with WSGI, which uses latin-1 somehow. The bytes in particular are 0x91 and 0x92, which are left and right single quote characters in latin-1. Maybe we should open another ticket about removing the iso-8859-1 charset. Or at least pushing it back further toward the dependency boundary.
I will get back onto this in a few days and come up with those extra tests. And then probably need to add documentation about the new methods.
comment:25 by , 4 years ago
Has patch: | set |
---|---|
Needs documentation: | set |
I have added a few tests that cover the new functions. If someone can review it and confirm that no further changes are needed, after that I will add some documentation of the new functions in the signing
module. For reference it is this PR
comment:26 by , 4 years ago
Hi Craig,
I've looked through your pull request and I finally understand why you needed the bunch of encode/decode stuff everywhere. Personally as a result of that I think the code is now rather ugly (certainly not your fault, but rather ours since we have a mix of bytes and strings everywhere here). I have an idea that I want to try out to directly use signing.loads/dumps in the message cookie backend; which might make this all (a bit?) easier.
I'll try to put something up as a draft PR and maybe we can find a nice way out of this mess :)
comment:27 by , 4 years ago
I started a discussion on the dev mailing list on how to get signing usage under control: https://groups.google.com/g/django-developers/c/X4N_71xeATM -- I think it would be good if we find some consensus there and then build upon that.
comment:28 by , 4 years ago
Needs documentation: | unset |
---|
comment:30 by , 4 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Django uses Python's
http.cookies._unquote()
to better match the behavior of browsers, see #26158.is parsed to
{'messages': '123"NOTRECEIVED"'}
which seems correct to me. What value do you expect?