Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#32191 closed Bug (fixed)

Not RFC 6265 compliant cookies in contrib.messages.

Reported by: Nico Giefing Owned by: Craig Smith
Component: contrib.messages Version: 3.1
Severity: Normal Keywords: Cookie malformed
Cc: Collin Anderson, Florian Apolloner Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Nico Giefing)

Hi
A Customer of mine is using a WAF which is handling Cookies as it is described tin the RFC: https://tools.ietf.org/html/rfc6265

The issue now is that Django is trying to use an escape-character in cookie-Values which is not supported in the RFC

an example of such a cookie:

messages=\"123\\\"NOTRECEIVED\""

Please consider to get this fixed so there can be a protection of this system.

Regards,

Nico

Change History (32)

comment:1 by Nico Giefing, 4 years ago

Description: modified (diff)

comment:2 by Mariusz Felisiak, 4 years ago

Cc: Collin Anderson Florian Apolloner added
Component: FormsHTTP handling
Resolution: needsinfo
Status: newclosed

Django uses Python's http.cookies._unquote() to better match the behavior of browsers, see #26158.

messages=\"123\\\"NOTRECEIVED\""

is parsed to {'messages': '123"NOTRECEIVED"'} which seems correct to me. What value do you expect?

comment:3 by Nico Giefing, 4 years ago

Resolution: needsinfo
Status: closednew

As described in the RFC, in a cookie value a backslash "\" is not to be used.

The issue is that a " will be seen as the end of the cookie value on every device in the middle (WAF) and therefore not the full cookie value will get through to the webserver.

Please let me know if you need more guidance.

comment:4 by Mariusz Felisiak, 4 years ago

Resolution: wontfix
Status: newclosed

Django uses Python's http.cookies.SimpleCookie class, and as far as I'm aware we don't escape any characters in Django. It looks that it has already been reported in Python bug tracker, see https://bugs.python.org/issue19670.

I don't think that we should switch to BaseCookie. You can start a discussion on DevelopersMailingList if you don't agree.

comment:5 by Florian Apolloner, 4 years ago

Resolution: wontfix
Status: closednew

Given that you CC'ed me I am going to reopen this :) The spec is rather clear on what is allowed and we should try (when we generate cookies) to comply. This should at least apply to cookies Django core/contrib apps (d.c.messages) generates. I am not sure if we want to put encoding handling into our general cookie handling; I'd probably leave that to the enduser if they want to store weird data there :D

comment:6 by Florian Apolloner, 4 years ago

An initial approach can look like this:

diff --git a/django/contrib/messages/storage/cookie.py b/django/contrib/messages/storage/cookie.py
index b51e292aa0..c75ed75940 100644
--- a/django/contrib/messages/storage/cookie.py
+++ b/django/contrib/messages/storage/cookie.py
@@ -1,5 +1,7 @@
 import json
 
+from urllib.parse import quote, unquote
+
 from django.conf import settings
 from django.contrib.messages.storage.base import BaseStorage, Message
 from django.core import signing
@@ -8,6 +10,9 @@ from django.utils.crypto import constant_time_compare, salted_hmac
 from django.utils.safestring import SafeData, mark_safe
 
 
+SAFE_COOKIE_CHARS = "!#$%&'()*+/:<=>?@[]^`{|}"
+
+
 class MessageEncoder(json.JSONEncoder):
     """
     Compactly serialize instances of the ``Message`` class as JSON.
@@ -24,6 +29,9 @@ class MessageEncoder(json.JSONEncoder):
             return message
         return super().default(obj)
 
+    def encode(self, obj):
+        return quote(super().encode(obj), safe=SAFE_COOKIE_CHARS)
+
 
 class MessageDecoder(json.JSONDecoder):
     """
@@ -43,7 +51,7 @@ class MessageDecoder(json.JSONDecoder):
         return obj
 
     def decode(self, s, **kwargs):
-        decoded = super().decode(s, **kwargs)
+        decoded = super().decode(unquote(s), **kwargs)
         return self.process_messages(decoded)
 
 
diff --git a/tests/messages_tests/test_cookie.py b/tests/messages_tests/test_cookie.py
index 5d5fb42d67..af7f36f8d0 100644
--- a/tests/messages_tests/test_cookie.py
+++ b/tests/messages_tests/test_cookie.py
@@ -182,3 +182,12 @@ class CookieTests(BaseTests, SimpleTestCase):
             self.assertEqual(decoded, messages)
         storage_default = self.get_storage()
         self.assertNotEqual(encoded, storage_default._encode(messages))
+
+    def test_rfc_6265_encoding(self):
+        storage = self.storage_class(self.get_request())
+        msg_1 = Message(constants.INFO, 'Test me')
+        msg_2 = Message(constants.INFO, 'Test me \\again')
+        example_messages = [msg_1, msg_2]
+        encoded = storage._encode(example_messages)
+        expected = '[[%22__json_message%22%2C0%2C20%2C%22Test%20me%22]%2C[%22__json_message%22%2C0%2C20%2C%22Test%20me%20%5C%5Cagain%22]]'
+        self.assertEqual(encoded.split(':')[0], expected)

That said, this makes messages quite a bit longer :/

comment:7 by Mariusz Felisiak, 4 years ago

Component: HTTP handlingcontrib.messages
Summary: Not RFC Compliant Cookie handlingNot RFC 6265 compliant cookies in contrib.messages.
Triage Stage: UnreviewedAccepted

Agreed, we should fix cookies values from contrib.messages. I misunderstood the ticket description. Thanks Florian.

comment:8 by Florian Apolloner, 4 years ago

Well technically it applies to the whole request.cookies interface; but I don't see any feasible fix there. It would a) be highly backwards incompatible and b) cookies might require coordination with the frontend JS -- so it is better if the user can decide what to encode and how. I mean in the best case one doesn't store much in cookies either way, they do not perform well with concurrent requests for instance…

comment:9 by Craig Smith, 4 years ago

Hi, I would like to work on this ticket. I'm happy to get started on it this week, and would appreciate guidance on what tests we can do for backwards compatibility and the increase in message size (especially given the 4KB size limit for header fields set by Django).

comment:10 by Florian Apolloner, 4 years ago

Hi Craig, as for the message size I think it would be great if we could do something similar to signing.dumps (ie run zlib over the data). This will probably require us to add support for that in the Signer class in the first place; but it might be a nice addition.

comment:11 by Craig Smith, 4 years ago

Hi Florian, that's a great idea. Doing so also makes sure that the characters used in the cookie are acceptable for RFC 6265. I'll start work on it today.

comment:12 by Craig Smith, 4 years ago

I have implemented the initial approach according to the comment above. One thing to note is that when the cookie is accessed through the value attribute, it will need an unquote, for cases like req.cookies['messages'].value.

This could be attempted with a property to manage access to the value attribute to apply the unquote in there, so that any client code that uses that method of access can remain unchanged.

The compression has proved to be a bit tricky, due to changing between bytes and strings, but the approach that I am taking is to factor out the compression code from signing.dumps and loads into signing.compress and decompress.

It was encouraging to see swapping the quote unquote for singing.dumps signing.loads works out of the box, if we want all our message cookies timestamped, which I assume that we don't. But anyway, that works. I'll be back on the case later this week. Thanks for your tips, PR forthcoming.

comment:13 by Craig Smith, 4 years ago

Owner: changed from nobody to Craig Smith
Status: newassigned

comment:14 by Craig Smith, 4 years ago

PR - it's a work in progress

comment:15 by Collin Anderson, 4 years ago

Hi All,

I figure I should probably give some historical context about cookie quoting for the record. It mostly matches the analysis so far on the ticket:

When I refactored request.COOKIE (parse_cookie()) back in 2016, I was surprised that the cookie RFCs don't accurately describe how browsers work. I had to test browers myself to figure stuff out. The RFCs gave no guidance as to how to encode cookies, so there's really no official standard for how to encode/decode, and in that respect leaving it up to the application itself to do the encoding could make sense from a Django perspective. I did also notice that using % url quoting was what a lot of frameworks were using for encoding, so that is becoming more and more of an ad-hoc standard, despite not being in the RFC. (Percent encoding makes it really easy to encode/decode in Javascript.).

In 2016 I decided not to touch this encoding issue because I wanted to minimize backwards-compatible issues, so I kept using Python's http.cookies._unquote() (slash encoding) for backwards compatibility. (Again, browsers don't automatically unquote anything in cookies, so it's not a browser-compatibility thing)

I think long-term it would be great to stop encoding cookies using http.cookies. _quote(), as I think it's really just a python-specific way of quoting cookies, and as this issue states, it ends up creating invalid cookies. (invalid according the RFC at least. Browsers don't care and they work fine with many of what the RFC considers invalid characters.)

Part of me thinks it would be nice if Django continued to handle cookie encoding automatically and switch to % encoding, but I think that's going to lead to double encoding/decoding in some cases with the current api. Maybe we could have a response.set_cookie('name', 'value', encode=True/False) to control whether Django applies % encoding or not? We'd probably also need to have request.cookies.get('name', decode=True/False).

Anyway, I think the safest and most minimal thing to do is to modify the messages framework to % url quote cookies as PR 13690 suggests. I don't have a clear picture for a more general framework-wide fix, but I at least wanted to write my story down somewhere in case it's helpful.

comment:16 by Craig Smith, 4 years ago

Hi Colin, thanks for your story. It's good to know more of the background. Adding a boolean flag is sensible even in the Messages module, since if we change the encoding it would interfere with any existing client code that uses the stored message value. I guess the default argument should be encode=False.

comment:17 by Florian Apolloner, 4 years ago

Anyway, I think the safest and most minimal thing to do is to modify the messages framework to % url quote cookies as PR 13690 suggests. I don't have a clear picture for a more general framework-wide fix, but I at least wanted to write my story down somewhere in case it's helpful.

I kinda agree, but encoding like I suggested introduces quite an overhead. In that sense I think it would make sense to (if we are changing it anyways) ensure that we can generate even shorter messages. This is the reason why I suggested to move compress into the signer base classes. This way the messaging framework could benefit from shorter messages while automatically ensuring compatibility with the RFC (the created base64 only contains valid chars).

comment:18 by Craig Smith, 4 years ago

Hi All, I've updated the PR with the compress_b64encode and b64decode_decompress methods extracted into the django.core.signing module. I could use some help with the names, should they be compress and decompress or mention the b64 encoding as in the current PR?

Also, the encoding from string to bytestring and back again needs some thinking. If the compress/decompress methods are going to be used by client code (as indeed they might be to extract raw messages from req.cookies['messages'].value) then getting decompress to return a string instead of a bytestring would be better. And compress should be modified to accept a string. That will probably be my next step when I get back to this during the week.

in reply to:  18 comment:19 by Florian Apolloner, 4 years ago

Replying to Craig Smith:

Hi All, I've updated the PR with the compress_b64encode and b64decode_decompress methods extracted into the django.core.signing module. I could use some help with the names, should they be compress and decompress or mention the b64 encoding as in the current PR?

Personally I would have made those a feature of the signer class itself. This way we do not have to replicate the code everywhere.

Also, the encoding from string to bytestring and back again needs some thinking. If the compress/decompress methods are going to be used by client code (as indeed they might be to extract raw messages from req.cookies['messages'].value) then getting decompress to return a string instead of a bytestring would be better. And compress should be modified to accept a string. That will probably be my next step when I get back to this during the week.

As a general rule we should use bytes only on interfaces where it is needed and stay with strings as long as possible I think.

comment:20 by Craig Smith, 4 years ago

Thanks for clarifying those points Florian, I will make some adjustments.

comment:21 by Craig Smith, 4 years ago

Hi All, I'm having a terrible time of it due to encoding back and forth, and the fact that different encodings are used from place to place - utf-8, latin-1 and ascii (in session file storage). It feels like if I take a break from it and start over next week I could get somewhere - in the meantime, I am reading up on Unicode... If anyone else would like to jump in please do, if not, I'll get back into it next week.

The original suggestion to use quote unquote works and can be found in this commit. I might submit a separate PR just for that one, in case we can go with that.

comment:22 by Craig Smith, 4 years ago

Hi All,
I have started over. This the new PR.

Still some more work to do on it though.

I have opted to use latin-1 to encode internal to the new functions as we use latin-1 elsewhere. I was surprised when using utf-8 that a character was unrecognised when calling decompress_b64(request.cookies['messages'].value) in the messages_tests.test_mixins.test_set_messages_success test. But using latin-1 works. UTF-8 would be better, IMHO because we could store messages in many more languages, but those changes would need to be a bit more widespread and possibly have a bigger impact. But as the latin-1 character set is contained in utf-8, all we'd need to do is change our latin-1 encodings to utf-8 - is that something we should do here, or maybe bring it up on the mailing list?

Going forward, I will add tests, in particular to confirm RFC6265 compliant message cookies, and I will attempt integrating the new functions as methods of the signer base class.

Another question: signing.dumps currently takes a compress=False keyword arg, should this be passed through to the sign method? Or should the sign method compress and base64 encode by default?

Thanks for reading.

comment:23 by Florian Apolloner, 4 years ago

Hi Craig, I've looked through it and thought a bit about it:

I have opted to use latin-1 to encode internal to the new functions as we use latin-1 elsewhere. I was surprised when using utf-8 that a character was unrecognised

That sounds like a bug to fix. latin-1 encoding will simply not work in the general case (Try calling compress_b64 with u'€'). We run messages through MessageEncoder which will result in a JSON string which is always encodable to utf-8 (especially since ensure_ascii is true).

Going forward, I will add tests, in particular to confirm RFC6265 compliant message cookies

We also will need tests and a backwards compatibility for old existing messages.

and I will attempt integrating the new functions as methods of the signer base class.

I'd hold off on that now till we figured out the str/bytes/encoding issues -- we might have fond another hornet nest here :)

comment:24 by Craig Smith, 4 years ago

Hi Florian, thanks for your feedback. I see what you mean about the Euro symbol and unicode characters in general. To get around this I added a charset parameter to the compress/decompress methods. So we can use utf-8 by default, but latin-1 where required.

We need to use latin-1 in the signing.dumps/loads functions, since the signing.JSONSerializer uses it. Swapping it out leads us into that hornet's nest.

The two tests that require latin-1 are messages_tests.test_mixins.test_set_messages_success and messages_tests.test_cookie.test_cookie_settings. This happens because of one or two calls to storage.add and storage.update . As an experiment I replaced all occurrences of latin-1 and iso-8859-1 with utf-8 to see if those tests would pass - they wouldn't. So it's from a dependency outside of django, most likely to do with WSGI, which uses latin-1 somehow. The bytes in particular are 0x91 and 0x92, which are left and right single quote characters in latin-1. Maybe we should open another ticket about removing the iso-8859-1 charset. Or at least pushing it back further toward the dependency boundary.

I will get back onto this in a few days and come up with those extra tests. And then probably need to add documentation about the new methods.

comment:25 by Craig Smith, 4 years ago

Has patch: set
Needs documentation: set

I have added a few tests that cover the new functions. If someone can review it and confirm that no further changes are needed, after that I will add some documentation of the new functions in the signing module. For reference it is this PR

comment:26 by Florian Apolloner, 4 years ago

Hi Craig,

I've looked through your pull request and I finally understand why you needed the bunch of encode/decode stuff everywhere. Personally as a result of that I think the code is now rather ugly (certainly not your fault, but rather ours since we have a mix of bytes and strings everywhere here). I have an idea that I want to try out to directly use signing.loads/dumps in the message cookie backend; which might make this all (a bit?) easier.

I'll try to put something up as a draft PR and maybe we can find a nice way out of this mess :)

comment:27 by Florian Apolloner, 4 years ago

I started a discussion on the dev mailing list on how to get signing usage under control: https://groups.google.com/g/django-developers/c/X4N_71xeATM -- I think it would be good if we find some consensus there and then build upon that.

comment:28 by Craig Smith, 4 years ago

Needs documentation: unset

comment:29 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 102d92fc:

Refs #32191 -- Added Signer.sign_object()/unsign_object().

Co-authored-by: Craig Smith <hello@…>

comment:30 by Mariusz Felisiak, 4 years ago

Triage Stage: AcceptedReady for checkin

comment:31 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 2d6179c:

Fixed #32191 -- Made CookieStorage use RFC 6265 compliant format.

Co-authored-by: Craig Smith <hello@…>

comment:32 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

In 737fa72:

Refs #32191 -- Removed for the pre-Django 3.2 format of messages in CookieStorage.

Per deprecation timeline.

Note: See TracTickets for help on using tickets.
Back to Top