Opened 3 years ago

Closed 3 years ago

#33212 closed Bug (wontfix)

Incorrect cookie parsing by django.http.cookie.parse_cookie

Reported by: Christos Georgiou Owned by: nobody
Component: Core (Other) Version: 3.2
Severity: Normal Keywords: cookies
Cc: Collin Anderson Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I understand that the Python http.cookie has issues with invalid cookies that are anyway used in the wild. An example of such a cookie is (you'll probably see a cookie with raw JSON as value if you happen to use HotJar):

valid_cookie=12; invalid_cookie={"k1": "v1", "k2": "v2"}; valid_cookie2="other value"

Python's parsing will only parse valid_cookie, while Django's django.http.cookie.parse_cookie will parse all of them.

However, this imaginary cookie is incorrectly parsed:

django_cookie=good_value; third_party="some_cookie=some_value; django_cookie=bad_value"
>>> from django.http.cookie import parse_cookie
>>> parse_cookie('''django_cookie=good_value; third_party="some_cookie=some_value; django_cookie=bad_value"''')
{'django_cookie': 'bad_value"', 'third_party': '"some_cookie=some_value'}

One would expect django_cookie to have good_value.

If you consider this as grave enough, I can supply a patch.

Change History (12)

comment:1 by Christos Georgiou, 3 years ago

There is a related discussion at https://code.djangoproject.com/ticket/26158 where it is claimed that splitting on ';' is safe (because a raw ';' is invalid in a cookie value). I don't think it is, unless we assume that all HTTP requests come from well-known browsers that are trustworthy to follow standards.

Last edited 3 years ago by Christos Georgiou (previous) (diff)

comment:2 by Mariusz Felisiak, 3 years ago

Cc: Collin Anderson added

Thanks for the report, IMO, this is only a hypothetical issue. Cookie values cannot contain semicolons according to RFC 6265. Browsers allow for creating cookie with invalid values via document.cookie, however semicolons are always forbidden.

Version 0, edited 3 years ago by Mariusz Felisiak (next)

in reply to:  1 ; comment:3 by Mr. Glass, 3 years ago

Replying to Christos Georgiou:

There is a related discussion at https://code.djangoproject.com/ticket/26158 where it is claimed that splitting on ';' is safe (because a raw ';' is invalid in a cookie value). I don't think it is, unless we assume that all HTTP requests come from well-known browsers that are trustworthy to follow standards.

Parsing errors like this often lead to security vulnerabilities, and an attacker will NOT use a client that conforms to standards

in reply to:  3 ; comment:4 by Mariusz Felisiak, 3 years ago

Replying to Mr. Glass:

Parsing errors like this often lead to security vulnerabilities, and an attacker will NOT use a client that conforms to standards

As far as I'm aware you will first need to convince users to use such a browser.

in reply to:  4 comment:5 by Mr. Glass, 3 years ago

Replying to Mariusz Felisiak:

Replying to Mr. Glass:

Parsing errors like this often lead to security vulnerabilities, and an attacker will NOT use a client that conforms to standards

As far as I'm aware you will first need to convince users to use such a browser.

One common vulnerability from parsing mismatch is a WAF bypass. If the WAF & application have a parsing mismatch, you can trick the WAF into allowing requests through. See the 'Messing around with a WAF parser' section of this article for an example https://blog.isec.pl/waf-evasion-techniques/

Typically this is because WAFs have bad parsers, but the vulnerability works either way. This is often used for sql injection and would not require any user interaction.

Take this theoretical cookie:

user_id="12345"; invalid_cookie={"k1": "v1", "k2": "v2"}; user_id=";' DROP TABLE important_data"

A python based WAF would see the user_id as a non malicious integer, while Django would have the value ";' DROP TABLE important_data" ready to be injected.

No interaction from the user, just a carefully crafted request from an attacker.

Thankfully in Django, this attack would require custom authentication that was vulnerable to sql injection, since the ORM protects us.

Last edited 3 years ago by Mr. Glass (previous) (diff)

comment:6 by Collin Anderson, 3 years ago

Hi All,

Author of Django's current parse_cookie() code here. I re-wrote parse_cookie() because python's cookie parsing was not matching browsers, and as you say, parsing mismatches can lead to security issues. Here's how you can create this non-RFC6265 compliant cookie string using javascript. You need to be on '/test/' or some url that's not the root path (/), and have no other cookies set:

document.cookie = 'django_cookie=good_value'
document.cookie = 'third_party="some_cookie=some_value'
document.cookie = 'django_cookie=bad_value"; path=/'
var expected = 'django_cookie=good_value; third_party="some_cookie=some_value; django_cookie=bad_value"'
if(document.cookie === expected) { alert('matches')}

That creates 3 cookies as far as the browser is concerned, and if you look in the inspector, you can see the 3 cookies listed. This works in Chrome, Firefox, Safari, Internet Explorer, etc. If you reload the page, the browser will send the exact Cookie header mentioned above to the server.

Now, two of the 3 cookies have the same django_cookie name, and Django has historically always used the _last_ cookie if there are multiple, but I could see changing that to use the first one instead, as most other server software seem to look at the _first_ cookie when there are multiple with the same name.

Here's javascript code to create a case where the "bad" cookie comes before the "good" cookie:

document.cookie = 'django_cookie=good_value; path=/'
document.cookie = 'third_party="some_cookie=some_value'
document.cookie = 'django_cookie=bad_value"'
var expected = 'third_party="some_cookie=some_value; django_cookie=bad_value"; django_cookie=good_value'
if(document.cookie === expected) { alert('matches')}

Here's how the two cases are parsed with different server software:
django_cookie=good_value; third_party="some_cookie=some_value; django_cookie=bad_value"
PHP: <?php print_r($_COOKIE); ?> says Array ( [django_cookie] => good_value [third_party] => "some_cookie=some_value)
Rails: debug(cookies) says {"django_cookie"=>"good_value", "third_party"=>"\"some_cookie=some_value"}
Node require('cookie').parse() says { django_cookie: 'good_value', third_party: 'some_cookie=some_valu' }
Node require('cookie-parse').parse() says { django_cookie: 'good_value', third_party: 'some_cookie=some_valu' }
Django currently says {'django_cookie': 'bad_value"', 'third_party': '"some_cookie=some_value'}

third_party="some_cookie=some_value; django_cookie=bad_value"; django_cookie=good_value
PHP: <?php print_r($_COOKIE); ?> says Array ( [third_party] => "some_cookie=some_value [django_cookie] => bad_value" )
Rails debug(cookies) says {"third_party"=>"\"some_cookie=some_value", "django_cookie"=>"bad_value\""}
Node require('cookie').parse() says { third_party: 'some_cookie=some_valu', django_cookie: 'bad_value"' }
Node require('cookie-parse').parse() says { third_party: 'some_cookie=some_valu', django_cookie: 'bad_value"' }
Django currently says {'third_party': '"some_cookie=some_value', 'django_cookie': 'good_value'}

So it might make sense to switch Django to using the first value, rather than the last value, so it matches what other frameworks are doing, as it currently is the opposite of other frameworks.

Here's how to make user_id="12345"; invalid_cookie={"k1": "v1", "k2": "v2"}; user_id="; ' DROP TABLE important_data" using javascript (browsers always add a space after semicolon, so I added a space in this case). Doesn't work in all browsers, but it works in Chrome at least. (Again, doesn't work on / path, but any other subpath works, and you need to have no other cookies set)

document.cookie = 'user_id="12345"'
document.cookie = 'invalid_cookie={"k1": "v1", "k2": "v2"}'
document.cookie = 'user_id="; path=/'
document.cookie = '\' DROP TABLE important_data"; path=/'
var expected = 'user_id="12345"; invalid_cookie={"k1": "v1", "k2": "v2"}; user_id="; \' DROP TABLE important_data"'
if(document.cookie === expected) { alert('matches')} else { alert('no match: ' + document.cookie) }

Again, you can use the browser's inspector to see the 4 different cookies. Note that there's a cookie with a value but no name. When refreshing the page, the cookie header sent to the server matches document.cookie. Here's how it gets parsed by different frameworks:

PHP Array ( [user_id] => "12345" [invalid_cookie] => {"k1": "v1", "k2": "v2"} ['_DROP_TABLE_important_data"] => )
Rails {"user_id"=>"\"12345\"", "invalid_cookie"=>"{\"k1\": \"v1\", \"k2\": \"v2\"}", "' DROP TABLE important_data\""=>nil}
Node cookie { user_id: '12345', invalid_cookie: '{"k1": "v1", "k2": "v2"}' }
Node cookie-parse { user_id: '12345', invalid_cookie: '{"k1": "v1", "k2": "v2"}', '\' DROP TABLE important_data"': 'true' }
Django currently says: {'user_id': '"', 'invalid_cookie': '{"k1": "v1", "k2": "v2"}', '': '\' DROP TABLE important_data"'}

So, again, maybe it makes sense for Django to switch to using the _first_ value for user_id, instead of the 2nd. I also tried to make Django handle the "cookie with a value but no name" case as close as possible to how browsers handle it, despite other servers not handing that case.

Would using the first value rather than the last value fix this issue for you? It seems to me it would solve the specific issue in each of your examples, as long as the "bad" cookies come after the "good" cookies.

Thanks,
Collin

comment:7 by Mr. Glass, 3 years ago

I don't think there will be a perfect answer to this until there is a new RFC that applications actually respect, and what we have now may be the best option.

Is it a bad idea to refuse the invalid cookies outright?

P.S. Thanks for that great write-up Colin, I'm learning new things today :)

comment:8 by Florian Apolloner, 3 years ago

I have been thinking about this and I am a bit torn.

First and foremost I do not agree with the OP when they say:

django_cookie=good_value; third_party="some_cookie=some_value; django_cookie=bad_value"
...
One would expect django_cookie to have good_value.

Why would one expect django_cookie to have good_value? That example Cookie header is invalid per the specification. As such it should never have existed and the "proper" way would be to simply ignore the header completely. Now the question is how much we'd break with that :D I am a bit afraid to do so because in practice django_cookie=" a b c" will probably work well enough through most servers even though it is also invalid.

I agree with Collin that changing the order would be a good start to bring us in line with what the rest of the frameworks does -- simply to ensure that a proxy that operates on cookies does not choose the other (majority) variant and leaves us vulnerable. That said, that proxy/waf should get fixed as well.

It is true that this does have security impacts, but an attacker would have to be able to control the cookie values and in most cases the cookie values don't contain enough interesting material. Even sending two session ids will only cause problems if the proxy before also performs ACL-checks against that session id. But since we never know what this proxy does, we cannot fix it, because the fix might result in a vulnerability in the first place.

Given that most websites seem to be fine with the current behavior I'd be okay to add a strict cookie parsing (and also for set-cookie) behind a setting (as much as I hate settings I expect a non-minimal amount of websites to have invalid cookies). Similar discussion happened on https://code.djangoproject.com/ticket/32191 where Django itself created invalid cookies.

Last edited 3 years ago by Florian Apolloner (previous) (diff)

comment:9 by Collin Anderson, 3 years ago

As far as optionally rejecting non-rfc cookies goes, I think the way to do it would be to first split the entire header on semicolon, and then accept or ignore individual name-value-pairs. Basically, treat each thing between semicolons like the first part of a Set-Cookie header, and then parse according to "Set-Cookie header" rules https://datatracker.ietf.org/doc/html/rfc6265#section-5.2 (which says to first parse up to semicolon to get name-value-pair).

But even then, like Florian said, we don't know whether a WAF is going to accept or ignore a cookie, so it could actually make the situation worse if Django has stricter parsing than the WAF. Again, most other Cookie header parsing code seems to allow non-rfc characters.

I suppose another thing that may or may not help: if there were a cookie api for getting a lower-level list of tuples instead of dict, then people could ignore individual key-value pairs if they wanted to, and there's no information loss in the case of multiple cookies with the same name. The RFC pretty much explicitly allows for multiple cookies with the same name, so that's not going to go away any time soon.

Anyway, I created a little PR for using first cookie value rather than last value if we want to do that. There's some backward-compatibilty concerns, but it's probably for the best long-term to try to match other parsers as far as which cookie value to use:
https://github.com/django/django/pull/15015

comment:10 by Collin Anderson, 3 years ago

Actually, if we split on semicolon, and then follow Set-Cookie parsing algorithm from https://datatracker.ietf.org/doc/html/rfc6265#section-5.2, we're actually already following the spec really well as it is.

Here's a PR to make it match the Set-Cookie parsing spec slightly better (and use first value rather than last value in the dictionary).

https://github.com/django/django/pull/15019

comment:11 by Mariusz Felisiak, 3 years ago

Thanks for the discussion!

As far as I'm aware both propositions (PR15015 and PR15019) are backward incompatible, so they are against out stability policy, unless it's not a security issue (which is not the case, IMO). We can make a breaking changes but we need a strong reason to do that. I'm concerned that changing the current behavior may actually make things worse in some cases. It seems to me that we need a broader discussion on this.

comment:12 by Mariusz Felisiak, 3 years ago

Resolution: wontfix
Status: newclosed

Closing with the request to start a discussion.

Note: See TracTickets for help on using tickets.
Back to Top