Opened 3 years ago

Closed 5 months ago

Last modified 3 months ago

#20869 closed New feature (fixed)

Prevent repetitive output to counter BREACH-type attacks

Reported by: Patryk Zawadzki Owned by: Shai Berger
Component: CSRF Version: master
Severity: Normal Keywords:
Cc: shai@…, Adam Brenecki, emorley@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently the CSRF middleware sets the cookie value to the plain value of the token. This makes it possible to try to guess the token by trying longer and longer prefix matches. An effective countermeasure would be to prevent repetitive output from appearing in case a guess matches the CSRF prefix.

As it's a prefix-based attack that relies mostly on the server's response, one way to counter it would be to salt the token with a random prefix and when validating discard all but the last n characters of both the cookie and the hidden field value. This would also keep the currently suggested AJAX solution working (JS would just submit the salted value along with the randomized payload).

Change History (38)

comment:1 Changed 3 years ago by tagrain@…

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

"Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token."

https://github.com/rails/rails/pull/11729 via Ars Technica

comment:2 Changed 3 years ago by Patryk Zawadzki

What is the exact purpose of the XOR? If you can predict the pad then you already have enough information to BREACH the rest of the string. If the pad is truly random then there is no benefit from XOR. Random prefix could also be easily applied to all the HTTP-only cookies (like session ID).

comment:3 Changed 3 years ago by tagrain@…

Adding a random prefix would just slow the attack down. The xor is required to make the attack impossible.

Last edited 3 months ago by Tim Graham (previous) (diff)

comment:5 Changed 3 years ago by Patryk Zawadzki

What do you mean by slow? To use compression as an approach vector you need to know the prefix of the string you're targetting. If it's truly random then there is no way to predict a prefix long enough to make the attack feasible.

Let's assume that P is prefix and S is salt. The proposed solution suggests:

P₁ + P₂ + … + (S₁ XOR P₁) + (S₂ XOR P₂) + …

If I can predict P then I have already defeated the countermeasure and can continue the BREACH. Assuming a guess of G I can use:

P₁ + P₂ + … + (G₁ XOR P₁) + (G₂ XOR P₂) + …

The XOR part does not make the attack any more complicated.

comment:6 Changed 3 years ago by M

I note that Django's CSRF token output uses single quotes - value='....' - whilst form outputs by default use double quotes - value="...".
This means it seems impossible by default to cause a request to reflect something that would match the same prefix. You would have to guess the first three characters of the token cold (a large number of requests) in order to get started.

comment:7 Changed 3 years ago by Patryk Zawadzki

You'd rather target the Set-Cookie headers. You want the target string to be as close to the beginning of the stream as possible to successfully poison the compression dict.

comment:8 Changed 3 years ago by M

Set-Cookie headers aren't gzip compressed, so this attack does not apply.

comment:9 Changed 3 years ago by Patryk Zawadzki

I believe they are compressed if you use TLS-level compression.

comment:10 Changed 3 years ago by M

That is a separate issue, and TLS compression should anyway have been disabled since CRIME (and most/all browsers have done so). The issue with BREACH still requires a three-character prefix, and my posit here, which I'd be happy for someone to refute, is that Django, by default, militates this by its different quoting of the CSRF parameter.

comment:11 Changed 3 years ago by Patryk Zawadzki

I don't think the quoting was meant to serve as a security feature. Depending on such implementation details will eventully lead to someone “fixing it” for consistency.

comment:12 Changed 3 years ago by M

Of course; I just wanted it documented that I believe it does currently serve as a security feature, and so any steps to address this can be discussed properly and do not have to be rushed.

comment:13 in reply to:  5 Changed 3 years ago by Shai Berger

Replying to patrys:

What do you mean by slow? To use compression as an approach vector you need to know the prefix of the string you're targetting. If it's truly random then there is no way to predict a prefix long enough to make the attack feasible.

You don't need a prefix of the pad; only the secret. And by most reports, 3 characters is enough and feasible.

Let's assume that P is prefix and S is salt.

P is "one-time-pad" and S is secret.

The proposed solution suggests:

P₁ + P₂ + … + (S₁ XOR P₁) + (S₂ XOR P₂) + …

If I can predict P then I have already defeated the countermeasure and can continue the BREACH.

That's a big if. As I noted above, if S is output as is, with no XOR or other combination with P, then you can simply ignore P. The XOR prevents that.

Last edited 6 months ago by Tim Graham (previous) (diff)

comment:14 Changed 3 years ago by Patryk Zawadzki

The problem is that with no known prefix (say name="csrf_token" value=") you have no idea what part of the document you matched even if you try all valid three-character combinations and conclude that some matches happened.

comment:15 Changed 3 years ago by Shai Berger

Cc: shai@… added

True, but -- as far as I've seen reported, I am not a security expert -- all that means is that you have to work a little harder and make more requests. Hence "slow down". XORing a one-time-pad onto the secret means that the secret is never repeated between responses, rendering the attack impossible.

comment:16 Changed 3 years ago by Adam Brenecki

Owner: changed from nobody to Adam Brenecki
Status: newassigned

I've tried to write a patch to fix this using the same method as is discussed above and in the Rails patch - by replacing all occurences of csrf_token in the body with p + xor(csrf_token, p), where p is randomly generated anew each request.

I've assumed all occurrences of csrf_token in the body come from django.middleware.csrf.get_token(). I've also assumed that accepting a non-XORed csrf_token can't hurt us (eg if someone loads a page before the server is upgraded and submits after), so long as we don't produce one.

I've verified that I haven't broken CSRF by logging in to the admin of a new Django project; however I'm still in the process of running the test suite, and I'm yet to write new tests.

(Also, this is my first Django patch, so I apologise if I've done something wrong :S)

https://github.com/adambrenecki/django/tree/ticket_20869

comment:17 Changed 3 years ago by chris@…

This approach seems similar to that implemented in https://github.com/csghormley/django-debreach. In django-debreach, the token is AES-encrypted with a random key provided alongside it in the response body. Note the version available through pip did not work for me - see my fork until the issue gets resolved.

comment:18 Changed 3 years ago by deprince@…

What if instead of modifying the csrf middleware to produce tokens that are resistant to discovery by Breach, we instead modified the GzipMiddleware to not compress deterministically? breach_buster provides an alternatibve GZipMiddleware implementation. It calls flush a few times at random points in the beginning of the file. This adds noise to the size of the output file size, making BREACH impossible to implement.

comment:19 in reply to:  18 Changed 3 years ago by Carl Meyer

Replying to deprince@…:

What if instead of modifying the csrf middleware to produce tokens that are resistant to discovery by Breach, we instead modified the GzipMiddleware to not compress deterministically? breach_buster provides an alternatibve GZipMiddleware implementation. It calls flush a few times at random points in the beginning of the file. This adds noise to the size of the output file size, making BREACH impossible to implement.

a) Modifying GZipMiddleware isn't comprehensive mitigation, many people don't use the middleware and let the front-end server handle compression instead (it usually performs better).

b) AFAIU adding random noise to the file length does not really protect against BREACH, it just means the attacker needs to make more requests to average out the noise and find the still-present signal.

comment:20 Changed 3 years ago by deprince@…

B is a good point, but easy to protect against. The PRNG can be keyed from the content's hash. This will make the compression deterministic on a per string basis, so averaging thousands of calls for a specific parameter won't betray the actual size it would have encoded to without a random sprinkling of flush calls. I'm modifying breach buster to consider this right now.

As for A, I guess if you want to use another project's content encoder you're on your own. Yes, some people use the broken GzipEncoder in nginx :-) Lets solve Django's problem and tell people "use our gzip knucklehead!"

comment:21 Changed 3 years ago by deprince@…

In all seriousness I think we should both provide a BREACH resistant gzip implementation and provide a breach resistant CSRF token generator. Some people don't use our gzip, and some people ... don't use our CSRF token generator.

comment:22 in reply to:  21 Changed 3 years ago by Adam Brenecki

Replying to chris@…:

In django-debreach, the token is AES-encrypted with a random key provided alongside it in the response body.

Does using AES actually add anything over the XOR-with-random-value solution used in the Rails patch and mine? It seems like it wouldn't, unless I'm misunderstanding something (which is quite possible!).

Replying to deprince@…:

In all seriousness I think we should both provide a BREACH resistant gzip implementation and provide a breach resistant CSRF token generator. Some people don't use our gzip, and some people ... don't use our CSRF token generator.

Agreed, especially since GZipMiddleware is not in Django's default MIDDLEWARE_CLASSES, but Nginx defaults to gzip on;. Being insecure by default and making users a) know about and b) change a whole bunch of settings to avoid security holes is something best left to... certain other web platforms ;)

comment:23 Changed 3 years ago by Adam Brenecki

Has patch: set

So, my patch now works on Python 3, and the tests now pass.

edit: Now squashed and as a GitHub PR: https://github.com/django/django/pull/1477

Should I write any new tests? Should I also make _check_token_present in the tests assert that the CSRF token in the body != the plaintext token (to make sure that this vulnerability isn't reintroduced somehow by accident)?

Last edited 3 years ago by Adam Brenecki (previous) (diff)

comment:24 Changed 3 years ago by deprince@…

Here is a pull request for a BREACH resistant version of GzipMiddlware.

https://github.com/django/django/pull/1473

comment:25 Changed 3 years ago by Adam Brenecki

Cc: Adam Brenecki added

I've updated my CSRF one-time-pad pull request (https://github.com/django/django/pull/1477), masking the token with a Vigenère cipher instead of XOR, to ensure the output is printable without using base64.

comment:26 in reply to:  17 Changed 3 years ago by lpomfrey@…

Replying to chris@…:

This approach seems similar to that implemented in https://github.com/csghormley/django-debreach. In django-debreach, the token is AES-encrypted with a random key provided alongside it in the response body. Note the version available through pip did not work for me - see my fork until the issue gets resolved.

The issues mentioned have since been resolved, as have several others not fixed in Chris' fork.

django-debreach also includes a middleware (and decorator in the latest master at https://github.com/csghormley/django-debreach) to append a random comment string to text/html responses, which should provide additional protection against breach style attacks.

The use of AES vs. XOR is largely stylistic, both would provide the same level of protection.

comment:27 Changed 3 years ago by Tim Graham

Triage Stage: UnreviewedAccepted
Type: UncategorizedNew feature

I assume we're going to do something in Django with respect to BREACH -- accepting the ticket on that basis.

comment:28 Changed 2 years ago by Tim Graham

Patch needs improvement: set
Version: 1.5master

comment:29 Changed 19 months ago by Adam Brenecki

Owner: Adam Brenecki deleted
Status: assignednew

De-assigning this from myself, as github.com/auvipy has expressed an interest in doing the remaining work necessary to get my PR merged.

I probably should have actually done this a long time ago, when it first became apparent I wouldn't have the time to follow it up after PyConAU sprints finished; sorry about that.

Last edited 19 months ago by Adam Brenecki (previous) (diff)

comment:34 Changed 11 months ago by Shai Berger

Owner: set to Shai Berger
Status: newassigned

It's high time we did something about this.

comment:36 Changed 11 months ago by Shai Berger

Patch needs improvement: unset

Patch updated, now ready for review.

comment:37 Changed 9 months ago by Tim Graham

Patch needs improvement: set

Left comments for improvement and there's also a failing test.

comment:38 Changed 9 months ago by Ed Morley

Cc: emorley@… added

comment:39 Changed 6 months ago by Tim Graham

Patch needs improvement: unset

comment:40 Changed 6 months ago by Tim Graham

Triage Stage: AcceptedReady for checkin

Looks good pending a final review from security experts.

comment:41 Changed 5 months ago by Shai Berger <shai@…>

Resolution: fixed
Status: assignedclosed

In 5112e65e:

Fixed #20869 -- made CSRF tokens change every request by salt-encrypting them

Note that the cookie is not changed every request, just the token retrieved
by the get_token() method (used also by the {% csrf_token %} tag).

While at it, made token validation strict: Where, before, any length was
accepted and non-ASCII chars were ignored, we now treat anything other than
[A-Za-z0-9]{64} as invalid (except for 32-char tokens, which, for
backwards-compatibility, are accepted and replaced by 64-char ones).

Thanks Trac user patrys for reporting, github user adambrenecki
for initial patch, Tim Graham for help, and Curtis Maloney,
Collin Anderson, Florian Apolloner, Markus Holtermann & Jon Dufresne
for reviews.

comment:42 Changed 3 months ago by clayk

Are there plans to port this to 1.8 LTS and 1.9 since they are both receiving support at this time?

comment:43 Changed 3 months ago by Shai Berger

So far, this has not been treated as a critical bug, although it certainly has security implications. There is also a minor backward-incompatibility involved (for users or developers who changed the value of the CSRF cookie). So, there are currently no plans to backport it. You can mitigate the problem when using these versions by either turning off compression or using 3rd-party solutions such as django-debreach.

Note: See TracTickets for help on using tickets.
Back to Top