Code

Opened 11 months ago

Last modified 9 months ago

#20869 assigned New feature

Prevent repetitive output to counter BREACH-type attacks

Reported by: patrys Owned by: adambrenecki
Component: contrib.csrf Version: 1.5
Severity: Normal Keywords:
Cc: shai@…, adambrenecki Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently the CSRF middleware sets the cookie value to the plain value of the token. This makes it possible to try to guess the token by trying longer and longer prefix matches. An effective countermeasure would be to prevent repetitive output from appearing in case a guess matches the CSRF prefix.

As it's a prefix-based attack that relies mostly on the server's response, one way to counter it would be to salt the token with a random prefix and when validating discard all but the last n characters of both the cookie and the hidden field value. This would also keep the currently suggested AJAX solution working (JS would just submit the salted value along with the randomized payload).

Attachments (0)

Change History (27)

comment:1 Changed 11 months ago by tagrain@…

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

"Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token."

https://github.com/rails/rails/pull/11729 via Ars Technica

comment:2 Changed 11 months ago by patrys

What is the exact purpose of the XOR? If you can predict the pad then you already have enough information to BREACH the rest of the string. If the pad is truly random then there is no benefit from XOR. Random prefix could also be easily applied to all the HTTP-only cookies (like session ID).

comment:3 Changed 11 months ago by tagrain@…

adding a random prefix would just slow the attack down. The xor is required to make the attack possible

comment:4 Changed 11 months ago by tagrain@…

impossible*

comment:5 follow-up: Changed 11 months ago by patrys

What do you mean by slow? To use compression as an approach vector you need to know the prefix of the string you're targetting. If it's truly random then there is no way to predict a prefix long enough to make the attack feasible.

Let's assume that P is prefix and S is salt. The proposed solution suggests:

P₁ + P₂ + … + (S₁ XOR P₁) + (S₂ XOR P₂) + …

If I can predict P then I have already defeated the countermeasure and can continue the BREACH. Assuming a guess of G I can use:

P₁ + P₂ + … + (G₁ XOR P₁) + (G₂ XOR P₂) + …

The XOR part does not make the attack any more complicated.

comment:6 Changed 11 months ago by M

I note that Django's CSRF token output uses single quotes - value='....' - whilst form outputs by default use double quotes - value="...".
This means it seems impossible by default to cause a request to reflect something that would match the same prefix. You would have to guess the first three characters of the token cold (a large number of requests) in order to get started.

comment:7 Changed 11 months ago by patrys

You'd rather target the Set-Cookie headers. You want the target string to be as close to the beginning of the stream as possible to successfully poison the compression dict.

comment:8 Changed 11 months ago by M

Set-Cookie headers aren't gzip compressed, so this attack does not apply.

comment:9 Changed 11 months ago by patrys

I believe they are compressed if you use TLS-level compression.

comment:10 Changed 11 months ago by M

That is a separate issue, and TLS compression should anyway have been disabled since CRIME (and most/all browsers have done so). The issue with BREACH still requires a three-character prefix, and my posit here, which I'd be happy for someone to refute, is that Django, by default, militates this by its different quoting of the CSRF parameter.

comment:11 Changed 11 months ago by patrys

I don't think the quoting was meant to serve as a security feature. Depending on such implementation details will eventully lead to someone “fixing it” for consistency.

comment:12 Changed 11 months ago by M

Of course; I just wanted it documented that I believe it does currently serve as a security feature, and so any steps to address this can be discussed properly and do not have to be rushed.

comment:13 in reply to: ↑ 5 Changed 11 months ago by shai

Replying to patrys:

What do you mean by slow? To use compression as an approach vector you need to know the prefix of the string you're targetting. If it's truly random then there is no way to predict a prefix long enough to make the attack feasible.

You don't need a prefix of the pad; only the secret. And by most reports, 3 characters is enough and feasible.

Let's assume that P is prefix and S is salt.

P is "one-time-pad" and S is secret.

The proposed solution suggests:

P₁ + P₂ + … + (S₁ XOR P₁) + (S₂ XOR P₂) + …

If I can predict P then I have already defeated the countermeasure and can continue the BREACH.

That's a big if. As I noted above, if S is output as is, with no XOR or other combination with P, then you can simply ignore P. The XOR prevents that.

comment:14 Changed 11 months ago by patrys

The problem is that with no known prefix (say name="csrf_token" value=") you have no idea what part of the document you matched even if you try all valid three-character combinations and conclude that some matches happened.

comment:15 Changed 11 months ago by shai

  • Cc shai@… added

True, but -- as far as I've seen reported, I am not a security expert -- all that means is that you have to work a little harder and make more requests. Hence "slow down". XORing a one-time-pad onto the secret means that the secret is never repeated between responses, rendering the attack impossible.

comment:16 Changed 11 months ago by adambrenecki

  • Owner changed from nobody to adambrenecki
  • Status changed from new to assigned

I've tried to write a patch to fix this using the same method as is discussed above and in the Rails patch - by replacing all occurences of csrf_token in the body with p + xor(csrf_token, p), where p is randomly generated anew each request.

I've assumed all occurrences of csrf_token in the body come from django.middleware.csrf.get_token(). I've also assumed that accepting a non-XORed csrf_token can't hurt us (eg if someone loads a page before the server is upgraded and submits after), so long as we don't produce one.

I've verified that I haven't broken CSRF by logging in to the admin of a new Django project; however I'm still in the process of running the test suite, and I'm yet to write new tests.

(Also, this is my first Django patch, so I apologise if I've done something wrong :S)

https://github.com/adambrenecki/django/tree/ticket_20869

comment:17 follow-up: Changed 11 months ago by chris@…

This approach seems similar to that implemented in https://github.com/csghormley/django-debreach. In django-debreach, the token is AES-encrypted with a random key provided alongside it in the response body. Note the version available through pip did not work for me - see my fork until the issue gets resolved.

comment:18 follow-up: Changed 11 months ago by deprince@…

What if instead of modifying the csrf middleware to produce tokens that are resistant to discovery by Breach, we instead modified the GzipMiddleware to not compress deterministically? breach_buster provides an alternatibve GZipMiddleware implementation. It calls flush a few times at random points in the beginning of the file. This adds noise to the size of the output file size, making BREACH impossible to implement.

comment:19 in reply to: ↑ 18 Changed 11 months ago by carljm

Replying to deprince@…:

What if instead of modifying the csrf middleware to produce tokens that are resistant to discovery by Breach, we instead modified the GzipMiddleware to not compress deterministically? breach_buster provides an alternatibve GZipMiddleware implementation. It calls flush a few times at random points in the beginning of the file. This adds noise to the size of the output file size, making BREACH impossible to implement.

a) Modifying GZipMiddleware isn't comprehensive mitigation, many people don't use the middleware and let the front-end server handle compression instead (it usually performs better).

b) AFAIU adding random noise to the file length does not really protect against BREACH, it just means the attacker needs to make more requests to average out the noise and find the still-present signal.

comment:20 Changed 11 months ago by deprince@…

B is a good point, but easy to protect against. The PRNG can be keyed from the content's hash. This will make the compression deterministic on a per string basis, so averaging thousands of calls for a specific parameter won't betray the actual size it would have encoded to without a random sprinkling of flush calls. I'm modifying breach buster to consider this right now.

As for A, I guess if you want to use another project's content encoder you're on your own. Yes, some people use the broken GzipEncoder in nginx :-) Lets solve Django's problem and tell people "use our gzip knucklehead!"

comment:21 follow-up: Changed 11 months ago by deprince@…

In all seriousness I think we should both provide a BREACH resistant gzip implementation and provide a breach resistant CSRF token generator. Some people don't use our gzip, and some people ... don't use our CSRF token generator.

comment:22 in reply to: ↑ 21 Changed 11 months ago by adambrenecki

Replying to chris@…:

In django-debreach, the token is AES-encrypted with a random key provided alongside it in the response body.

Does using AES actually add anything over the XOR-with-random-value solution used in the Rails patch and mine? It seems like it wouldn't, unless I'm misunderstanding something (which is quite possible!).

Replying to deprince@…:

In all seriousness I think we should both provide a BREACH resistant gzip implementation and provide a breach resistant CSRF token generator. Some people don't use our gzip, and some people ... don't use our CSRF token generator.

Agreed, especially since GZipMiddleware is not in Django's default MIDDLEWARE_CLASSES, but Nginx defaults to gzip on;. Being insecure by default and making users a) know about and b) change a whole bunch of settings to avoid security holes is something best left to... certain other web platforms ;)

comment:23 Changed 11 months ago by adambrenecki

  • Has patch set

So, my patch now works on Python 3, and the tests now pass.

edit: Now squashed and as a GitHub PR: https://github.com/django/django/pull/1477

Should I write any new tests? Should I also make _check_token_present in the tests assert that the CSRF token in the body != the plaintext token (to make sure that this vulnerability isn't reintroduced somehow by accident)?

Last edited 11 months ago by adambrenecki (previous) (diff)

comment:24 Changed 11 months ago by deprince@…

Here is a pull request for a BREACH resistant version of GzipMiddlware.

https://github.com/django/django/pull/1473

comment:25 Changed 10 months ago by adambrenecki

  • Cc adambrenecki added

I've updated my CSRF one-time-pad pull request (https://github.com/django/django/pull/1477), masking the token with a Vigenère cipher instead of XOR, to ensure the output is printable without using base64.

comment:26 in reply to: ↑ 17 Changed 10 months ago by lpomfrey@…

Replying to chris@…:

This approach seems similar to that implemented in https://github.com/csghormley/django-debreach. In django-debreach, the token is AES-encrypted with a random key provided alongside it in the response body. Note the version available through pip did not work for me - see my fork until the issue gets resolved.

The issues mentioned have since been resolved, as have several others not fixed in Chris' fork.

django-debreach also includes a middleware (and decorator in the latest master at https://github.com/csghormley/django-debreach) to append a random comment string to text/html responses, which should provide additional protection against breach style attacks.

The use of AES vs. XOR is largely stylistic, both would provide the same level of protection.

comment:27 Changed 9 months ago by timo

  • Triage Stage changed from Unreviewed to Accepted
  • Type changed from Uncategorized to New feature

I assume we're going to do something in Django with respect to BREACH -- accepting the ticket on that basis.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as assigned
The owner will be changed from adambrenecki to anonymous. Next status will be 'assigned'
The ticket will be disowned. Next status will be 'new'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.