#20869 closed New feature (fixed)
Prevent repetitive output to counter BREACH-type attacks
Reported by: | Patryk Zawadzki | Owned by: | Shai Berger |
---|---|---|---|
Component: | CSRF | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | shai@…, Adam Brenecki, emorley@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Currently the CSRF middleware sets the cookie value to the plain value of the token. This makes it possible to try to guess the token by trying longer and longer prefix matches. An effective countermeasure would be to prevent repetitive output from appearing in case a guess matches the CSRF prefix.
As it's a prefix-based attack that relies mostly on the server's response, one way to counter it would be to salt the token with a random prefix and when validating discard all but the last n characters of both the cookie and the hidden field value. This would also keep the currently suggested AJAX solution working (JS would just submit the salted value along with the randomized payload).
Change History (38)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
What is the exact purpose of the XOR? If you can predict the pad then you already have enough information to BREACH the rest of the string. If the pad is truly random then there is no benefit from XOR. Random prefix could also be easily applied to all the HTTP-only cookies (like session ID).
comment:3 by , 11 years ago
Adding a random prefix would just slow the attack down. The xor is required to make the attack impossible.
follow-up: 13 comment:5 by , 11 years ago
What do you mean by slow? To use compression as an approach vector you need to know the prefix of the string you're targetting. If it's truly random then there is no way to predict a prefix long enough to make the attack feasible.
Let's assume that P is prefix and S is salt. The proposed solution suggests:
P₁ + P₂ + … + (S₁ XOR P₁) + (S₂ XOR P₂) + …
If I can predict P then I have already defeated the countermeasure and can continue the BREACH. Assuming a guess of G I can use:
P₁ + P₂ + … + (G₁ XOR P₁) + (G₂ XOR P₂) + …
The XOR part does not make the attack any more complicated.
comment:6 by , 11 years ago
I note that Django's CSRF token output uses single quotes - value='....' - whilst form outputs by default use double quotes - value="...".
This means it seems impossible by default to cause a request to reflect something that would match the same prefix. You would have to guess the first three characters of the token cold (a large number of requests) in order to get started.
comment:7 by , 11 years ago
You'd rather target the Set-Cookie headers. You want the target string to be as close to the beginning of the stream as possible to successfully poison the compression dict.
comment:8 by , 11 years ago
Set-Cookie headers aren't gzip compressed, so this attack does not apply.
comment:10 by , 11 years ago
That is a separate issue, and TLS compression should anyway have been disabled since CRIME (and most/all browsers have done so). The issue with BREACH still requires a three-character prefix, and my posit here, which I'd be happy for someone to refute, is that Django, by default, militates this by its different quoting of the CSRF parameter.
comment:11 by , 11 years ago
I don't think the quoting was meant to serve as a security feature. Depending on such implementation details will eventully lead to someone “fixing it” for consistency.
comment:12 by , 11 years ago
Of course; I just wanted it documented that I believe it does currently serve as a security feature, and so any steps to address this can be discussed properly and do not have to be rushed.
comment:13 by , 11 years ago
Replying to patrys:
What do you mean by slow? To use compression as an approach vector you need to know the prefix of the string you're targetting. If it's truly random then there is no way to predict a prefix long enough to make the attack feasible.
You don't need a prefix of the pad; only the secret. And by most reports, 3 characters is enough and feasible.
Let's assume that P is prefix and S is salt.
P is "one-time-pad" and S is secret.
The proposed solution suggests:
P₁ + P₂ + … + (S₁ XOR P₁) + (S₂ XOR P₂) + …
If I can predict P then I have already defeated the countermeasure and can continue the BREACH.
That's a big if. As I noted above, if S is output as is, with no XOR or other combination with P, then you can simply ignore P. The XOR prevents that.
comment:14 by , 11 years ago
The problem is that with no known prefix (say name="csrf_token" value="
) you have no idea what part of the document you matched even if you try all valid three-character combinations and conclude that some matches happened.
comment:15 by , 11 years ago
Cc: | added |
---|
True, but -- as far as I've seen reported, I am not a security expert -- all that means is that you have to work a little harder and make more requests. Hence "slow down". XORing a one-time-pad onto the secret means that the secret is never repeated between responses, rendering the attack impossible.
comment:16 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I've tried to write a patch to fix this using the same method as is discussed above and in the Rails patch - by replacing all occurences of csrf_token
in the body with p + xor(csrf_token, p)
, where p
is randomly generated anew each request.
I've assumed all occurrences of csrf_token
in the body come from django.middleware.csrf.get_token()
. I've also assumed that accepting a non-XORed csrf_token
can't hurt us (eg if someone loads a page before the server is upgraded and submits after), so long as we don't produce one.
I've verified that I haven't broken CSRF by logging in to the admin of a new Django project; however I'm still in the process of running the test suite, and I'm yet to write new tests.
(Also, this is my first Django patch, so I apologise if I've done something wrong :S)
follow-up: 26 comment:17 by , 11 years ago
This approach seems similar to that implemented in https://github.com/csghormley/django-debreach. In django-debreach, the token is AES-encrypted with a random key provided alongside it in the response body. Note the version available through pip did not work for me - see my fork until the issue gets resolved.
follow-up: 19 comment:18 by , 11 years ago
What if instead of modifying the csrf middleware to produce tokens that are resistant to discovery by Breach, we instead modified the GzipMiddleware to not compress deterministically? breach_buster provides an alternatibve GZipMiddleware implementation. It calls flush a few times at random points in the beginning of the file. This adds noise to the size of the output file size, making BREACH impossible to implement.
comment:19 by , 11 years ago
Replying to deprince@…:
What if instead of modifying the csrf middleware to produce tokens that are resistant to discovery by Breach, we instead modified the GzipMiddleware to not compress deterministically? breach_buster provides an alternatibve GZipMiddleware implementation. It calls flush a few times at random points in the beginning of the file. This adds noise to the size of the output file size, making BREACH impossible to implement.
a) Modifying GZipMiddleware isn't comprehensive mitigation, many people don't use the middleware and let the front-end server handle compression instead (it usually performs better).
b) AFAIU adding random noise to the file length does not really protect against BREACH, it just means the attacker needs to make more requests to average out the noise and find the still-present signal.
comment:20 by , 11 years ago
B is a good point, but easy to protect against. The PRNG can be keyed from the content's hash. This will make the compression deterministic on a per string basis, so averaging thousands of calls for a specific parameter won't betray the actual size it would have encoded to without a random sprinkling of flush calls. I'm modifying breach buster to consider this right now.
As for A, I guess if you want to use another project's content encoder you're on your own. Yes, some people use the broken GzipEncoder in nginx :-) Lets solve Django's problem and tell people "use our gzip knucklehead!"
follow-up: 22 comment:21 by , 11 years ago
In all seriousness I think we should both provide a BREACH resistant gzip implementation and provide a breach resistant CSRF token generator. Some people don't use our gzip, and some people ... don't use our CSRF token generator.
comment:22 by , 11 years ago
Replying to chris@…:
In django-debreach, the token is AES-encrypted with a random key provided alongside it in the response body.
Does using AES actually add anything over the XOR-with-random-value solution used in the Rails patch and mine? It seems like it wouldn't, unless I'm misunderstanding something (which is quite possible!).
Replying to deprince@…:
In all seriousness I think we should both provide a BREACH resistant gzip implementation and provide a breach resistant CSRF token generator. Some people don't use our gzip, and some people ... don't use our CSRF token generator.
Agreed, especially since GZipMiddleware
is not in Django's default MIDDLEWARE_CLASSES
, but Nginx defaults to gzip on;
. Being insecure by default and making users a) know about and b) change a whole bunch of settings to avoid security holes is something best left to... certain other web platforms ;)
comment:23 by , 11 years ago
Has patch: | set |
---|
So, my patch now works on Python 3, and the tests now pass.
edit: Now squashed and as a GitHub PR: https://github.com/django/django/pull/1477
Should I write any new tests? Should I also make _check_token_present
in the tests assert that the CSRF token in the body !=
the plaintext token (to make sure that this vulnerability isn't reintroduced somehow by accident)?
comment:24 by , 11 years ago
Here is a pull request for a BREACH resistant version of GzipMiddlware.
comment:25 by , 11 years ago
Cc: | added |
---|
I've updated my CSRF one-time-pad pull request (https://github.com/django/django/pull/1477), masking the token with a Vigenère cipher instead of XOR, to ensure the output is printable without using base64.
comment:26 by , 11 years ago
Replying to chris@…:
This approach seems similar to that implemented in https://github.com/csghormley/django-debreach. In django-debreach, the token is AES-encrypted with a random key provided alongside it in the response body. Note the version available through pip did not work for me - see my fork until the issue gets resolved.
The issues mentioned have since been resolved, as have several others not fixed in Chris' fork.
django-debreach also includes a middleware (and decorator in the latest master at https://github.com/csghormley/django-debreach) to append a random comment string to text/html responses, which should provide additional protection against breach style attacks.
The use of AES vs. XOR is largely stylistic, both would provide the same level of protection.
comment:27 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → New feature |
I assume we're going to do something in Django with respect to BREACH -- accepting the ticket on that basis.
comment:28 by , 10 years ago
Patch needs improvement: | set |
---|---|
Version: | 1.5 → master |
comment:29 by , 10 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
De-assigning this from myself, as github.com/auvipy has expressed an interest in doing the remaining work necessary to get my PR merged.
I probably should have actually done this a long time ago, when it first became apparent I wouldn't have the time to follow it up after PyConAU sprints finished; sorry guys.
comment:34 by , 9 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
It's high time we did something about this.
comment:37 by , 9 years ago
Patch needs improvement: | set |
---|
Left comments for improvement and there's also a failing test.
comment:38 by , 9 years ago
Cc: | added |
---|
comment:39 by , 9 years ago
Patch needs improvement: | unset |
---|
comment:40 by , 9 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Looks good pending a final review from security experts.
comment:42 by , 8 years ago
Are there plans to port this to 1.8 LTS and 1.9 since they are both receiving support at this time?
comment:43 by , 8 years ago
So far, this has not been treated as a critical bug, although it certainly has security implications. There is also a minor backward-incompatibility involved (for users or developers who changed the value of the CSRF cookie). So, there are currently no plans to backport it. You can mitigate the problem when using these versions by either turning off compression or using 3rd-party solutions such as django-debreach.
"Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token."
https://github.com/rails/rails/pull/11729 via Ars Technica