Code

Opened 16 months ago

Closed 5 months ago

#19980 closed Bug (fixed)

Signer broken when SECRET_KEY contains non-ASCII bytes

Reported by: nvie Owned by: MattBlack
Component: Core (Other) Version: 1.5
Severity: Normal Keywords: signer
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This happens on Django 1.5 only.

When the SECRET_KEY setting contains non-ASCII bytes (which is very likely for randomly generated keys), the following code breaks

from django.conf import settings
from django.core.signing import Signer
 
settings.SECRET_KEY = b'\xe7'  # binary key with non-ASCII chars
 
s = Signer()
print(s.sign('foo'))

Relevant stack trace:

Traceback (most recent call last):
  File "minimal_example.py", line 7, in <module>
    s.sign('foo')
  File "/Users/nvie/.virtualenvs/foo/lib/python2.7/site-packages/django/core/signing.py", line 175, in sign
    return str('%s%s%s') % (value, self.sep, self.signature(value))
  File "/Users/nvie/.virtualenvs/foo/lib/python2.7/site-packages/django/core/signing.py", line 169, in signature
    signature = base64_hmac(self.salt + 'signer', value, self.key)
  File "/Users/nvie/.virtualenvs/foo/lib/python2.7/site-packages/django/core/signing.py", line 75, in base64_hmac
    return b64_encode(salted_hmac(salt, value, key).digest())
  File "/Users/nvie/.virtualenvs/foo/lib/python2.7/site-packages/django/utils/crypto.py", line 48, in salted_hmac
    key = hashlib.sha1((key_salt + secret).encode('utf-8')).digest()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 0: ordinal not in range(128)

This is due to the fact that unicode and bytes are concatenated and, as a result, the bytes are implicitly decoded, which is an invalid operation, hence the UnicodeDecodeError. The error, thus, is perfectly valid.

I think the source of the bug is that, internally, the Signer works with text-based keys, salts and values, where it should work with byte streams instead.

Attachments (0)

Change History (9)

comment:1 Changed 16 months ago by nvie

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I took a stab at a possible solution, which works for me. Can others step in to see if this would break other cases? (The test suite ran fine, though.)

https://github.com/nvie/django/compare/django:1.5...nvie:bugfix/signer-with-binary-key

comment:2 Changed 16 months ago by claudep

  • Triage Stage changed from Unreviewed to Design decision needed

The random string that Django produces in startproject never contains non-ascii characters. Now of course anyone is free to set it to any random content. However, do we gain anything in allowing SECRET_KEY to be a bytestring? In the spirit of Python 3, I would be more in favour of documenting that SECRET_KEY is a string (not a bytestring), and if you include in it any non-ASCII chars on Python 2, you should prefix it by u''.

comment:3 Changed 16 months ago by anonymous

I would argue that, given that this is used in cryptographic contexts, the key should be bytes, not a string. At the lowest level, all crypto happens on bytes anyway, and with strings you always rely on an encoding. Even though we can assume utf-8 fairly reasonably, this still is an assumption, and basically only used to convert the unicode string to a byte string. Why then, let us not pass in the bytes explicitly, independent of an encoding?

Also, this patch would make it compatible with the Django<1.5 behaviour.

Even more so, this would make it similar to other web frameworks. For example, in Flask, the SECRET_KEY value is recommended to be generated completely random, using os.urandom(). Thus, a byte string.

If consensus is around using a unicode string nonetheless, this should be explicitly documented at the very least, and maybe even type-checked.

comment:4 Changed 16 months ago by lukeplant

I agree it makes more sense for SECRET_KEY to be a bytestring, especially with the argument about os.random but we also need compatibility with unicode, especially as we cross the Python 2/3 barrier and as projects migrate.

So I think we should document that it can be either a string or bytestring, and it will be converted using UTF8 if it is a string.

In the patch, I would like to see a test that shows we are generating the same signature as before for unicode strings - this will require a hard coded signature generated with the previous signing function.

comment:5 Changed 16 months ago by nvie

In Django < 1.5

In Django < 1.5, unicode strings aren't expected as keys, nor as signature input. Keys or inputs must be byte strings, or happen to contain only ascii chars, in which case the implicit str(key) works accidentally. Since the default django-admin.py startproject command will never contain any non-ascii chars, most people won't even notice when transitioning to Django 1.5.

Proof:

>>> import django
>>> from django.core.signing import Signer
>>> django.VERSION
(1, 4, 3, 'final', 0)
>>> s = Signer(u'H\xebll\xf8, w\xf3rld!')
>>> s.signature('foo')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 33: ordinal not in range(128)

An ascii-chars-only unicode key works, though:

>>> s1 = Signer(u'hello world')  # unicode key
>>> s1.signature('foo')
'DvqCK1YYys43rI7wbM-yOakczOc'
>>> s2 = Signer(b'hello world')  # bytestring key
>>> s2.signature('foo')
'DvqCK1YYys43rI7wbM-yOakczOc'
>>> s1.signature('foo') == s2.signature('foo')
True

Signing unicode strings doesn't work, though:

>>> s1.signature(u'foo')   # ascii-only unicode strings work
'DvqCK1YYys43rI7wbM-yOakczOc'
>>> s1.signature(u'H\xebll\xf8, w\xf3rld!')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 1: ordinal not in range(128)

To Django 1.5.1

Actually, this "bug" plays in our advantage now. Since we want the behaviour to change such that Signer can work with both unicode _and_ bytestrings, and we want to just interpret any unicode key or input as its utf8-encoded bytestring counterpart, we aren't changing the historical behaviour. For all ascii-only unicode strings u, the following holds: u.encode('ascii') == u.encode('utf8'). Since those were the only valid inputs prior to Django 1.5, nothing changes for existing users. The only thing we're adding is that non-ascii unicode strings will work additionally, for people that prefer working with bytestrings as secret keys.

The only remaining issue remains the following: when using python3, the output of the .sign() method should be in native string format, i.e. bytes in python2 and unicode in python3. The output of that function is a string consisting of two parts: (1) the original input and (2) the signature, separated by a colon. Since the original input to .sign() can be a bytestring, this might pose a problem in python3 when .sign() tries to return a unicode string anyway.

Example showcasing the trouble:

>>> import sys
>>> sys.version
>>> tuple(sys.version_info)
(3, 3, 0, 'final', 0)
>>> import django
>>> django.VERSION
(1, 5, 0, 'final', 0)
>>> from django.core.signing import Signer
>>> s = Signer('foo')
>>> s.sign(b'\xe7')
...
django.utils.encoding.DjangoUnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 0: unexpected end of data. You passed in b'\xe7' (<class 'bytes'>)

It might be a theoretical case, maybe, but I think it's important that all bytestrings and unicode strings can be validly fed to .sign()—no exception. Maybe it's an idea to add a new method .sign_bytes(), which always returns the strict bytes signature, and have .sign() use that and decode its value back to unicode in the case of python3. We can document that .sign() will always return a unicode string in python3, and thus may fail for some inputs. If people want to be able to sign _any_ input, they should use .sign_bytes(), which works for all inputs, but will return bytes.

Another solution would be to have .sign() return whatever type is fed to it as input:

>>> type(s.sign(b'foo')) == bytes
>>> type(s.sign(u'foo')) == str  # unicode

What do you think? I could take a stab at this if we agree on the potential solution.

comment:6 Changed 16 months ago by Alex

  • Triage Stage changed from Design decision needed to Accepted

Marking this as accepted, in principle both the secret key and the salt_key should be bytes, not strings, so there should be no need to encode them.

comment:7 Changed 14 months ago by timo

  • Has patch set

pull request from the original reporter:
https://github.com/django/django/pull/878

comment:8 Changed 5 months ago by MattBlack

  • Owner changed from nobody to MattBlack
  • Status changed from new to assigned

comment:9 Changed 5 months ago by Honza Král <honza.kral@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In a8ba76c2d3ad95595590af31c5cda123dc3868b7:

Fixed #19980: Signer broken for binary keys (with non-ASCII chars).

With this pull request, request #878 should considered closed.

Thanks to nvie for the patch.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.