Opened 7 years ago

Closed 3 years ago

#9370 closed Bug (fixed)

Non utf-8 DEFAULT_CHARSET causes UnicodeDecodeError when serving binary data through middleware

Reported by: kikko Owned by: kkubasik
Component: Core (Other) Version: 1.0
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When serving binary static files (for example, images) using django.views.static.serve with GZipMiddleware enabled, a traceback is returned instead of the image:

Traceback (most recent call last):

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/core/servers/basehttp.py", line 277, in run
    self.result = application(self.environ, self.start_response)

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/core/servers/basehttp.py", line 634, in __call__
    return self.application(environ, start_response)

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/core/handlers/wsgi.py", line 243, in __call__
    response = middleware_method(request, response)

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/middleware/gzip.py", line 16, in process_response
    if response.status_code != 200 or len(response.content) < 200:

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/http/__init__.py", line 359, in _get_content
    return smart_str(''.join(self._container), self._charset)

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/utils/encoding.py", line 97, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)

  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)

UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte

Change History (13)

comment:1 Changed 7 years ago by jacob

  • milestone set to 1.1
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 6 years ago by kkubasik

  • Owner changed from nobody to kkubasik

I'm going to look into this.

comment:3 Changed 6 years ago by kkubasik

  • Resolution set to invalid
  • Status changed from new to closed

I cannot reproduce this against current trunk. Please reopen if you can create a test case which is reliably reproducible.

comment:4 Changed 6 years ago by niksite

  • Resolution invalid deleted
  • Status changed from closed to reopened

I have reproduced this with the latest trunk:

[16:25 /home/niksite/webapps/django/datamining]$ ./manage.py runserver
Validating models...
0 errors found

Django version 1.1 beta 1 SVN-10982, using settings 'datamining.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Traceback (most recent call last):

File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run

self.result = application(self.environ, self.start_response)

File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in call

return self.application(environ, start_response)

File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in call

response = middleware_method(request, response)

File "/home/niksite/lib/site-python/django/middleware/gzip.py", line 16, in process_response

if response.status_code != 200 or len(response.content) < 200:

File "/home/niksite/lib/site-python/django/http/init.py", line 365, in _get_content

return smart_str(.join(self._container), self._charset)

File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str

return s.decode('utf-8', errors).encode(encoding, errors)

File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode

return codecs.utf_8_decode(input, errors, True)

UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:26:27] "GET /media/images/white.png HTTP/1.0" 500 1162

By the way, the following error is produced if gzip-middleware is disabled:

Traceback (most recent call last):

File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run

self.result = application(self.environ, self.start_response)

File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in call

return self.application(environ, start_response)

File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in call

response = middleware_method(request, response)

File "/home/niksite/lib/site-python/django/middleware/cache.py", line 91, in process_response

patch_response_headers(response, timeout)

File "/home/niksite/lib/site-python/django/utils/cache.py", line 108, in patch_response_headers

responseETag? = '"%s"' % md5_constructor(response.content).hexdigest()

File "/home/niksite/lib/site-python/django/http/init.py", line 365, in _get_content

return smart_str(.join(self._container), self._charset)

File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str

return s.decode('utf-8', errors).encode(encoding, errors)

File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode

return codecs.utf_8_decode(input, errors, True)

UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:28:37] "GET /media/images/white.png HTTP/1.0" 500 1319

With cache-middleware disabled I see no errors:
[12/Jun/2009 16:29:31] "GET /media/images/white.png HTTP/1.0" 200 4233

comment:5 Changed 6 years ago by niksite

Sorry, the formatting was lost in my last message. This is a repost with some additions.

I have reproduced this with the latest trunk:

[16:25 /home/niksite/webapps/django/datamining]$ ./manage.py runserver
Validating models...
0 errors found

Django version 1.1 beta 1 SVN-10982, using settings 'datamining.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Traceback (most recent call last):
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in __call__
    return self.application(environ, start_response)
  File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in __call__
    response = middleware_method(request, response)
  File "/home/niksite/lib/site-python/django/middleware/gzip.py", line 16, in process_response
    if response.status_code != 200 or len(response.content) < 200:
  File "/home/niksite/lib/site-python/django/http/__init__.py", line 365, in _get_content
    return smart_str(''.join(self._container), self._charset)
  File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:26:27] "GET /media/images/white.png HTTP/1.0" 500 1162

By the way, the following error is produced if gzip-middleware is disabled:

Traceback (most recent call last):
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in __call__
    return self.application(environ, start_response)
  File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in __call__
    response = middleware_method(request, response)
  File "/home/niksite/lib/site-python/django/middleware/cache.py", line 91, in process_response
    patch_response_headers(response, timeout)
  File "/home/niksite/lib/site-python/django/utils/cache.py", line 108, in patch_response_headers
    response['ETag'] = '"%s"' % md5_constructor(response.content).hexdigest()
  File "/home/niksite/lib/site-python/django/http/__init__.py", line 365, in _get_content
    return smart_str(''.join(self._container), self._charset)
  File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:28:37] "GET /media/images/white.png HTTP/1.0" 500 1319

With cache-middleware disabled I see no errors:

[12/Jun/2009 16:29:31] "GET /media/images/white.png HTTP/1.0" 200 4233

I have the following settings:

MIDDLEWARE_CLASSES = (
	'django.middleware.cache.UpdateCacheMiddleware',
	#'django.middleware.http.ConditionalGetMiddleware',
	'django.middleware.gzip.GZipMiddleware',
	# 'debug_toolbar.middleware.DebugToolbarMiddleware',
	'django.contrib.sessions.middleware.SessionMiddleware',
	'django.middleware.locale.LocaleMiddleware',
	'django.middleware.common.CommonMiddleware',
	'django.contrib.auth.middleware.AuthenticationMiddleware',
	'django.middleware.doc.XViewMiddleware',
	'django.contrib.flatpages.middleware.FlatpageFallbackMiddleware',
	'django.middleware.cache.FetchFromCacheMiddleware',
)
CACHE_MIDDLEWARE_SECONDS = 0
CACHE_BACKEND = "dummy://"
MEDIA_URL = "http://127.0.0.1:8000/media/"

And the following lines in urls.py:

	urlpatterns += patterns('',
		(r'^media/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.MEDIA_ROOT}),
	)

comment:6 Changed 6 years ago by russellm

  • Resolution set to worksforme
  • Status changed from reopened to closed

Can't reproduce with provided instructions.

comment:7 Changed 6 years ago by kmtracey

  • Resolution worksforme deleted
  • Status changed from closed to reopened
  • Summary changed from UnicodeDecodeError when serving binary static files through GZipMiddleWare to Non utf-8 DEFAULT_CHARSET causes UnicodeDecodeError when serving binary data through middleware

The key piece of information missing from the descriptions is that DEFAULT_CHARSET must be set to something other than utf-8 in settings.py. That's how you reach this line in the traceback:

File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)

I tried setting DEFAULT_CHARSET to 'latin1' (and adding the gzip middleware) for one of my projects that generates and serves .png files and, sure enough, started hitting this traceback. It's not specific to django.views.static.serve, nor really to the gzip middleware; it's anything that causes this routine:

    def _get_content(self):
        if self.has_header('Content-Encoding'):
            return ''.join(self._container)
        return smart_str(''.join(self._container), self._charset)

in django/http/__init__.py to be called with self._charset set to something other than utf-8, so that smart_str is handed binary data that is unlikely to decode successfully as utf-8.
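For illustration, the failing path can be sketched outside Django. This is a minimal Python 3 sketch, not Django's code: recode_like_smart_str and try_recode are hypothetical stand-ins that only mimic the decode-then-encode line shown in the traceback, and the assumption that the re-encoding is skipped when the charset is exactly 'utf-8' is inferred from the behaviour described in this ticket.

```python
# First eight bytes of any PNG file; 0x89 is not a valid UTF-8 start byte.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def recode_like_smart_str(data, encoding="utf-8"):
    """Hypothetical stand-in for the smart_str branch seen in the traceback.

    Assumption: when the target charset is not exactly 'utf-8', the bytes
    are treated as UTF-8 text, decoded, and re-encoded to the target charset.
    """
    if data and encoding != "utf-8":
        return data.decode("utf-8").encode(encoding)
    return data


def try_recode(data, encoding):
    """Report whether the re-encoding step survives the given payload."""
    try:
        recode_like_smart_str(data, encoding)
        return "ok"
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError at position %d" % exc.start


print(try_recode(PNG_SIGNATURE, "utf-8"))
print(try_recode(PNG_SIGNATURE, "latin1"))
```

With the charset left at 'utf-8' the bytes pass through untouched; with 'latin1' the decode step fails on the very first byte, matching the "byte 0x89 in position 0" message above.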

Given the constraints needed to recreate this, it may be appropriate to defer this past 1.1. But it's also pretty late here right now so perhaps there is some easy fix that escapes me at the moment, so I'll leave that decision to someone else.

comment:8 Changed 6 years ago by niksite

You are right. My settings.py has a DEFAULT_CHARSET = 'UTF-8' entry. The error disappeared when I changed it to DEFAULT_CHARSET = 'utf-8'. Thank you!
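That 'UTF-8' and 'utf-8' behave differently here is consistent with a case-sensitive comparison against the literal string 'utf-8' (an inference from the symptoms, not something confirmed in this ticket). As a sketch of how two charset names could be compared by the codec they resolve to, using only the standard library (same_codec is a hypothetical helper, not a Django function):

```python
import codecs


def same_codec(name_a, name_b):
    """Return True if both charset names resolve to the same Python codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


# A plain string comparison treats the two spellings as different...
print("UTF-8" == "utf-8")
# ...while a codec lookup normalises both to the same canonical name.
print(same_codec("UTF-8", "utf-8"))
```

codecs.lookup raises LookupError for an unknown charset name, so a caller would need to handle that case.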

comment:9 Changed 6 years ago by russellm

  • milestone 1.1 deleted

I'm happy to push this to post v1.1. There's no data loss involved, and the reproduction condition is an edge case.

comment:10 Changed 4 years ago by lukeplant

  • Severity set to Normal
  • Type set to Bug

comment:11 Changed 4 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:12 Changed 4 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:13 Changed 3 years ago by aaugustin

  • Resolution set to fixed
  • Status changed from reopened to closed

This was fixed in da56e1bac6449daef9aeab8d076d2594d9fd5b44, where I took care *not* to call force_bytes on bytes objects, in order not to trigger re-encoding. See #18796.
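The idea of leaving bytes objects alone can be sketched as follows. This is a hypothetical Python 3 illustration of the principle only, not the code from the commit above; to_bytes is an invented name.

```python
def to_bytes(value, charset="utf-8"):
    """Hypothetical helper: bytes pass through, text is encoded exactly once."""
    if isinstance(value, bytes):
        # Already binary: return as-is so no decode/re-encode is triggered.
        return value
    return str(value).encode(charset)


png_header = b"\x89PNG\r\n\x1a\n"
# Binary content survives unchanged whatever the configured charset is.
print(to_bytes(png_header, "latin1") == png_header)
# Text is still encoded with the configured charset.
print(to_bytes("caf\u00e9", "latin1"))
```

Because binary payloads are never decoded, the charset setting only affects text content.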
