Opened 8 years ago

Closed 8 years ago

#4969 closed (fixed)

GZip middleware fails due to UnicodeDecodeError

Reported by: Johann Queuniet <johann.queuniet@…>
Owned by: nobody
Component: HTTP handling
Version: master
Severity:
Keywords: unicode gzip middleware
Cc:
Triage Stage: Design decision needed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: yes
Easy pickings:
UI/UX:

Description

The GZip middleware crashes after compression, when it tries to fill in the Content-Length header and calls str(len(response.content)). HttpResponse then calls smart_str(), which fails because it expects UTF-8 content.

2007-07-25 18:32:48: (mod_fastcgi.c.2550) FastCGI-stderr: Traceback (most recent call last):
  File "/usr/lib64/python2.5/site-packages/flup/server/fcgi_base.py", line 558, in run
    protocolStatus, appStatus = self.server.handler(self)
  File "/usr/lib64/python2.5/site-packages/flup/server/fcgi_base.py", line 1112, in handler
    result = self.application(environ, start_response)
  File "/usr/lib/python2.5/django/core/handlers/wsgi.py", line 194, in __call__
  File "/usr/lib64/python2.5/site-packages/django/middleware/gzip.py", line 28, in process_response
    response['Content-Length'] = str(len(response.content))
  File "/usr/lib64/python2.5/site-packages/django/http/__init__.py", line 275, in _get_content
    content = smart_str(''.join(self._container), self._charset)
  File "/usr/lib/python2.5/django/utils/encoding.py", line 63, in smart_str
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: unexpected code byte
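
The error in the last frame can be reproduced outside Django. What follows is a minimal sketch in modern Python 3 (not the Python 2.5 shown in the traceback), using only the standard library. The helper name try_decode_gzip is made up for illustration. It shows that gzip output begins with the magic bytes 0x1f 0x8b, so decoding it as UTF-8 fails at position 1, which matches the message above.

```python
import gzip


def try_decode_gzip(text):
    """Compress text with gzip, then (wrongly) try to decode the result as UTF-8.

    Returns the UnicodeDecodeError that was raised, or None if decoding succeeded.
    Hypothetical helper for illustration only; it is not part of Django.
    """
    compressed = gzip.compress(text.encode("utf-8"))
    try:
        compressed.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


error = try_decode_gzip("hello world")
# Every gzip stream starts with the magic bytes 0x1f 0x8b. 0x1f is valid
# ASCII, but 0x8b cannot start a UTF-8 sequence, so decoding fails at index 1.
print(error.start, hex(error.object[error.start]))
```

The exact wording of the exception differs between Python versions, but the offending byte and its position are the same.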

I don't really know where the fault lies here. Is it smart_str's, if it chokes on gzip-encoded content? Or should the HttpResponse object do something else when the Content-Encoding header is set?

The patch included assumes the latter.
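
The attached diff is not reproduced on this page, so the following is only a sketch of the idea described above, written in Python 3, and not the actual patch or Django's code. SketchResponse, its headers dict, and its attributes are all hypothetical. The assumption is that content retrieval normally round-trips the body through a text decode, and that this step is skipped once a Content-Encoding header is present.

```python
import gzip


class SketchResponse:
    """Hypothetical stand-in for an HTTP response; not Django's HttpResponse."""

    def __init__(self, content, charset="utf-8"):
        self._container = [content]  # list of byte strings
        self._charset = charset
        self.headers = {}

    @property
    def content(self):
        raw = b"".join(self._container)
        if "Content-Encoding" in self.headers:
            # The body has already been encoded (e.g. gzipped), so it is
            # opaque binary data: return it untouched.
            return raw
        # Assumed default path: treat the body as UTF-8 text and re-encode
        # it to the response charset. This is the step that breaks on gzip data.
        return raw.decode("utf-8").encode(self._charset)

    @content.setter
    def content(self, value):
        self._container = [value]


response = SketchResponse("héllo".encode("utf-8"), charset="latin-1")
plain_length = len(response.content)  # text path: re-encoded to latin-1

# Roughly what a gzip middleware would do with the response.
response.content = gzip.compress(response.content)
response.headers["Content-Encoding"] = "gzip"
response.headers["Content-Length"] = str(len(response.content))
```

In this sketch the header has to be set before the compressed body is read back. Reading response.content between the compression and the header assignment would still hit the decode error, which is one of the edge cases such a change would need to consider.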

Attachments (1)

gzip-http.diff (469 bytes) - added by Johann Queuniet <johann.queuniet@…> 8 years ago.
patch for HttpResponse

Change History (5)

Changed 8 years ago by Johann Queuniet <johann.queuniet@…>

patch for HttpResponse

comment:1 Changed 8 years ago by Simon G. <dev@…>

  • Keywords unicode added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement set
  • Summary changed from GZip middleware fails to GZip middleware fails due to UnicodeDecodeError
  • Triage Stage changed from Unreviewed to Ready for checkin

comment:2 Changed 8 years ago by Simon G. <dev@…>

  • Triage Stage changed from Ready for checkin to Design decision needed

My promotion here was a bit hasty; I'll move this back to DDN to get some comments on the best way to do this.

comment:3 Changed 8 years ago by mtredinnick

I think it's pretty close as it is, Simon. It's on my list to look at shortly, anyway. Leave it in its current state, but if anybody else is looking at this, my gut feeling is the approach in the patch is along the right lines and we just have to make sure the edge-cases are handled correctly.

comment:4 Changed 8 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [6548]) Fixed #4969 -- Changed content retrieval in HttpResponse to be more robust in
the presence of an existing content encoding. Fixes some sporadic failures with
the GzipMiddleware, for example. Thanks, Johann Queuniet.
