I have a view whose response is 825157 bytes without gzipping and 35751 bytes gzipped as an HttpResponse, but 1010920 bytes gzipped as a StreamingHttpResponse. The output of the script given below, run on some noddy data, is:
Normal string: 38890
compress_string: 18539
compress_sequence: 89567
compress_sequence, no flush: 18539
Noddy content perhaps, but in actual use I very much want to use StreamingHttpResponse for very large JSON responses (it then uses 200 MB of memory with iterables throughout, as opposed to 2 GB with more standard code and HttpResponse). The Python json package, when encoding iteratively, emits each key, value, and piece of punctuation in between as a separate chunk. Having the gzip middleware flush after every such chunk creates output much larger than not gzipping at all, as the figures at the top show. It would seem that many uses of StreamingHttpResponse will similarly yield small chunks at the content level. #7581 does mention "some penalty in compression performance", but producing worse-than-none performance seems a bit much :)
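To illustrate how finely the json package splits its output, here is a minimal sketch using only the standard library (Python 3). The `payload` value is made-up sample data, not taken from the real view.

```python
import json

# Hypothetical sample payload; any nested structure behaves similarly.
payload = {"id": 1, "name": "example", "tags": ["a", "b", "c"]}

# iterencode() yields the document piece by piece instead of as one string.
chunks = list(json.JSONEncoder().iterencode(payload))

print(len(chunks), "chunks, longest is", max(len(c) for c in chunks), "characters")
print(chunks[:6])
```

Each of those small pieces becomes one item in the streamed sequence, so a compressor that flushes per item ends up flushing every few bytes.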
Should compress_sequence bunch up flushes to provide at least some level of compression? Or if it's a StreamingHttpResponse, should it not bother gzipping?
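One way the "bunch up flushes" option could look is sketched below, in Python 3 with only the standard library. This is an illustration under assumptions, not Django's implementation: `ChunkBuffer`, `compress_sequence_batched`, and the `min_flush_bytes` threshold are hypothetical names and values chosen for the example.

```python
import gzip


class ChunkBuffer(object):
    """Hypothetical write-only sink that can be drained between writes."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

    def drain(self):
        out = b"".join(self.chunks)
        self.chunks = []
        return out

    def flush(self):
        pass

    def close(self):
        pass


def compress_sequence_batched(sequence, min_flush_bytes=16 * 1024):
    """Gzip a sequence of byte strings, flushing only after roughly
    min_flush_bytes of input have accumulated (an assumed threshold)."""
    buf = ChunkBuffer()
    zfile = gzip.GzipFile(mode="wb", compresslevel=6, fileobj=buf)
    yield buf.drain()  # gzip header
    pending = 0
    for item in sequence:
        zfile.write(item)
        pending += len(item)
        if pending >= min_flush_bytes:
            zfile.flush()
            pending = 0
        data = buf.drain()
        if data:
            yield data
    zfile.close()
    yield buf.drain()  # remaining compressed data plus gzip trailer


items = [str(i).encode("ascii") for i in range(10000)]
compressed = b"".join(compress_sequence_batched(items))
print(len(b"".join(items)), len(compressed))
```

The trade-off is latency: the client sees nothing new until the threshold is reached or the compressor emits a block on its own, so the threshold would need to suit how promptly the stream must be delivered.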
from django.utils.text import *
from django.utils.six.moves import map


# Identical to django.utils.text.compress_sequence
# but with the flush line commented out
def compress_sequence_without_flush(sequence):
    buf = StreamingBuffer()
    zfile = GzipFile(mode='wb', compresslevel=6, fileobj=buf)
    # Output headers...
    yield buf.read()
    for item in sequence:
        zfile.write(item)
        # zfile.flush()
        yield buf.read()
    zfile.close()
    yield buf.read()


class Example(object):
    def __iter__(self):
        return map(str, xrange(10000))


e = Example()
print 'Normal string:', len(b''.join(e))
print 'compress_string:', len(compress_string(b''.join(e)))
print 'compress_sequence:', len(b''.join(compress_sequence(e)))
print 'compress_sequence, no flush:', len(b''.join(compress_sequence_without_flush(e)))
With the flush() removed, the output does appear to 'bunch' itself into groups of about 17k, and the total ends up the same size as if the content had been gzipped as a single string. I have made this change and added a test of some JSON output at https://github.com/django/django/pull/4010 and hope that's of interest.
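The size difference can also be reproduced without Django. The sketch below is Python 3 and uses zlib directly; `gzip_chunks` is a hypothetical helper written for this example, and `Z_SYNC_FLUSH` is used because that is the default mode of `GzipFile.flush()`. Exact compressed sizes may differ slightly from the figures above.

```python
import zlib


def gzip_chunks(chunks, flush_each):
    """Gzip-compress chunks, optionally sync-flushing after each one."""
    # wbits=31 asks zlib for gzip framing rather than a raw zlib stream.
    comp = zlib.compressobj(6, zlib.DEFLATED, 31)
    out = []
    for chunk in chunks:
        out.append(comp.compress(chunk))
        if flush_each:
            out.append(comp.flush(zlib.Z_SYNC_FLUSH))
    out.append(comp.flush())  # finish the stream
    return b"".join(out)


chunks = [str(i).encode("ascii") for i in range(10000)]
raw = b"".join(chunks)
flushed = gzip_chunks(chunks, flush_each=True)
unflushed = gzip_chunks(chunks, flush_each=False)

print("Normal string:", len(raw))
print("Flush after every chunk:", len(flushed))
print("No intermediate flush:", len(unflushed))
```

Every sync flush ends the current deflate block and appends an empty marker block, so with chunks of only a few bytes the per-flush overhead outweighs whatever the compressor saves.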