Opened 11 years ago
Closed 11 years ago
#21179 closed Cleanup/optimization (fixed)
How-to output CSV from Django should suggest using `StreamingHttpResponse`
Reported by: | Simon Charette | Owned by: | Rigel Di Scala |
---|---|---|---|
Component: | Documentation | Version: | dev |
Severity: | Normal | Keywords: | afraid-to-commit |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
The Outputting CSV with Django how-to doesn't even mention StreamingHttpResponse, even though it’s useful for generating large CSV files.
I suggest we replace the example with something along the following:
    import csv
    from StringIO import StringIO

    from django.http import StreamingHttpResponse

    def some_view(request):
        rows = (
            ['First row', 'Foo', 'Bar', 'Baz'],
            ['Second row', 'A', 'B', 'C', '"Testing"', "Here's a quote"]
        )

        # Define a generator to stream data directly to the client
        def stream():
            buffer_ = StringIO()
            writer = csv.writer(buffer_)
            for row in rows:
                writer.writerow(row)
                buffer_.seek(0)
                data = buffer_.read()
                buffer_.seek(0)
                buffer_.truncate()
                yield data

        # Create the streaming response object with the appropriate CSV header.
        response = StreamingHttpResponse(stream(), content_type='text/csv')
        response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
        return response
Change History (17)
comment:1 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 11 years ago
Keywords: | afraid-to-commit added |
---|
comment:3 by , 11 years ago
I'm not convinced. I've output many a CSV file and never needed the streaming response to get performance. Whilst this is a useful addition to mention at this point in the docs, I don't think we should be recommending the more complex option.
comment:4 by , 11 years ago
The code example looks like C, not like Python... I don't want to see buffer_.seek(0) in our docs.
Streaming responses don't change much when you pull all the data in RAM, and if the data comes from a queryset, Django currently does that even if you use .iterator(). It seems much more interesting to me to optimize the database side than the HTTP response side.
comment:5 by , 11 years ago
Thinking about it, I must agree that without server-side cursor support (#16614) the tradeoff is not worth turning the simple example into an overly complex one.
I just thought it was odd that StreamingHttpResponse's documentation mentions that it’s useful for generating large CSV files but our provided tutorial doesn't even mention it.
What do you guys think of adding an admonition with no specific example to the how-to, explaining that StreamingHttpResponse might be useful in this case?
comment:6 by , 11 years ago
StreamingHttpResponse could still do with some example code in the docs, even if it doesn't replace the existing example.
comment:7 by , 11 years ago
Any ideas regarding what type of example should be given in the docs for StreamingHttpResponse?
comment:8 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:9 by , 11 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
Hello, I would like to work on this ticket.
I think that some information on how to test a view that returns a StreamingHttpResponse() would be useful. The Django test Client actually returns an iterable response, and the .streaming_content property is an instance of <itertools.imap>. You would then need to concatenate it into a string in order to test it, as you would do with the standard HttpResponse.
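A minimal sketch of that concatenation step, written for Python 3 with only the standard library. The generator `fake_streaming_content` is a hypothetical stand-in for `response.streaming_content`, assumed here to yield the body as byte chunks; no Django test client is involved, and `collect_body` is an illustrative helper name.

```python
import csv
import io


def fake_streaming_content():
    # Hypothetical stand-in for response.streaming_content:
    # an iterator that yields the response body as byte chunks.
    yield b"Row 0,0\r\n"
    yield b"Row 1,1\r\n"


def collect_body(chunks):
    # Concatenate the streamed chunks into a single bytes object,
    # which can then be checked much like HttpResponse.content.
    return b"".join(chunks)


body = collect_body(fake_streaming_content())

# Parse the collected body back into rows to assert on its contents.
parsed = list(csv.reader(io.StringIO(body.decode("utf-8"), newline="")))
```

Note that the iterator can only be consumed once, so a test would collect the body a single time and run all its assertions against the result.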
comment:10 by , 11 years ago
I was thinking of something along these lines:
    import csv

    from django.http import StreamingHttpResponse

    class Echo(object):
        def write(self, value):
            return value

    def some_streaming_view(request):
        rows = (["Row {0}".format(idx), str(idx)] for idx in xrange(100))
        buffer_ = Echo()
        writer = csv.writer(buffer_)
        response = StreamingHttpResponse((writer.writerow(row) for row in rows), content_type="text/csv")
        response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
        return response
I have tested it with curl, a simple test case with the Django test client, and a regular browser.
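The approach appears to rest on `csv.writer.writerow()` returning whatever the underlying object's `write()` method returns. Here is a standalone Python 3 sketch of just that mechanism, without Django; `iter_csv_lines` is a hypothetical helper name introduced for illustration.

```python
import csv


class Echo:
    """Pseudo-buffer whose write() hands the value back instead of storing it."""

    def write(self, value):
        return value


def iter_csv_lines(rows):
    # writerow() returns the result of Echo.write(), i.e. the formatted
    # CSV line, so each row is yielded as soon as it is formatted.
    writer = csv.writer(Echo())
    for row in rows:
        yield writer.writerow(row)


lines = list(iter_csv_lines([["Row 0", "0"], ["Row 1", "1"]]))
```

Because nothing is accumulated in the pseudo-buffer, only one formatted line is held at a time, which is what makes the generator suitable as streaming content.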
follow-up: 13 comment:11 by , 11 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:12 by , 11 years ago
You can also test this with an infinite series, such as the classic Fibonacci function, if you replace the range generator with something like:
    def fib():
        a, b = 0, 1
        while 1:
            yield a
            a, b = b, a + b
I tested this and the memory use did not increase significantly even after streaming over a gigabyte of data for a single request.
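A rough way to see the same lazy behaviour outside of Django is to take only the first few lines from the infinite series, in Python 3. The names `fib_csv_lines` and `first_lines` are assumptions made for this sketch, and it does not measure memory use.

```python
import csv
import itertools


class Echo:
    # Pseudo-buffer: write() returns the value rather than storing it.
    def write(self, value):
        return value


def fib():
    # Infinite Fibonacci generator, as in the comment above.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def fib_csv_lines():
    # Lazily format one CSV line per Fibonacci number; never terminates
    # on its own, so the consumer decides how much to read.
    writer = csv.writer(Echo())
    for idx, value in enumerate(fib()):
        yield writer.writerow(["Row {0}".format(idx), str(value)])


# Only five lines are ever generated, even though the source is infinite.
first_lines = list(itertools.islice(fib_csv_lines(), 5))
```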
comment:13 by , 11 years ago
The example above looks good to me. Please do submit a pull request - thanks.
comment:14 by , 11 years ago
Has patch: | set |
---|
I have opened a pull request here:
https://github.com/django/django/pull/2358
I am using a slight variation of the above example, using Python 3 friendly code and some additional comments, as suggested by bmispelon.
comment:15 by , 11 years ago
Resubmitted a new pull request: https://github.com/django/django/pull/2397
comment:16 by , 11 years ago
Needs documentation: | unset |
---|
comment:17 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Yes, and also https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.StreamingHttpResponse should link to this, in the text "For instance, it’s useful for generating large CSV files"