Opened 11 years ago

Closed 11 years ago

#21179 closed Cleanup/optimization (fixed)

How-to output CSV from Django should suggest using `StreamingHttpResponse`

Reported by: Simon Charette
Owned by: Rigel Di Scala
Component: Documentation
Version: dev
Severity: Normal
Keywords: afraid-to-commit
Cc:
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: yes
UI/UX: no

Description

The Outputting CSV with Django how-to does not mention StreamingHttpResponse, even though it is useful for generating large CSV files.

I suggest we replace the example with something along the following lines:

import csv
from io import StringIO

from django.http import StreamingHttpResponse


def some_view(request):
    rows = (
        ['First row', 'Foo', 'Bar', 'Baz'],
        ['Second row', 'A', 'B', 'C', '"Testing"', "Here's a quote"]
    )

    # Define a generator to stream data directly to the client
    def stream():
        buffer_ = StringIO()
        writer = csv.writer(buffer_)
        for row in rows:
            writer.writerow(row)
            buffer_.seek(0)
            data = buffer_.read()
            buffer_.seek(0)
            buffer_.truncate()
            yield data

    # Create the streaming response object with the appropriate CSV header.
    response = StreamingHttpResponse(stream(), content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'

    return response

Change History (17)

comment:1 by Daniele Procida, 11 years ago

Triage Stage: Unreviewed → Accepted

Yes, and also https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.StreamingHttpResponse should link to this, in the text "For instance, it’s useful for generating large CSV files"

comment:2 by Daniele Procida, 11 years ago

Keywords: afraid-to-commit added

comment:3 by Marc Tamlyn, 11 years ago

I'm not convinced. I've output many a CSV file and never needed the streaming response to get performance. Whilst this is a useful addition to mention at this point in the docs, I don't think we should be recommending the more complex option.

comment:4 by Aymeric Augustin, 11 years ago

The code example looks like C, not like Python... I don't want to see buffer_.seek(0) in our docs.

Streaming responses don't change much when you pull all the data in RAM, and if the data comes from a queryset, Django currently does that even if you use .iterator(). It seems much more interesting to me to optimize the database side than the HTTP response side.

comment:5 by Simon Charette, 11 years ago

Thinking about it, I must agree that without server-side cursor support (#16614) the tradeoff is not worth turning the simple example into an overly complex one.

I just thought it was odd that StreamingHttpResponse's documentation mentions that it’s useful for generating large CSV files but our provided tutorial doesn't even mention it.

What do you guys think of adding an admonition with no specific example to the how-to explaining StreamingHttpResponse might be useful in this case?

comment:6 by Daniele Procida, 11 years ago

StreamingHttpResponse could still do with some example code in the docs, even if it doesn't replace the existing example.

comment:7 by ANUBHAV JOSHI, 11 years ago

Any ideas regarding what type of example should be given in the docs for StreamingHttpResponse?

comment:8 by Rigel Di Scala, 11 years ago

Owner: changed from nobody to Rigel Di Scala
Status: new → assigned

comment:9 by Rigel Di Scala, 11 years ago

Owner: Rigel Di Scala removed
Status: assigned → new

Hello, I would like to work on this ticket.

I think that some information on how to test a view that returns a StreamingHttpResponse() would be useful. The Django test Client actually returns an iterable response, and the .streaming_content property is an instance of <itertools.imap>. You would then need to concatenate it into a string in order to test it, as you would do with the standard HttpResponse.
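As a rough illustration of that testing approach, here is a sketch that runs without Django. It assumes the streamed chunks are bytes; `fake_streaming_content` is a hypothetical stand-in for `response.streaming_content`, and `collect_csv_rows` is an illustrative helper name, not part of Django:

```python
import csv
import io


def fake_streaming_content():
    """Hypothetical stand-in for response.streaming_content (an iterator of bytes chunks)."""
    yield b"Row 0,0\r\n"
    yield b"Row 1,1\r\n"


def collect_csv_rows(chunks, encoding="utf-8"):
    """Join the streamed chunks into one string, then parse it as CSV."""
    body = b"".join(chunks).decode(encoding)
    return list(csv.reader(io.StringIO(body, newline="")))


rows = collect_csv_rows(fake_streaming_content())
```

In an actual test case the chunks would come from the response returned by the test client rather than from the stand-in generator.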

comment:10 by Rigel Di Scala, 11 years ago

I was thinking of something along these lines:

import csv

from django.http import StreamingHttpResponse


class Echo(object):
    def write(self, value):
        return value


def some_streaming_view(request):
    rows = (["Row {0}".format(idx), str(idx)] for idx in xrange(100))
    buffer_ = Echo()
    writer = csv.writer(buffer_)
    response = StreamingHttpResponse((writer.writerow(row) for row in rows),
                                     content_type="text/csv")
    response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
    return response

I have tested it with curl, a simple test case with the Django test client, and a regular browser.
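The example relies on `csv.writer.writerow()` returning whatever the underlying object's `write()` returns, so the `Echo` pseudo-buffer hands each formatted line straight back instead of storing it. A minimal stand-alone sketch of just that mechanism, written for Python 3 and without Django (`stream_rows` is a hypothetical helper name used only for illustration):

```python
import csv


class Echo:
    """Pseudo-buffer: write() returns the value instead of storing it."""

    def write(self, value):
        return value


def stream_rows(rows):
    """Yield one formatted CSV line per row, without buffering the whole file."""
    writer = csv.writer(Echo())
    for row in rows:
        # writerow() passes the formatted line to Echo.write() and returns its result.
        yield writer.writerow(row)


lines = list(stream_rows(["Row {0}".format(idx), str(idx)] for idx in range(3)))
```

Because `stream_rows` is a generator, each line is produced only when the consumer asks for it, which is what lets a streaming response send rows as they are generated.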

comment:11 by Rigel Di Scala, 11 years ago

Owner: set to Rigel Di Scala
Status: new → assigned

comment:12 by Rigel Di Scala, 11 years ago

You can also test this with an infinite series, such as the classic Fibonacci function, if you replace the range generator with something like:

I tested this and the memory use did not increase significantly even after streaming over a gigabyte of data for a single request.
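A minimal sketch of such an infinite row source, assuming a plain Fibonacci generator. The name `fibonacci_rows` and the two-column row layout are illustrative assumptions, not the snippet originally posted:

```python
import itertools


def fibonacci_rows():
    """Yield an endless series of [index, value] rows from the Fibonacci sequence."""
    a, b = 0, 1
    idx = 0
    while True:
        yield [str(idx), str(a)]
        a, b = b, a + b
        idx += 1


# Take only the first few rows here; a streaming view would consume them lazily.
first_rows = list(itertools.islice(fibonacci_rows(), 6))
```

Since the generator never holds more than the current pair of numbers, feeding it to a streaming response should keep memory use roughly constant regardless of how many rows are sent.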


comment:13 by Daniele Procida, 11 years ago (in reply to: 11)

The example above looks good to me. Please do submit a pull request - thanks.

comment:14 by Rigel Di Scala, 11 years ago

Has patch: set

I have opened a pull request here:

https://github.com/django/django/pull/2358

I am using a slight variation of the above example, using Python 3 friendly code and some additional comments, as suggested by bmispelon.

comment:15 by Rigel Di Scala, 11 years ago

Resubmitted a new pull request: https://github.com/django/django/pull/2397

comment:16 by Tim Graham, 11 years ago

Needs documentation: unset

comment:17 by Tim Graham <timograham@…>, 11 years ago

Resolution: fixed
Status: assigned → closed

In fad47367bf622635b4cf931db72310cce41cebb4:

Fixed #21179 -- Added a StreamingHttpResponse example for CSV files.

Thanks charettes for the suggestion.
