Opened 8 years ago

Closed 5 years ago

#26040 closed Bug (invalid)

Streaming Large CSV Files Example Incorrect

Reported by: Philip Zerull
Owned by: nobody
Component: Documentation
Version: 1.8
Severity: Normal
Keywords: csv streaming documentation bug
Cc: berker.peksag@…
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Hi everyone,

The documentation has an example of how to stream large CSV files.

https://docs.djangoproject.com/en/1.8/howto/outputting-csv/#streaming-large-csv-files

This is great, but unfortunately the solution is incorrect (at least in Django 1.8 using Python 3.4).

Per the documentation for the StreamingHttpResponse class, "It should be given an iterator that yields strings as content." However, csvwriter.writerow returns None, not the result of the write call on the file object passed to the csv writer. The Echo class provided in the example was a good idea, but it doesn't appear to work.

An alternative solution that does work would be:

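import csv
from io import StringIO

from django.http import StreamingHttpResponse


def streaming_csv_writer(rows_to_output):
    # Reuse one in-memory buffer and yield each formatted row as it is written.
    memory_file = StringIO()
    writer = csv.writer(memory_file)
    for row in rows_to_output:
        writer.writerow(row)
        yield memory_file.getvalue()
        # Rewind before truncating so the next row is written at position 0.
        memory_file.seek(0)
        memory_file.truncate(0)

response = StreamingHttpResponse(streaming_csv_writer(rows), ...)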
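
One detail of reusing a StringIO buffer on Python 3 is worth checking: truncate(0) does not move the stream position, so unless the buffer is rewound with seek(0) first, the next write is padded with NUL characters. A minimal sketch of the difference, outside Django; the function names and sample rows are hypothetical and only for illustration:

```python
import csv
from io import StringIO


def chunks_without_rewind(rows):
    # Hypothetical helper: truncates the buffer but never seeks back to 0
    # before the next write.
    buffer = StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow(row)
        buffer.seek(0)
        yield buffer.read()   # position is now at the end of the data
        buffer.truncate(0)    # size becomes 0, position stays where it was


def chunks_with_rewind(rows):
    # Hypothetical helper: rewinds before truncating.
    buffer = StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow(row)
        yield buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)


sample_rows = [["a", "b"], ["c", "d"]]  # illustrative data only
padded = list(chunks_without_rewind(sample_rows))
clean = list(chunks_with_rewind(sample_rows))
```

Here `clean` holds one CSV line per row, while the second element of `padded` starts with NUL characters left over from the stale stream position.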

I'm happy to write the patch myself, but I wanted to discuss it first to get some additional opinions and to learn a bit of the history of this documentation example (I have a feeling it must have worked at some point in the past).

Django is a great framework and I'm truly grateful to the maintainers and contributors to the project. You folks rock!

Change History (8)

comment:1 by Philip Zerull, 8 years ago

As an additional note: the documentation for this example is the same in Django 1.8 and 1.9. I have not yet confirmed that this issue exists on Python 2.

comment:2 by Simon Charette, 8 years ago

Triage Stage: Unreviewed → Accepted

I confirm that the issue also exists on Python 2; csv.writer.writerow always returns None.

The initial ticket suggested using a similar implementation to what's proposed here (#21179) but it was deemed unpythonic.

comment:3 by Claude Paroz, 8 years ago

When you say it doesn't work, is it that the response is empty or simply that the response is not streamed?

comment:4 by Emett Speer, 8 years ago

Seeing as we are working with "large" amounts of data, wouldn't it be a good idea to look at heapq to store the list? You can use the heapq library inside generators in Python as well, so it could even speed things up a bit more, and it's supported in all supported versions of Python.

comment:5 by Berker Peksag, 8 years ago

Cc: berker.peksag@… added

Perhaps the example could be removed and the first paragraph could be updated to mention using Python generators?

Alternatively, the example could be changed to read from a large CSV file.

comment:6 by Berker Peksag, 8 years ago

Note that there is an open issue about changing the return value of DictWriter.writeheader() on the Python issue tracker (https://bugs.python.org/issue27497). The example at https://docs.djangoproject.com/en/dev/howto/outputting-csv/#streaming-large-csv-files was also mentioned in that discussion.

comment:7 by Tim Graham, 8 years ago

The report never said why the example doesn't work. The example works for me on both Python 2.7 and Python 3.5, at least the CSV output looks fine. Is the problem that the response isn't streamed? If so, how do you test that?
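
One possible way to check laziness at the generator level, without a running server, is to record when rows are pulled from the source. The sketch below assumes an Echo class like the one in the documentation example; the `rows` generator and the `consumed` list are hypothetical helpers. It only shows that the iterator handed to the response produces rows on demand; it says nothing about buffering by middleware or the web server.

```python
import csv


class Echo:
    """File-like object whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


consumed = []  # records which source rows have been requested so far


def rows():
    # Hypothetical row source that logs each row as it is requested.
    for i in range(3):
        consumed.append(i)
        yield ["Row {}".format(i), str(i)]


writer = csv.writer(Echo())
chunks = (writer.writerow(row) for row in rows())

first = next(chunks)                   # pulls exactly one row from the source
consumed_after_first = list(consumed)  # snapshot before draining the rest
rest = list(chunks)
```

If the pipeline were eager, `consumed_after_first` would already contain all three indices; with a lazy generator it contains only the first.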

comment:8 by Daniel Hepper, 5 years ago

Resolution: invalid
Status: new → closed

As Tim Graham noted, the example does in fact work, so I will close this as invalid.

The key in the example is the Echo class.

If you look at the source of writer.writerow, you can see that it calls writeline and returns its result. writeline is a reference to the write method of the file object passed to writer on instantiation. Now, the write method of an ordinary file object does not return the written text (it returns None on Python 2 and the number of characters written on Python 3), but the write method of the Echo object used in the example does.
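
A minimal sketch of that behaviour on Python 3, assuming an Echo class along the lines of the one in the documentation example; the variable names are illustrative only:

```python
import csv
from io import StringIO


class Echo:
    """Implements only the write method of the file-like interface."""

    def write(self, value):
        # Return the value instead of storing it in a buffer.
        return value


# writerow returns whatever the underlying write method returns.
echo_result = csv.writer(Echo()).writerow(["a", "b"])

# With a regular in-memory file on Python 3, write returns a character
# count, so writerow returns an int rather than the formatted line.
buffer = StringIO()
buffer_result = csv.writer(buffer).writerow(["a", "b"])
```

With Echo, `echo_result` is the formatted CSV line itself, which is why a generator of writerow calls yields strings suitable for a streaming response.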
