Opened 16 years ago
Closed 10 years ago
#8149 closed Cleanup/optimization (fixed)
UploadedFile doesn't iterate over lines with \r line endings
Reported by: | Tai Lee | Owned by: | nobody |
---|---|---|---|
Component: | File uploads/storage | Version: | dev |
Severity: | Normal | Keywords: | UploadedFile |
Cc: | real.human@…, timograham@…, jon.dufresne@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
UploadedFile.__iter__
loops over chunks
and converts each chunk to a StringIO
object and then iterates over that. Iterating over StringIO
objects only splits on \n
, and I'm not sure why we need StringIO
at all. Couldn't we just do for line in chunk.splitlines()
(which does split on \r
and \n
).
Attachments (2)
Change History (20)
by , 16 years ago
Attachment: | UploadedFile-r8225.diff added |
---|
comment:1 by , 16 years ago
Has patch: | set |
---|
comment:2 by , 16 years ago
Cc: | added |
---|
comment:3 by , 16 years ago
Looks like using splitlines
will strip the line endings and break binary uploads. Here's a new patch that iterates through a call to re.findall
with a simple regular expression instead of a StringIO
object.
comment:4 by , 16 years ago
Here's a use-case for this change. Upload a CSV file (which can have \r
or \r\n
line endings), and pass the UploadedFile
object to csv.reader
and you get Error: newline inside string
when you try to iterate over it. When working with real files on the local file system you can open them with universal newline support, which is not available with StringIO
or cStringIO
objects.
comment:5 by , 16 years ago
Triage Stage: | Unreviewed → Accepted |
---|
by , 16 years ago
Attachment: | UploadedFile-r8226.diff added |
---|
Updated for r8654 and fixed regular expression to not match empty strings.
comment:6 by , 16 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Universal newline handling only makes sense for files generated/stored on the local system, where the newline convention is known. The UploadedFile
class, pretty much by definition, is for data that has come from, and been generated on, a remote system that we know nothing about. So trying to use the appropriate line-ending handling is an unattainable goal for that situation. #8656 is the ticket to make sure this is documented.
comment:8 by , 12 years ago
Component: | Forms → File uploads/storage |
---|---|
Easy pickings: | unset |
Resolution: | wontfix |
Severity: | → Normal |
Status: | closed → reopened |
Type: | → Uncategorized |
UI/UX: | unset |
I'm re-opening this ticket because I have found a much simpler implementation, and this is still a problem for anyone who needs to parse uploaded text files line-by-line (e.g. CSV) when they can't know what line terminators end-users are using in their files. Excel 2011 on OS X is at least one commonly used program that still saves CSV data with \r
line terminators by default.
Originally, there was a discussion between Malcolm and Jacob on IRC and the strongest objection at the time to fixing this was the ugly regular expression in my previous patch. This was also during a busy time, and it was decided that this should be a documentation issue.
I don't think the explanation given above when this ticket was wontfixed actually makes much sense. Uploaded file data which has come from and been generated on a remote system that we know nothing about is precisely the reason why we *need* universal newline support when iterating those files line-by-line.
I'm not sure how I missed it before, but when I said that we couldn't just use splitlines()
because it consumed the line terminators, it seems I was wrong. splitlines()
has a keepends
argument that will actually do exactly what we need without any ugly regular expressions. Perhaps this wasn't the case back then, when we still had to support Python 2.3 and 2.4. I couldn't find confirmation in the Python 2.3 and 2.4 docs, but I did confirm that splitlines(True)
works in Python 2.5 which is now our minimum version number.
I've fixed this in a branch on GitHub, and opened a pull request. All tests pass under OS X Lion with Python 2.7.2 and SQLite.
https://github.com/thirstydigital/django/tree/tickets/8149-universal-newline-uploads
comment:10 by , 12 years ago
Sorry. Don't know why I thought this didn't need tests. New pull request with test.
comment:12 by , 12 years ago
Type: | Uncategorized → Cleanup/optimization |
---|
comment:13 by , 12 years ago
Status: | reopened → new |
---|
comment:14 by , 12 years ago
Cc: | added |
---|---|
Needs tests: | unset |
Patch needs improvement: | set |
The test doesn't pass on Python 3.
comment:15 by , 10 years ago
Cc: | added |
---|
I've been running into this issue as well. Most notably when parsing uploaded files as CSV from Mac.
I added a submitted a pull request that works with Python 2 and 3: https://github.com/django/django/pull/3291
comment:16 by , 10 years ago
Patch needs improvement: | unset |
---|
Sorry in advance if this is incorrect usage of this flag.
I am removing "Patch needs improvement" as my latest pull request does work with Python 3 and there are no outstanding requests from a review.
comment:17 by , 10 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
@jdufresne - that's correct usage. In fact, since tchaumeny reviewed it, he could mark the ticket as "ready for checkin".
comment:18 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
fix iteration of UploadedFile objects on newlines.