Opened 8 years ago

Closed 4 years ago

#9632 closed Cleanup/optimization (duplicate)

File.chunks contains potentially expensive size operation

Reported by: psagers
Owned by: nobody
Component: File uploads/storage
Version: 1.0
Severity: Normal
Keywords: file
Cc:
Triage Stage: Design decision needed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no


The chunks() method of django.core.files.base.File uses the file's size to determine how many chunks to return. Retrieving the size is very fast for plain files, but may be very expensive for compressed files and other storage mechanisms, and in this case it is completely unnecessary. Replacing the size-based loop with a simple check for the end of the stream has fewer dependencies and avoids a potentially expensive operation.

To cite one practical example, if the file is stored as a compressed file on disk, the current implementation will result in either decompressing the entire file twice or caching the entire decompressed file in memory. Removing the size dependency avoids both.
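
A minimal sketch of the end-of-stream approach described above, for illustration only. It is not the attached chunks.diff and not Django's actual code; the function name iter_chunks and the default chunk size are assumptions. The loop never asks for the size and simply stops when read() returns an empty result, so a gzip stream is decompressed only once, as it is consumed.

```python
import gzip
import io


def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks from fileobj without consulting its size.

    Hypothetical helper: the name and default chunk_size are illustrative.
    """
    while True:
        data = fileobj.read(chunk_size)
        if not data:
            # An empty read signals end of stream; no size lookup is needed.
            break
        yield data


# Usage: stream a gzip-compressed payload in small chunks.
payload = b"0123456789"
compressed = io.BytesIO()
with gzip.GzipFile(fileobj=compressed, mode="wb") as gz:
    gz.write(payload)
compressed.seek(0)

with gzip.GzipFile(fileobj=compressed, mode="rb") as gz:
    # Data is decompressed incrementally, one read at a time.
    roundtrip = b"".join(iter_chunks(gz, chunk_size=4))
```

The same loop works for any object with a read() method, including in-memory buffers that have no meaningful size on disk.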

Attachments (1)

chunks.diff (727 bytes) - added by psagers 8 years ago.


Change History (11)

Changed 8 years ago by psagers

comment:1 Changed 7 years ago by jacob

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

comment:2 Changed 6 years ago by adamnelson

We need a design decision on this - the patch looks fine.

comment:3 Changed 6 years ago by adamnelson

See also #12157, which reports that the chunk size isn't always honored.

comment:4 Changed 6 years ago by IanLewis

+1 for removing the size() check in chunks(). It tends to break things when wrapping file-like objects that are not backed by the file system, such as StringIO().

comment:5 Changed 6 years ago by IanLewis

That said, this patch might pose a problem for file-like objects that should not be over-read, such as socket connections. The patch should probably honor the file size if it exists.
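
One way to read the suggestion above, as a hedged sketch rather than anything taken from the patch or from Django: when a size is known, cap each read so the loop never requests bytes past it, and otherwise fall back to reading until end of stream. The name iter_chunks_bounded and the size parameter are assumptions made for illustration.

```python
import io


def iter_chunks_bounded(fileobj, size=None, chunk_size=64 * 1024):
    """Yield chunks, never reading past size when a size is given.

    Hypothetical helper: with size=None it reads until end of stream.
    """
    remaining = size
    while remaining is None or remaining > 0:
        # Cap the request so a known size is never exceeded.
        to_read = chunk_size if remaining is None else min(chunk_size, remaining)
        data = fileobj.read(to_read)
        if not data:
            # The stream ended earlier than the declared size.
            break
        if remaining is not None:
            remaining -= len(data)
        yield data


# Usage: only the first 5 bytes are consumed; the rest stays unread.
stream = io.BytesIO(b"abcdefgh")
head = list(iter_chunks_bounded(stream, size=5, chunk_size=3))
leftover = stream.read()
```

With this shape the size is optional input supplied by the caller, so nothing expensive is computed when it is unknown.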

comment:6 Changed 6 years ago by ikelly

See also #13901.

comment:7 Changed 5 years ago by lukeplant

  • Severity set to Normal
  • Type set to Cleanup/optimization

comment:8 Changed 5 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:9 Changed 5 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:10 Changed 4 years ago by claudep

  • Resolution set to duplicate
  • Status changed from new to closed

When fixing #15644, the chunks implementation was changed so that it no longer depends on size (r17871). It remains to be seen whether the concern in comment:5 (over-reading a socket connection) proves real and triggers new bug reports. Real use cases welcome...
