Opened 18 years ago

Closed 18 years ago

Last modified 17 years ago

#1484 closed defect (duplicate)

[patch] files uploads should be streamed to temporary files

Reported by: anonymous Owned by: Adrian Holovaty
Component: Core (Other) Version: dev
Severity: major Keywords:
Cc: Maniac@… Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In the current implementation the content of uploaded files is stored in the request.FILES dictionary.

This makes Django ill suited for any application where one might need to upload files beyond trivial sizes.

The content of the uploaded files should be stored in a temporary files and only the handle to these should be referenced in the request.FILES dictionary.

best,

i.

Attachments (14)

1484.trunk.diff (5.9 KB ) - added by Maniac <Maniac@…> 18 years ago.
Patch for trunk
1484.m-r.diff (5.8 KB ) - added by Maniac <Maniac@…> 18 years ago.
Patch for magic-removal branch
1484.trunk.2.diff (5.9 KB ) - added by Maniac <Maniac@…> 18 years ago.
More correct patch for trunk
1484.m-r.2.diff (5.9 KB ) - added by Maniac <Maniac@…> 18 years ago.
More correct patch for magic-removal
1484.trunk.3.diff (6.1 KB ) - added by Maniac <Maniac@…> 18 years ago.
Patch #3 for trunk
1484.m-r.3.diff (6.1 KB ) - added by Maniac <Maniac@…> 18 years ago.
Patch #3 for magic-removal
1484.trunk.4.diff (6.5 KB ) - added by Maniac <Maniac@…> 18 years ago.
Patch 4 for trunk
1484.m-r.4.diff (6.4 KB ) - added by Maniac <Maniac@…> 18 years ago.
Patch 4 for magic-removal
1484.trunk.5.diff (6.5 KB ) - added by Maniac <Maniac@…> 18 years ago.
Patch 5 for trunk: updated for line count
1484.m-r.5.diff (6.4 KB ) - added by Maniac <Maniac@…> 18 years ago.
Patch 5 for magic-removal: updated for line count
1484.m-r.6.diff (6.4 KB ) - added by Maniac <Maniac@…> 18 years ago.
Patch 5 for magic-removal: corrected settings import
3013-streaming-file.diff (7.4 KB ) - added by [530] 18 years ago.
Updated for new trunk
3013-streaming-file_improved.diff (10.4 KB ) - added by [530] 18 years ago.
Improved
3013-streaming-file_rfc822.diff (13.1 KB ) - added by [530] 18 years ago.
Fixed readline in rfc822

Download all attachments as: .zip

Change History (30)

comment:1 by Maniac <Maniac@…>, 18 years ago

Cc: Maniac@… added

I'm starting to implement it for my project. I'll submit a patch if all goes well.

BTW, was there a special reason to parse incoming POST data with email package? cgi.FieldStorage seems already smart enough to put binary contents in temp files. I plan to use it.

comment:2 by Maniac <Maniac@…>, 18 years ago

I have it basically working in my local tree (though not yet suitable for review). To finish it I'd like to hear your opinions:

  • Temp files are read from stdin in 8K chunks which seems to me too conservative (and slow). What would be the reasonable amount? I would say 32K but it's based only on some dusty habits :-)
  • I don't plan to leave 'read everything in memory' behavior for file uploads in place. However if it makes sense to process file uploads in memory I could make a setting switching between these behaviors.

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.trunk.diff added

Patch for trunk

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.m-r.diff added

Patch for magic-removal branch

comment:3 by Maniac <Maniac@…>, 18 years ago

Summary: files uploads should be streamed to temporary files[patch] files uploads should be streamed to temporary files

Here are patches for trunk and magic-removal. It also eliminates at least one copy of the same data from memory and works better than previous parser even without storing files in temp files.

Now items in request.FILES have are accesible as follows:

filefilename - filename, as it was
filecontent-type - content type, as it was
filefile - file-like object (StringIO or TempFile internally)
filecontent - file content that is filled lazily on first access

There is a new setting: STORE_UPLOAD_ON_DISK (feel free to rename it) which is False by default

In other words it's all backwards compatible.

comment:4 by anonymous, 18 years ago

2 days from the report to the solution ... this is awesome ... really saves our app ... very very impressed

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.trunk.2.diff added

More correct patch for trunk

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.m-r.2.diff added

More correct patch for magic-removal

comment:5 by Maniac <Maniac@…>, 18 years ago

I've made an error in one place that made multipart form submission work only when there are no more fields other than file upload. Corrected patches attached.

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.trunk.3.diff added

Patch #3 for trunk

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.m-r.3.diff added

Patch #3 for magic-removal

comment:6 by Maniac <Maniac@…>, 18 years ago

Found some more bugs working with these patches. New variants also happen to be more reviewable (i.e. diff blocks are less mixed)

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.trunk.4.diff added

Patch 4 for trunk

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.m-r.4.diff added

Patch 4 for magic-removal

comment:7 by Maniac <Maniac@…>, 18 years ago

New patch fix 2 things:

  • it now works when multiple file upload controls have identical names
  • it survives request.FILES.copy() (file objects don't get copied so it now loads content before copying)

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.trunk.5.diff added

Patch 5 for trunk: updated for line count

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.m-r.5.diff added

Patch 5 for magic-removal: updated for line count

by Maniac <Maniac@…>, 18 years ago

Attachment: 1484.m-r.6.diff added

Patch 5 for magic-removal: corrected settings import

comment:8 by Maniac <Maniac@…>, 18 years ago

I just converted my app to magic-removal (that was most certainly loads of fun!) and found out that I messed up this patch with direct import of constant from settings. Attached is new version that actually works (now tested for real).

Sorry...

by [530], 18 years ago

Attachment: 3013-streaming-file.diff added

Updated for new trunk

comment:9 by [530], 18 years ago

Version: SVN

Made FieldStorage read in chuncks of 64k, still needs fixing for related objects in MultiValueDict.

Error trace if you are intrested.

{{
Traceback (most recent call last):
File "/opt/apache/htdocs/python/bigfile/django/core/handlers/base.py" in get_response

  1. response = callback(request, *callback_args, callback_kwargs)

File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/decorators.py" in _checklogin

  1. return view_func(request, *args, kwargs)

File "/opt/apache/htdocs/python/bigfile/django/views/decorators/cache.py" in _wrapped_view_func

  1. response = view_func(request, *args, kwargs)

File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/main.py" in add_stage

  1. errors = manipulator.get_validation_errors(new_data)

File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in get_validation_errors

  1. errors.update(field.get_validation_errors(new_data))

File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in get_validation_errors

  1. self.run_validator(new_data, validator)

File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in run_validator

  1. if self.is_required or new_data.get(self.field_name, False) or hasattr(validator, 'always_test'):

File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in get

  1. val = self[key]

File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in getitem

  1. raise MultiValueDictKeyError, "Key %r not found in %r" % (key, self)

File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in repr

  1. return "<MultiValueDict: %s>" % dict.repr(self)

MemoryError at /admin/storfil/filelist/add/

}}

Tested with a 500meg file.

comment:10 by [530], 18 years ago

Traceback (most recent call last):
File "/opt/apache/htdocs/python/bigfile/django/core/handlers/base.py" in get_response
  74. response = callback(request, *callback_args, **callback_kwargs)
File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/decorators.py" in _checklogin
  54. return view_func(request, *args, **kwargs)
File "/opt/apache/htdocs/python/bigfile/django/views/decorators/cache.py" in _wrapped_view_func
  40. response = view_func(request, *args, **kwargs)
File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/main.py" in add_stage
  253. errors = manipulator.get_validation_errors(new_data)
File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in get_validation_errors
  58. errors.update(field.get_validation_errors(new_data))
File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in get_validation_errors
  351. self.run_validator(new_data, validator)
File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in run_validator
  337. if self.is_required or new_data.get(self.field_name, False) or hasattr(validator, 'always_test'):
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in get
  137. val = self[key]
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in __getitem__
  114. raise MultiValueDictKeyError, "Key %r not found in %r" % (key, self)
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in __repr__
  104. return "<MultiValueDict: %s>" % dict.__repr__(self)

  MemoryError at /admin/storfil/filelist/add/

by [530], 18 years ago

Improved

comment:11 by [530], 18 years ago

3013-streaming-file_improved.diff

Reading bouandries done using 64kb chunks in parse_file_uploads to avoid reading a big file on one.

Validation on filefields done using size of temporary file.

File being moved in _save_FIELD_file instead of written from memory.

Removed settings since content is still accessible as new_data[field_name]content

Code cleanup still needed, will do this if the patch is acceptable.

comment:12 by Maniac <Maniac@…>, 18 years ago

In fact I'm against changing the whole interface of working with file content to file-like objects because it's not backwards compatible. Or at least not in this ticket since it decreases its chances to be commited.

comment:13 by anonymous, 18 years ago

The problem is you need all those changes I did to be able to upload a large file without a memoryerror, without them this patch has no purpose. But I could rework it to use the setting.

by [530], 18 years ago

Fixed readline in rfc822

comment:14 by anonymous, 18 years ago

Sure, it has purpose! I use it every day since I've written it :-). Your changes are required only if you use manipulators to validate file contents. But it's not required in Django. And in some cases you just don't need a manipulator.

In other words this bug was intended to create a possibility to store uploaded files on disk. But not to convert automatic manipulators, admin and validation to this approach. This is better to have in another ticket because it's easier to review.

comment:15 by [530], 18 years ago

Ok, starting new ticket.

comment:16 by Jacob, 18 years ago

Resolution: duplicate
Status: newclosed

#2070 superseeds this ticket.

Note: See TracTickets for help on using tickets.
Back to Top