Code

Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#1484 closed defect (duplicate)

[patch] files uploads should be streamed to temporary files

Reported by: anonymous Owned by: adrian
Component: Core (Other) Version: master
Severity: major Keywords:
Cc: Maniac@… Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

In the current implementation the content of uploaded files is stored in the request.FILES dictionary.

This makes Django ill suited for any application where one might need to upload files beyond trivial sizes.

The content of the uploaded files should be stored in a temporary files and only the handle to these should be referenced in the request.FILES dictionary.

best,

i.

Attachments (14)

1484.trunk.diff (5.9 KB) - added by Maniac <Maniac@…> 8 years ago.
Patch for trunk
1484.m-r.diff (5.8 KB) - added by Maniac <Maniac@…> 8 years ago.
Patch for magic-removal branch
1484.trunk.2.diff (5.9 KB) - added by Maniac <Maniac@…> 8 years ago.
More correct patch for trunk
1484.m-r.2.diff (5.9 KB) - added by Maniac <Maniac@…> 8 years ago.
More correct patch for magic-removal
1484.trunk.3.diff (6.1 KB) - added by Maniac <Maniac@…> 8 years ago.
Patch #3 for trunk
1484.m-r.3.diff (6.1 KB) - added by Maniac <Maniac@…> 8 years ago.
Patch #3 for magic-removal
1484.trunk.4.diff (6.5 KB) - added by Maniac <Maniac@…> 8 years ago.
Patch 4 for trunk
1484.m-r.4.diff (6.4 KB) - added by Maniac <Maniac@…> 8 years ago.
Patch 4 for magic-removal
1484.trunk.5.diff (6.5 KB) - added by Maniac <Maniac@…> 8 years ago.
Patch 5 for trunk: updated for line count
1484.m-r.5.diff (6.4 KB) - added by Maniac <Maniac@…> 8 years ago.
Patch 5 for magic-removal: updated for line count
1484.m-r.6.diff (6.4 KB) - added by Maniac <Maniac@…> 8 years ago.
Patch 5 for magic-removal: corrected settings import
3013-streaming-file.diff (7.4 KB) - added by [530] 8 years ago.
Updated for new trunk
3013-streaming-file_improved.diff (10.4 KB) - added by [530] 8 years ago.
Improved
3013-streaming-file_rfc822.diff (13.1 KB) - added by [530] 8 years ago.
Fixed readline in rfc822

Download all attachments as: .zip

Change History (30)

comment:1 Changed 8 years ago by Maniac <Maniac@…>

  • Cc Maniac@… added

I'm starting to implement it for my project. I'll submit a patch if all goes well.

BTW, was there a special reason to parse incoming POST data with email package? cgi.FieldStorage seems already smart enough to put binary contents in temp files. I plan to use it.

comment:2 Changed 8 years ago by Maniac <Maniac@…>

I have it basically working in my local tree (though not yet suitable for review). To finish it I'd like to hear your opinions:

  • Temp files are read from stdin in 8K chunks which seems to me too conservative (and slow). What would be the reasonable amount? I would say 32K but it's based only on some dusty habits :-)
  • I don't plan to leave 'read everything in memory' behavior for file uploads in place. However if it makes sense to process file uploads in memory I could make a setting switching between these behaviors.

Changed 8 years ago by Maniac <Maniac@…>

Patch for trunk

Changed 8 years ago by Maniac <Maniac@…>

Patch for magic-removal branch

comment:3 Changed 8 years ago by Maniac <Maniac@…>

  • Summary changed from files uploads should be streamed to temporary files to [patch] files uploads should be streamed to temporary files

Here are patches for trunk and magic-removal. It also eliminates at least one copy of the same data from memory and works better than previous parser even without storing files in temp files.

Now items in request.FILES have are accesible as follows:

filefilename? - filename, as it was
filecontent-type? - content type, as it was
filefile? - file-like object (StringIO or TempFile internally)
filecontent? - file content that is filled lazily on first access

There is a new setting: STORE_UPLOAD_ON_DISK (feel free to rename it) which is False by default

In other words it's all backwards compatible.

comment:4 Changed 8 years ago by anonymous

2 days from the report to the solution ... this is awesome ... really saves our app ... very very impressed

Changed 8 years ago by Maniac <Maniac@…>

More correct patch for trunk

Changed 8 years ago by Maniac <Maniac@…>

More correct patch for magic-removal

comment:5 Changed 8 years ago by Maniac <Maniac@…>

I've made an error in one place that made multipart form submission work only when there are no more fields other than file upload. Corrected patches attached.

Changed 8 years ago by Maniac <Maniac@…>

Patch #3 for trunk

Changed 8 years ago by Maniac <Maniac@…>

Patch #3 for magic-removal

comment:6 Changed 8 years ago by Maniac <Maniac@…>

Found some more bugs working with these patches. New variants also happen to be more reviewable (i.e. diff blocks are less mixed)

Changed 8 years ago by Maniac <Maniac@…>

Patch 4 for trunk

Changed 8 years ago by Maniac <Maniac@…>

Patch 4 for magic-removal

comment:7 Changed 8 years ago by Maniac <Maniac@…>

New patch fix 2 things:

  • it now works when multiple file upload controls have identical names
  • it survives request.FILES.copy() (file objects don't get copied so it now loads content before copying)

Changed 8 years ago by Maniac <Maniac@…>

Patch 5 for trunk: updated for line count

Changed 8 years ago by Maniac <Maniac@…>

Patch 5 for magic-removal: updated for line count

Changed 8 years ago by Maniac <Maniac@…>

Patch 5 for magic-removal: corrected settings import

comment:8 Changed 8 years ago by Maniac <Maniac@…>

I just converted my app to magic-removal (that was most certainly loads of fun!) and found out that I messed up this patch with direct import of constant from settings. Attached is new version that actually works (now tested for real).

Sorry...

Changed 8 years ago by [530]

Updated for new trunk

comment:9 Changed 8 years ago by [530]

  • Version set to SVN

Made FieldStorage read in chuncks of 64k, still needs fixing for related objects in MultiValueDict.

Error trace if you are intrested.

{{
Traceback (most recent call last):
File "/opt/apache/htdocs/python/bigfile/django/core/handlers/base.py" in get_response

  1. response = callback(request, *callback_args, callback_kwargs)

File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/decorators.py" in _checklogin

  1. return view_func(request, *args, kwargs)

File "/opt/apache/htdocs/python/bigfile/django/views/decorators/cache.py" in _wrapped_view_func

  1. response = view_func(request, *args, kwargs)

File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/main.py" in add_stage

  1. errors = manipulator.get_validation_errors(new_data)

File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in get_validation_errors

  1. errors.update(field.get_validation_errors(new_data))

File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in get_validation_errors

  1. self.run_validator(new_data, validator)

File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in run_validator

  1. if self.is_required or new_data.get(self.field_name, False) or hasattr(validator, 'always_test'):

File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in get

  1. val = self[key]

File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in getitem

  1. raise MultiValueDictKeyError, "Key %r not found in %r" % (key, self)

File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in repr

  1. return "<MultiValueDict: %s>" % dict.repr(self)

MemoryError at /admin/storfil/filelist/add/

}}

Tested with a 500meg file.

comment:10 Changed 8 years ago by [530]

Traceback (most recent call last):
File "/opt/apache/htdocs/python/bigfile/django/core/handlers/base.py" in get_response
  74. response = callback(request, *callback_args, **callback_kwargs)
File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/decorators.py" in _checklogin
  54. return view_func(request, *args, **kwargs)
File "/opt/apache/htdocs/python/bigfile/django/views/decorators/cache.py" in _wrapped_view_func
  40. response = view_func(request, *args, **kwargs)
File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/main.py" in add_stage
  253. errors = manipulator.get_validation_errors(new_data)
File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in get_validation_errors
  58. errors.update(field.get_validation_errors(new_data))
File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in get_validation_errors
  351. self.run_validator(new_data, validator)
File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in run_validator
  337. if self.is_required or new_data.get(self.field_name, False) or hasattr(validator, 'always_test'):
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in get
  137. val = self[key]
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in __getitem__
  114. raise MultiValueDictKeyError, "Key %r not found in %r" % (key, self)
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in __repr__
  104. return "<MultiValueDict: %s>" % dict.__repr__(self)

  MemoryError at /admin/storfil/filelist/add/

Changed 8 years ago by [530]

Improved

comment:11 Changed 8 years ago by [530]

3013-streaming-file_improved.diff

Reading bouandries done using 64kb chunks in parse_file_uploads to avoid reading a big file on one.

Validation on filefields done using size of temporary file.

File being moved in _save_FIELD_file instead of written from memory.

Removed settings since content is still accessible as new_data[field_name]content?

Code cleanup still needed, will do this if the patch is acceptable.

comment:12 Changed 8 years ago by Maniac <Maniac@…>

In fact I'm against changing the whole interface of working with file content to file-like objects because it's not backwards compatible. Or at least not in this ticket since it decreases its chances to be commited.

comment:13 Changed 8 years ago by anonymous

The problem is you need all those changes I did to be able to upload a large file without a memoryerror, without them this patch has no purpose. But I could rework it to use the setting.

Changed 8 years ago by [530]

Fixed readline in rfc822

comment:14 Changed 8 years ago by anonymous

Sure, it has purpose! I use it every day since I've written it :-). Your changes are required only if you use manipulators to validate file contents. But it's not required in Django. And in some cases you just don't need a manipulator.

In other words this bug was intended to create a possibility to store uploaded files on disk. But not to convert automatic manipulators, admin and validation to this approach. This is better to have in another ticket because it's easier to review.

comment:15 Changed 8 years ago by [530]

Ok, starting new ticket.

comment:16 Changed 8 years ago by jacob

  • Resolution set to duplicate
  • Status changed from new to closed

#2070 superseeds this ticket.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.