Django

Code

Ticket #1484 (closed: duplicate)

Opened 3 years ago

Last modified 3 years ago

[patch] files uploads should be streamed to temporary files

Reported by: anonymous Assigned to: adrian
Milestone: Component: Core framework
Version: SVN Keywords:
Cc: Maniac@SoftwareManiacs.Org Triage Stage: Unreviewed
Has patch: 1 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 0

Description

In the current implementation the content of uploaded files is stored in the request.FILES dictionary.

This makes Django ill suited for any application where one might need to upload files beyond trivial sizes.

The content of the uploaded files should be stored in a temporary files and only the handle to these should be referenced in the request.FILES dictionary.

best,

i.

Attachments

1484.trunk.diff (5.9 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 03/10/06 07:19:18.
Patch for trunk
1484.m-r.diff (5.8 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 03/10/06 07:19:51.
Patch for magic-removal branch
1484.trunk.2.diff (5.9 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 03/16/06 14:00:13.
More correct patch for trunk
1484.m-r.2.diff (5.9 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 03/16/06 14:00:44.
More correct patch for magic-removal
1484.trunk.3.diff (6.1 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 04/02/06 13:13:52.
Patch #3 for trunk
1484.m-r.3.diff (6.1 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 04/02/06 13:14:39.
Patch #3 for magic-removal
1484.trunk.4.diff (6.5 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 04/06/06 06:56:25.
Patch 4 for trunk
1484.m-r.4.diff (6.4 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 04/06/06 06:57:43.
Patch 4 for magic-removal
1484.trunk.5.diff (6.5 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 04/10/06 04:37:15.
Patch 5 for trunk: updated for line count
1484.m-r.5.diff (6.4 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 04/10/06 04:37:52.
Patch 5 for magic-removal: updated for line count
1484.m-r.6.diff (6.4 kB) - added by Maniac <Maniac@SoftwareManiacs.Org> on 04/13/06 13:43:04.
Patch 5 for magic-removal: corrected settings import
3013-streaming-file.diff (7.4 kB) - added by [530] on 05/31/06 11:13:14.
Updated for new trunk
3013-streaming-file_improved.diff (10.4 kB) - added by [530] on 06/01/06 14:25:27.
Improved
3013-streaming-file_rfc822.diff (13.1 kB) - added by [530] on 06/02/06 04:26:22.
Fixed readline in rfc822

Change History

03/09/06 12:41:38 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • cc set to Maniac@SoftwareManiacs.Org.

I'm starting to implement it for my project. I'll submit a patch if all goes well.

BTW, was there a special reason to parse incoming POST data with email package? cgi.FieldStorage? seems already smart enough to put binary contents in temp files. I plan to use it.

03/10/06 02:19:37 changed by Maniac <Maniac@SoftwareManiacs.Org>

I have it basically working in my local tree (though not yet suitable for review). To finish it I'd like to hear your opinions:

  • Temp files are read from stdin in 8K chunks which seems to me too conservative (and slow). What would be the reasonable amount? I would say 32K but it's based only on some dusty habits :-)
  • I don't plan to leave 'read everything in memory' behavior for file uploads in place. However if it makes sense to process file uploads in memory I could make a setting switching between these behaviors.

03/10/06 07:19:18 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.trunk.diff added.

Patch for trunk

03/10/06 07:19:51 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.m-r.diff added.

Patch for magic-removal branch

03/10/06 07:26:32 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • summary changed from files uploads should be streamed to temporary files to [patch] files uploads should be streamed to temporary files.

Here are patches for trunk and magic-removal. It also eliminates at least one copy of the same data from memory and works better than previous parser even without storing files in temp files.

Now items in request.FILES have are accesible as follows:

filefilename? - filename, as it was filecontent-type? - content type, as it was filefile? - file-like object (StringIO or TempFile? internally) filecontent? - file content that is filled lazily on first access

There is a new setting: STORE_UPLOAD_ON_DISK (feel free to rename it) which is False by default

In other words it's all backwards compatible.

03/11/06 09:32:56 changed by anonymous

2 days from the report to the solution ... this is awesome ... really saves our app ... very very impressed

03/16/06 14:00:13 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.trunk.2.diff added.

More correct patch for trunk

03/16/06 14:00:44 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.m-r.2.diff added.

More correct patch for magic-removal

03/16/06 14:03:01 changed by Maniac <Maniac@SoftwareManiacs.Org>

I've made an error in one place that made multipart form submission work only when there are no more fields other than file upload. Corrected patches attached.

04/02/06 13:13:52 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.trunk.3.diff added.

Patch #3 for trunk

04/02/06 13:14:39 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.m-r.3.diff added.

Patch #3 for magic-removal

04/02/06 13:17:29 changed by Maniac <Maniac@SoftwareManiacs.Org>

Found some more bugs working with these patches. New variants also happen to be more reviewable (i.e. diff blocks are less mixed)

04/06/06 06:56:25 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.trunk.4.diff added.

Patch 4 for trunk

04/06/06 06:57:43 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.m-r.4.diff added.

Patch 4 for magic-removal

04/06/06 07:00:10 changed by Maniac <Maniac@SoftwareManiacs.Org>

New patch fix 2 things: - it now works when multiple file upload controls have identical names - it survives request.FILES.copy() (file objects don't get copied so it now loads content before copying)

04/10/06 04:37:15 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.trunk.5.diff added.

Patch 5 for trunk: updated for line count

04/10/06 04:37:52 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.m-r.5.diff added.

Patch 5 for magic-removal: updated for line count

04/13/06 13:43:04 changed by Maniac <Maniac@SoftwareManiacs.Org>

  • attachment 1484.m-r.6.diff added.

Patch 5 for magic-removal: corrected settings import

04/13/06 13:46:00 changed by Maniac <Maniac@SoftwareManiacs.Org>

I just converted my app to magic-removal (that was most certainly loads of fun!) and found out that I messed up this patch with direct import of constant from settings. Attached is new version that actually works (now tested for real).

Sorry...

05/31/06 11:13:14 changed by [530]

  • attachment 3013-streaming-file.diff added.

Updated for new trunk

05/31/06 11:17:10 changed by [530]

  • version set to SVN.

Made FieldStorage? read in chuncks of 64k, still needs fixing for related objects in MultiValueDict?.

Error trace if you are intrested.

{{ Traceback (most recent call last): File "/opt/apache/htdocs/python/bigfile/django/core/handlers/base.py" in get_response

  1. response = callback(request, *callback_args, **callback_kwargs)

File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/decorators.py" in _checklogin

  1. return view_func(request, *args, **kwargs)

File "/opt/apache/htdocs/python/bigfile/django/views/decorators/cache.py" in _wrapped_view_func

  1. response = view_func(request, *args, **kwargs)

File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/main.py" in add_stage

  1. errors = manipulator.get_validation_errors(new_data)

File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in get_validation_errors

  1. errors.update(field.get_validation_errors(new_data))

File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in get_validation_errors

  1. self.run_validator(new_data, validator)

File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in run_validator

  1. if self.is_required or new_data.get(self.field_name, False) or hasattr(validator, 'always_test'):

File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in get

  1. val = self[key]

File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in getitem

  1. raise MultiValueDictKeyError?, "Key %r not found in %r" % (key, self)

File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in repr

  1. return "<MultiValueDict?: %s>" % dict.repr(self)

MemoryError? at /admin/storfil/filelist/add/

}}

Tested with a 500meg file.

05/31/06 12:42:56 changed by [530]

Traceback (most recent call last):
File "/opt/apache/htdocs/python/bigfile/django/core/handlers/base.py" in get_response
  74. response = callback(request, *callback_args, **callback_kwargs)
File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/decorators.py" in _checklogin
  54. return view_func(request, *args, **kwargs)
File "/opt/apache/htdocs/python/bigfile/django/views/decorators/cache.py" in _wrapped_view_func
  40. response = view_func(request, *args, **kwargs)
File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/main.py" in add_stage
  253. errors = manipulator.get_validation_errors(new_data)
File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in get_validation_errors
  58. errors.update(field.get_validation_errors(new_data))
File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in get_validation_errors
  351. self.run_validator(new_data, validator)
File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in run_validator
  337. if self.is_required or new_data.get(self.field_name, False) or hasattr(validator, 'always_test'):
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in get
  137. val = self[key]
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in __getitem__
  114. raise MultiValueDictKeyError, "Key %r not found in %r" % (key, self)
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in __repr__
  104. return "<MultiValueDict: %s>" % dict.__repr__(self)

  MemoryError at /admin/storfil/filelist/add/

06/01/06 14:25:27 changed by [530]

  • attachment 3013-streaming-file_improved.diff added.

Improved

06/01/06 14:34:40 changed by [530]

3013-streaming-file_improved.diff

Reading bouandries done using 64kb chunks in parse_file_uploads to avoid reading a big file on one.

Validation on filefields done using size of temporary file.

File being moved in _save_FIELD_file instead of written from memory.

Removed settings since content is still accessible as new_data[field_name]content?

Code cleanup still needed, will do this if the patch is acceptable.

06/02/06 02:08:23 changed by Maniac <Maniac@SoftwareManiacs.Org>

In fact I'm against changing the whole interface of working with file content to file-like objects because it's not backwards compatible. Or at least not in this ticket since it decreases its chances to be commited.

06/02/06 03:45:52 changed by anonymous

The problem is you need all those changes I did to be able to upload a large file without a memoryerror, without them this patch has no purpose. But I could rework it to use the setting.

06/02/06 04:26:22 changed by [530]

  • attachment 3013-streaming-file_rfc822.diff added.

Fixed readline in rfc822

06/02/06 06:55:54 changed by anonymous

Sure, it has purpose! I use it every day since I've written it :-). Your changes are required only if you use manipulators to validate file contents. But it's not required in Django. And in some cases you just don't need a manipulator.

In other words this bug was intended to create a possibility to store uploaded files on disk. But not to convert automatic manipulators, admin and validation to this approach. This is better to have in another ticket because it's easier to review.

06/02/06 07:46:40 changed by [530]

Ok, starting new ticket.

07/29/06 23:18:53 changed by jacob

  • status changed from new to closed.
  • resolution set to duplicate.

#2070 superseeds this ticket.


Add/Change #1484 ([patch] files uploads should be streamed to temporary files)




Change Properties
Action