Opened 18 years ago

Closed 16 years ago

Last modified 8 years ago

#2070 closed enhancement (fixed)

Large streaming uploads

Reported by: oyvind.saltvik@… Owned by: Jacob
Component: HTTP handling Version: dev
Severity: normal Keywords: streaming, upload, large, sprintsept14, feature
Cc: dsalvetti@…, matthias@…, oliver@…, Maniac@…, nesh@…, aenor.realm@…, gary.wilson@…, serialx.net@…, wonlay@…, lurker86@…, shaun@…, antonio.mele@…, herbert.poul@…, gabor@…, koebbe@…, axiak@…, jm.bugtracking@…, moritz.angermann@…, django@…, brosner@…, root.lastnode@…, akaihola+djtick@…, sam@…, trey@…, klaus.blindert@…, ville@…, reza@…, registrations@…, calvin@…, works@…, andrewbadr.etc@…, uptimebox@…, drfarina@…, beau@…, giuliani.v@…, prigun@…, zerok@…, johannes.beigel@…, david@…, django@…, mocksoul@…, schlaber@…, danpoe@…, django@…, django-2070@…, clintecker@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Based on code from ticket 1448.

Test for large file upload

from django.db import models

# Create your models here.
class FileList(models.Model):
    name = models.CharField(maxlength=255)
    email = models.EmailField()

    class Admin:
        pass

class AFile(models.Model):
    descr = models.CharField(maxlength=255,core=True)
    file = models.FileField(upload_to='files')
    inlist = models.ForeignKey(FileList,edit_inline=models.STACKED)

And two 500 megabyte files for upload.

Works with the patch with +20 mb above average memory and minimal cpu load usage by the httpd thread, file upload successful.

Without the patch httpd rages to +140 mb above average memory usage and over 60% cpu usage and fails miserably in the end.

Attachments (64)

3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id.diff (16.7 KB ) - added by [530] 18 years ago.
Now using X_PROGRESS_ID instead of X-Progress-Id, accepts any lower/uppercase/ - / _ /prefix variant
3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id_windowsfix.diff (16.9 KB ) - added by [Cha0S] 18 years ago.
windows temporary file locking fix
modpyton-ok-needs-fcgi-testing.3.diff (28.6 KB ) - added by Øyvind Saltvik <oyvind.saltvik@…> 17 years ago.
did not work without middleware, fixed
4459-streaming-file-upload.diff (25.9 KB ) - added by Joakim Sernbrant <serbaut@…> 17 years ago.
Simplified streaming uploads
4459-streaming-file-upload.2.diff (26.7 KB ) - added by Joakim Sernbrant <serbaut@…> 17 years ago.
Added FILE_UPLOAD_MIN_SIZE (default 100kb) to define minimum request size for streaming to disk. Propage exeptions. I'm not too happy with the names of the settings anymore :/
5065-streaming_file_upload_with_shutils.diff (26.6 KB ) - added by axiak@… 17 years ago.
I've updated the patch to include the shutils command and to work with [5065]. Please check to see if it works.
5065-streaming_file_upload_with_shutils_2.diff (27.5 KB ) - added by axiak@… 17 years ago.
Works in [5065], renamed settings variable, uses global settings, defaults STREAMING_MIN_POST_SIZE to .5MB (please test though!)
5070-streaming-file-upload.diff (26.7 KB ) - added by Øyvind Saltvik <oyvind@…> 17 years ago.
Updated to trunk, without changes
5078_streaming_file_upload_with_shutils_and_fallbacks.diff (28.1 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Usese shutils, but falls back.
5078_streaming_file_upload_with_shutils_and_fallbacks_2.diff (28.2 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Usese shutils, but falls back, this time it deletes the tmp file even when it falls back.
5078-streaming_file_upload_with_shutils_and_chunked_fallbacks.diff (28.4 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
This file uses python fallbacks, but using chunks to avoid loading the entire file into memory.
5079-streaming_file_upload_with_safe_file_move.diff (29.1 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Cleaned it up a bit. Moved file_move_safe into django.utils in case it should be used in future endeavors.
5089-streaming_file_upload_with_safe_file_move.diff (28.0 KB ) - added by Øyvind Saltvik <oyvind@…> 17 years ago.
Added missing else
5089-streaming_file_upload_with_safe_file_move.2.diff (29.1 KB ) - added by Øyvind Saltvik <oyvind@…> 17 years ago.
Forgot to add django/utils/file.py
5099_patch_for_streaming_uploads.diff (30.8 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Uses multiple mechanisms for determining the progress id.
5100_upload_request_meta.diff (33.8 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
This isn't important enough to get its own property in request...request.META now contains 'UPLOAD_PROGRESS_ID'
5100_forgot_to_revert_base.diff (32.5 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
I included a bit much in that last patch.
5100_file_upload_core.diff (34.1 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Meant to be cleaner, especially in light of #4165
5100_file_upload_core_with_middleware_hooks.diff (37.7 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Added middleware hooks
5100_file_upload_core_with_middleware_hooks_2.diff (37.7 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Added middleware hooks...this is better.
5100_file_upload_core_with_middleware_hooks_3.diff (41.8 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Apparently I don't know how not to lose files.
5100_file_upload_core_with_middleware_hooks_4.diff (38.9 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Apparently I don't know how not to lose files.
5113_file_upload_core_with_middleware_hooks.diff (39.1 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Fixes WSGI bug.
5116_streaming_upload_fixed_middleware_append.diff (35.4 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Fixed bug where .append(0, func) was called.
5116_streaming_upload_fixed_middleware_append_2.diff (42.4 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Fixed bug where .append(0, func) was called with the files this time.
5126_file_upload_wsgi_tested.diff (41.3 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Tested with WSGI and made a few changes.
5126_file_uploads_latest_patch.diff (39.4 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Added 'state':'starting' to be more like mod_uploadprogress.
5127_file_uploads_no_streaming_fixed.diff (39.2 KB ) - added by Chris Beaven 17 years ago.
There was an error uploading large files with streaming turned off
5313_updated_file_progress.diff (36.7 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Removed some unneeded things. No file progress tracking by default.
5343_1_streaming_file_upload.diff (38.4 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Newer, cleaner version of file upload script
5343_streaming_file_upload_no_assert.diff (39.3 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Newer, cleaner version of file upload script (doh! no random commented assert)
5343_streaming_file_upload_best.diff (38.4 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Sorry about that -- this one is the better one.
5343_cleaned_streaming_file_upload.diff (37.4 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
It's amazing what Trac helps you see.
5343_cleaned_streaming_file_upload_tweaked.diff (37.3 KB ) - added by Michael Axiak <axiak@…> 17 years ago.
Made a subtle fix after testing with #4165.
5343_cleaned_streaming_file_upload_tweaked2.diff (37.3 KB ) - added by klaus.blindert@… 17 years ago.
Fixed microscopic bug in an error path
5722.diff (26.4 KB ) - added by simonbun <simonbun@…> 17 years ago.
Updated patch against r5722
5722.2.diff (22.2 KB ) - added by simonbun <simonbun@…> 17 years ago.
Sorry, that last one patch also included #4165. This is the correct one.
5724_streaming_file_upload.diff (37.8 KB ) - added by Brian Rosner <brosner@…> 17 years ago.
updated to r5724 (previous patch was missing files too)
5820_streaming_file_upload.diff (37.8 KB ) - added by Brian Rosner <brosner@…> 17 years ago.
updated to r5820
6469_streaming_file_upload.diff (38.5 KB ) - added by Faheem Mitha <faheem@…> 16 years ago.
6525_all_tests_pass.diff (39.6 KB ) - added by Øyvind Saltvik <oyvind@…> 16 years ago.
Fixed unicode in POST issues, added django.http.utils, moved str_to_unicode there
6603_all_tests_pass_uploadedfile_wrapper_fixed.diff (42.1 KB ) - added by Øyvind Saltvik <oyvind@…> 16 years ago.
Fixed a problem with UploadedFile wrapper and making sure content is not read in Fie/ImageField
6652_isValidImage_using_tmpfilename.diff (42.7 KB ) - added by Øyvind Saltvik <oyvind@…> 16 years ago.
made the isValidImage validator try the tempfile before reading content
6654_fixed_tests_and_file_clean.diff (45.0 KB ) - added by Øyvind Saltvik <oyvind@…> 16 years ago.
diff did not apply cleanly, fixed
streaming.6710.patch (45.5 KB ) - added by Faheem Mitha <faheem@…> 16 years ago.
Updated streaming patch to trunk rev 6710.
streaming.7092.patch (44.8 KB ) - added by egon 16 years ago.
Made it apply cleanly to rev. 7092
streaming.7092.patch.partial_tests_fix (48.6 KB ) - added by Faheem Mitha 16 years ago.
Slightly modified version of streaming.7092.patch with more tests passing.
streaming.7106.patch (48.0 KB ) - added by egon 16 years ago.
Passes al tests.
ticket2070_rev7277.diff (48.1 KB ) - added by Michael Axiak 16 years ago.
Same patch applied to @7277
2070_revision7339_uploadhandling.diff (38.8 KB ) - added by Michael Axiak 16 years ago.
NEW Upload handling for revision 7339
2070_revision7339_formhandling.diff (11.4 KB ) - added by Michael Axiak 16 years ago.
NEW Form handling for uploaded files for revision 7339
2070_revision7351_latest.diff (67.7 KB ) - added by Michael Axiak 16 years ago.
Latest patch for 2070...includes everything (see comment)
2070_latest7354.diff (68.8 KB ) - added by Michael Axiak 16 years ago.
The latest #2070 patch. Small changes...see comment.
2070_revision7359.diff (76.9 KB ) - added by Michael Axiak 16 years ago.
New diff...some style changes and new documentation.
2070_revision7363.diff (93.2 KB ) - added by Michael Axiak 16 years ago.
Altered documentation to be more approachable.
2070_revision7363.2.diff (94.0 KB ) - added by Michael Axiak 16 years ago.
Slightly newer...handles exhaustion a little bit differently.
2070_revision7377.diff (95.3 KB ) - added by Michael Axiak 16 years ago.
New 2070 patch...includes fixes to the parser in addition to the newforms fix caught by EricB as well as better docs.
2070_revision7396.diff (98.6 KB ) - added by Michael Axiak 16 years ago.
Better patch...not sure how the newforms code in the previous patch was the way it was.
2070_revision7388.diff (98.6 KB ) - added by Michael Axiak 16 years ago.
new patch that fixes error with instantiating the upload handlers, thanks Eric!
2070_revision7484.diff (99.2 KB ) - added by Michael Axiak 16 years ago.
Updated diff per Jacob's requests and useful feedback.
2070_revision7484.2.diff (99.3 KB ) - added by Michael Axiak 16 years ago.
Fixed file_size bug with memory handling.
2070_revision7599.diff (45.6 KB ) - added by Leah Culver 16 years ago.
Updated patch to r7599
2070-r7695.patch (100.6 KB ) - added by Jacob 16 years ago.
Streaming uploads patch updated to [7695] and reviewed.
2070-r7728.patch (114.5 KB ) - added by Jacob 16 years ago.
"release candidate" patch

Change History (376)

comment:1 by oyvind.saltvik@…, 18 years ago

Severity: normalmajor

Should there be a setting to enable/disable this?

Needs testing on python 2.3.

comment:2 by James Bennett, 18 years ago

priority: highnormal
Resolution: duplicate
Severity: majornormal
Status: newclosed

This is a duplicate of #1484.

comment:3 by James Bennett, 18 years ago

Resolution: duplicate
Status: closedreopened

Or is it? Leaving it open until someone who knows more about it can sort this out.

comment:4 by [530], 18 years ago

Detailed description of patch.

Based of the #1448 patch that i experienced trouble with using edit inline and large files.

Changes from trunk:

  • Added STREAMING_UPLOADS=False to global_settings
  • Added parse_streaming_file_upload it http/__init__.py
  • Made changes to modpython and wsgi in handlers to use parse_streaming_file_upload
  • Subclassed cgi.FieldStorage
    • overriding of make_file to use NamedTemporaryFile
    • overriding of bouandry parsers to make readline read in 64k chunks to avoid reading extremely long lines into memory
  • Subclassed rfc822.Message
    • overriding of bouandry parsers to make readline read in 64k chunks to avoid reading extremely long lines into memory
  • Made FileField validate upload using tempfile size
  • Made save_file in db/models/fields/__init__.py pass temporary file name to _save_FIELD_file instead of file contents
  • Made _save_FIELD_file copy temporary file instead of write file from memory

comment:5 by [530], 18 years ago

Error in previous post.

Based of the #1448 patch that i experienced trouble with using edit inline and large files.

Should be.

Based of the #1484 patch that i experienced trouble with using edit inline and large files.

comment:6 by Matias Hermanrud Fjeld <mhf@…>, 18 years ago

This patch uses the rfc822 module. The Python docs says the following about rfc822:

  Deprecated since release 2.3. The email package should be used in preference to the rfc822 module.
  This module is present only to maintain backward compatibility.

comment:7 by Home, 18 years ago

Type: defect

comment:8 by [530], 18 years ago

Type: defect

The problem with the email package is that it reads the entire file into memory. Django supports python 2.3 anyway.

comment:9 by [530], 18 years ago

to use monitor from a view

def progress_view(request):
    
    import pickle

    sessionid = request.COOKIES.get('sessionid', None)
    progress = pickle.load(open('/tmp/'+sessionid, 'r'))

Accessible variables

progress.fullength - full request byte lenght
progress.at_status - current bytes read
progress.at_file - current file read
progress.file_status - current file bytes read


comment:10 by bnewbold@…, 18 years ago

Had a problem on FreeBSD with my /tmp being on a seperate filesystem: os.rename threw an "Invalid cross-device link" error.
The fix was to edit django/db/models/base.py, "import shutils" at the top and change the one instance of "os.rename" to "shutils.move". Works fine now... shutils came default with python2.4 I don't know about other releases, or if it works on other OSs

comment:11 by Adrian Holovaty, 18 years ago

In reference to bnewbold's comment above, it's the shutil module, and it looks like it's been around since at least Python 2.3. (Django supports Python 2.3 and up.)

comment:12 by [Cha0S], 18 years ago

to make it work on windows make the following changes in UploadStateKeeper class:

class UploadStateKeeper:

    def __init__(self, identifier, fullength=None):

        self.identifier = identifier

        if fullength:
            self.state = UploadState(self.identifier, fullength)

        else:
            import pickle
            self.state = pickle.load(self.get_status_path(), 'r')

    def get_status_path(self):
        from tempfile import gettempdir
        return os.path.join(gettempdir(), self.identifier)

    def set_status(self, filename, addlen):
        self.state.set_status(filename, addlen)
        import pickle
        pickle.dump(self.state,open(self.get_status_path(), 'w'))

    def get_state(self):
        return self.state

comment:13 by Jacob, 18 years ago

Can anyone tell me the differences between the latest patch on this ticket and the latest one on #1484? Are these two tickets/patches duplicates or complements?

comment:14 by bahamut@…, 18 years ago

Type: defectenhancement

This ticket is the continuation of #1484.

comment:15 by Jacob, 18 years ago

Sorry if I'm being dense, but does that mean that #1484 should be closed in favor of this one? I'm trying to figure out which patch(es) need to be applied here.

comment:16 by bahamut@…, 18 years ago

Keywords: timezone time zone UTC added

In my opinion, yes #1484 should be closed. The only patch you have to apply is the latest one in this ticket.

It may help to svn up -r 3358 before you apply the patch, or you can try at the current HEAD and see what goes.

comment:17 by Jacob, 18 years ago

Thanks.

comment:18 by Jacob, 18 years ago

Owner: changed from Adrian Holovaty to Jacob
Status: reopenednew

Right, I've finally had a chance to play with this patch. It's a big one, so I want to make sure I understand what's going on before I check it in. So apologies if I'm being dense, but I gotta understand this stuff:

  • I'm confused about why you needed to override Message.readline} just to change the argument to self.fp.readline() (django/http/init.py, line 105). Is the Python implementation broken in some way? What happens if you take that bit out?


  • Pretty much the same question for the FieldStorage subclass. This is a *lot* of duplicated code from the builtin cgi module, and it gives me pause. Can you explain why you need to do this (and perhaps think about if there's another way?)


  • The upload status info stored in /tmp is pretty awesome. However, it doesn't seem to work all the time for me; while uploading a big file if I keep cat'ing that /tmp file it's often just empty. Seems something keeps emptying it out? Also, using the session ID seems like it would cause issues if I tried to upload multiple files at the same time (a very real possibility if I'm uploading really big-ass things).


  • Related to that, setting os.envrion['SESSIONID'] (done in both the WSGI and the mod_python handler) seems very leaky. We've had bugs before with this type of env monkeybusiness... Could you use a threadlocal instead?


I should end, however, by saying that the behavior of this patch rocks. Being able to upload a 2 GB file without locking my server is super-nice, and the status stuff absolutely kicks ass (I think I already said that). I'm pretty sure I can clean up all the stuff I just bitched about, but I've got a pretty long todo.txt and if you can take care of these problems and/or answer my questions, I'll check this in PDQ.

comment:19 by Jacob, 18 years ago

Status: newassigned

comment:20 by [530], 18 years ago

  • If that bit is taken out memory usage soars because most browsers do not use newlines when sending files so it actually reads to EOF. Is it possible to override self.fp.readline() to call self.fp.readline(64000)? Then most of this code would be removed.
  • The upload status is a bit tricy, I need to override FieldStorage to store the status. But since FieldStorage is not a extended class of object there is no super(FieldStorage, self).read_lines_to_outerboundary() causing repetition of code.
  • Don't know how multiple uploads can be solved since it would require multiple id's, maybe use sessions instead if possible. Never experienced the emptiying if the temp file. Does this happen with both mod_python and wsgi?
  • About setting os.environ, the mod_python-handler does not set it, but wsgi does, need to fix that one instead of messing with threadlocal , this may be why the file is emptied if it only happens with wsgi.

comment:21 by anonymous, 18 years ago

Cc: Maniac@… added

comment:22 by anonymous, 18 years ago

Jacob, after reading your questions I'd like to share here my knowledge as an author of original patch in #1484.

  • About readline(N). Actually browsers do send newlines when sending binary files because newlines (byte 0x0A) are present in binary files also. In theory 0x0A in a binary body can be rare enough for readline to read very large chunks in memory. However in practice 0x0A are present often enough (I'm talking from the experience of two services working mainly with big jpegs, mp3 and zips). This is, btw, why my original patch in #1484 didn't include the boundary in readline. The downside of replacing readlines in FieldStorage's code with readline(N) is that the base class doesn't provide this for overriding so you have to actually rewrite one-to-one pretty large chunks of code.
  • To my liking storing upload info in files is not very nice. I mentioned my idea in django-developers about this that it's better be a hook for the user code (in middleware style) that is called from the handler accepting uploaded data. This is more flexible because a user can store this progress not in /tmp but wherever he likes and do more interesting stuff then just measuring progress. For example closing the upload if the file is too big for a server.
  • One more thing that we've already discussed with Oyvind in an old ticket that his patch alters both HTTP handlers and admin app. The old patch was only for HTTP and my point is that these things should be separate because it's much simpler to review and apply.

Hope my comment was useful!

comment:23 by anonymous, 18 years ago

One more nit... Oyvin what was the reason to use rfc822? It's a new thing in your variant of the old patch and I just don't get why it's needed :-)

comment:24 by [530], 18 years ago

  • When uploading two 500mb iso's, it broke down constantly with a memoryerror when not setting a buffer size, everything worked fine when a bufsize was added.
  • About upload storage I do agree about using a hook instead.
  • Rfc822 is still used by FieldStorage, now hidden from view. I just don not think of setting bufsize in make_file instead of passing it to readline (if that works). Have not had time to test yet.
  • Agree that reviewing is easier. But it is nice for easy testing via the admin to see why django is the right tool for the massive job. :)

comment:25 by anonymous, 18 years ago

Ignore my patch 3358-streaming-file_with_settings_and_progressmonitor.3.diff, adding bufsize does not affect readline. Also it seems chunked reads are required for storing the state. Don't see any easy way to make this less code yet. I'll add a new patch with just the WSGI fix.

comment:26 by [530], 18 years ago

I was planning to make a middleware for a progressmonitor hook, but not quite sure when middleware is applied and how to pass this hook from the middleware to FieldStorage. Any help to clarify/fix this would be great.

Also if somenoe could test this successfully using WSGI it would be great.

comment:27 by Maniac <Maniac@…>, 18 years ago

My idea of a hook is not fit into current middleware, I just wanted to retain the style. So to clarify I see it like this.

This can be a separate set of hooks similar to MIDDLEWARE_CLASSES or it can be done in middleware classes themselves. To do the latter we will specifiy that in addition to process_request, _response, _view and _exception methods a middleware can define a new kind of method: process_upload(request, data).

parse_file_upload then would search for these middleware methods just as http requests do in modpython.py and wsgi.py. The list of methods is passed to a FieldStorage decendant that will call them each time it's about to write a chunk of data to the target stream.

This is it... Sorry I can't come with the actual code: pretty busy rolling out my current project.

comment:28 by Todd O'Bryan, 18 years ago

I figured the last attachment was supposed to be a diff and tried to upload it with the right extension. Unfortunately, '.py' got added and now I don't know how to get rid of the attachment.

Sorry for the clutter.

comment:29 by [530], 18 years ago

Seems wsgi can't do streaming uploads this way, but ligthttpd 1.5 will chunk the uploads and has a mod_uploadprogress that serializes to JSON. Will try adapt this patch to serialize the same way so it wont matter if you use this streaming uploads patch or mod_uploadprogress.

comment:30 by Maniac@…, 18 years ago

Why do you say wsgi can't do this? It gives environwsgi.input to the handler that is a file-like object with uploading data. Handler then can read it in chunks (it actually does this in parse_file_upload).

comment:31 by [530], 18 years ago

WSGI may, but it seems like lighthttpd does not pass anything to wsgi before the request is done. I may be wrong, but I have no way of debugging properly.

comment:32 by [530], 18 years ago

Bug in last patch, did not manage to override readline to use a buffer size. If this cannot be done, code must be repeated.

comment:33 by [530], 18 years ago

Wsgi users using lighttpd should use mod_uploadprogress instead. There is some kind of delay before the upload state is updated using wsgi, the same does not happen on mod_python. Is this a issue with my code, the fastcgi/scgi spec or lighttpd's implementation? Can someone test this with fastcgi/scgi on apache?

comment:34 by [530], 18 years ago

GET identifer for upload state.

http://foo.com?bar saves json data in /tmp/bar

For javascript, making a /progress/ url and view see http://blog.lighttpd.net/articles/2006/08/01/mod_uploadprogress-is-back

Makes it easy to add a admin js option to a model to enable upload state in admin.

comment:35 by Todd O'Bryan, 18 years ago

I just provided a patch which steals liberally from Oyvind, but decouples the progress monitor from the rest of the file upload behavior.

In a nutshell, if you don't want to read uploads into memory, you set request.save_file_class to the class of the kind of file-like object you'd like uploads routed to. A sample middleware class is provided which uses a temp file.

I haven't looked at Oyvind's latest code, but my guess is that we could easily move the upload monitor from the guts of Django by providing a file-like object class that keeps track of how much stuff it has received from the POST data stream.

Comments welcome. (But be nice; it's my first patch.)

comment:36 by [530], 18 years ago

To toddobryan, I made some fatal mistakes before 3358-streaming-file_with_settings_and_progressmonitor_fixed.diff + the wsgi fix, see this patch, you may have fixed it but i'm not sure. the latest patch I would say is really important for multiple uploads. There are some more upload progress issues, the json encoder gives two value sets but i can't find out why. Posting my latest diff now.

Btw I like one more developer helping me.

comment:37 by Todd O'Bryan, 18 years ago

OK, you don't have to be very nice. It turns out that I missed perhaps the most important thing, which is wrapping the stream from the POST in some kind of object which prevents lines from being too long and filling up memory.

Oyvind's PostStream seems like the kind of thing to do, but I'd suggest wrapping the stream before we pass it into the FieldStorage object, not inside. I was misled by the fact that the read_binary() method inside there has a fixed 8k buffer.

comment:38 by [530], 18 years ago

Toddobryan I agree. Making it more modular is great.

comment:39 by Todd O'Bryan, 18 years ago

If you could concentrate on getting your progress monitor to work from a file-like object instead of FieldStorage, I'll integrate the PostStream idea into my code so that the readline()s in the FieldStorage class don't read too much.

comment:40 by [530], 18 years ago

Thinking of making the upload monitor as a hook in streaming_upload_middleware.

Then make a upload monitor middleware that hooks onto streaming_upload_middleware by loading before streaming_upload_middleware.

Upload monitor will act exacly like mod_uploadprogress so wsgi users just load streaming_upload_middleware.

comment:41 by Malcolm Tredinnick, 18 years ago

By the way, when you guys think you've addressed all the problems and simplified the code as much as you can, etc, just add a comment pointing to the right patch that needs another review (and mention that you think it's "done").

comment:42 by [530], 18 years ago

To test 'svn up -r 3518', apply patch and add:

django.middleware.upload.UploadStateMiddleware and django.middleware.upload.StreamingUploadMiddleware to MIDDLEWARE_CLASSES.

Uploadstate test currently builtin to admin change form. 'tail /tmp/test123' for upload state as json.

comment:43 by anonymous, 18 years ago

Regular uploads seem to fail if the middlewares are disabled. Backtrace:

Traceback (most recent call last):
  File "/opt/local/lib/python2.4/site-packages/django/core/servers/basehttp.py", line 272, in run
    self.result = application(self.environ, self.start_response)
  File "/opt/local/lib/python2.4/site-packages/django/core/servers/basehttp.py", line 611, in __call__
    return self.application(environ, start_response)
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 158, in __call__
    response = self.get_response(request.path, request)
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/base.py", line 102, in get_response
    return self.get_technical_error_response(request)
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/base.py", line 134, in get_technical_error_response
    return debug.technical_500_response(request, *sys.exc_info())
  File "/opt/local/lib/python2.4/site-packages/django/views/debug.py", line 131, in technical_500_response
    return HttpResponseServerError(t.render(c), mimetype='text/html')
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 155, in render
    return self.nodelist.render(context)
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 688, in render
    bits.append(self.render_node(node, context))
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 716, in render_node
    raise wrapped
TemplateSyntaxError: Caught an exception while rendering: 'NoneType' object has no attribute 'items'

Original Traceback (most recent call last):
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 706, in render_node
    result = node.render(context)
  File "/opt/local/lib/python2.4/site-packages/django/template/defaulttags.py", line 189, in render
    value = bool_expr.resolve(context, True)
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 548, in resolve
    obj = resolve_variable(self.var, context)
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 657, in resolve_variable
    raise VariableDoesNotExist, "Failed lookup for key [%s] in %r" % (bits[0], current) # missing attribute
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 61, in __repr__
    return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 97, in _get_post
    self._load_post_and_files()
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 75, in _load_post_and_files
    self._post, self._files = http.parse_file_upload(self)
  File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 83, in parse_file_upload
    return default_parse_file_upload(req)
  File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 50, in default_parse_file_upload
    raw_message = '\r\n'.join(['%s:%s' % pair for pair in req.header_dict.items()])
AttributeError: 'NoneType' object has no attribute 'items'

comment:44 by Djordjevic Nebojsa <nesh at studioquattro co yu>, 18 years ago

Cc: nesh@… added

comment:45 by wiz, 18 years ago

Cc: aenor.realm@… added

comment:46 by bahamut@…, 18 years ago

This is anecdotal evidence at best, but  rapid testing of this patch seems to indicate it does not work with the development server.

I basically applied it, updated to head, added the 2 middlewares, tried one of my upload forms, and it just sat there forever, never reaching the Django view code.

comment:47 by [530], 18 years ago

I would not expect this patch to work with the dev server, since it is a single thread you cannot do multiple requests at the same time.

comment:48 by anonymous, 18 years ago

I may be mistaken about my last comment, but the dev server uses the wsgi handler. So it may be the delay problem, or that stdin is not passed before the request is done. So since wsgi using lighttpd works, the dev server probably handle things differently. Don't really know alot about the dev server.

comment:49 by anonymous, 18 years ago

Cc: gary.wilson@… added

comment:50 by hruske, 18 years ago

This doesn't work for me at all. With Opera 9 and Firefox 1.5, I get KeyError on X-Progress-Id, which is supposed to be HTTP_X_PROGRESS_ID? (Webserver is Lighttpd, distro Debian unstable)

Also, isn't this line a bit insecure, given the get_temp_file does an "os.path.join":

content = open(get_temp_file(request.header_dict['X-Progress-Id']), 'r').read()

I've replaced my code to use cache framework instead. But mind you need server-wide cache, per process will probably not be enough.

And then, you probably don't want to open a new TCP connection every time you make a xml request, since this will effectively DOS your server.

I was also having problems with JS - the browser did not want to do a POST at all. It started polling server for progress, but it was not posting at all.

comment:51 by [530], 18 years ago

Lighttpd has a delay before it saves data and progress info.

Supposed to be HTTP_X_PROGRESS_ID?

I did it this way to be compatible with mod_uploadprogress that you should use instead.

Is there another way than opening another tcp connection using xmlhttp?

comment:52 by hruske, 18 years ago

How much should that delay usually be? Since there is no POST for Django, progress bar does not work. This probably means I need to use mod_uploadprogress to get it working, but the new mod_uploadprogress has not been released yet officially.

Yes, HTTP_X_PROGRESS_ID. When I printed contents of request.header_dict I could see a key named HTTP_X_PROGRESS_ID which corresponds to X-Progress-Id.

Sure, one could open a TCP connection and server periodically sends info and client reads from it. But I see that you aim for mod_uploadprogress compatibility.

comment:53 by hruske, 18 years ago

Also, it's X-Progress-ID in lighttpd, with capital ID.

by [530], 18 years ago

Now using X_PROGRESS_ID instead of X-Progress-Id, accepts any lower/uppercase/ - / _ /prefix variant

comment:54 by bahamut@…, 18 years ago

Keywords: streaming upload large added; timezone time zone UTC removed

The patch does not work with Apache 2 + mod_fcgid.

I'm running on Mac OS X 10.4.7 with a pure MacPorts installation (save Django and flup -- both are a svn HEAD).

Apache 2 configuration:

{{{<IfModule mod_fcgid.c>

AddHandler fcgid-script .fcgi


IdleTimeout 600
ProcessLifeTime 3600
MaxProcessCount 8
DefaultMinClassProcessCount 3
DefaultMaxClassProcessCount 3
IPCConnectTimeout 8
IPCCommTimeout 48

</IfModule>

ScriptAlias /myapp "cgi-bin/myapp.fcgi/"}}}

myapp.fcgi:

{{{#!/opt/local/bin/python
# -*- coding: utf-8 -*-
import sys, os

# Add a custom Python path.
sys.path.insert(0, "/opt/local/www")

# Switch to the directory of your project. (Optional.)
os.chdir("/opt/local/www/myapp")

# Set the DJANGO_SETTINGS_MODULE environment variable.
os.environDJANGO_SETTINGS_MODULE = "myapp.settings"

from django.core.servers.fastcgi import runfastcgi
runfastcgi(["method=threaded", "daemonize=false"])}}}

comment:55 by anonymous, 18 years ago

Repost with proper formatting :/

Apache 2 configuration:

<IfModule mod_fcgid.c>
    AddHandler fcgid-script .fcgi
    
    IdleTimeout 600
    ProcessLifeTime 3600
    MaxProcessCount 8
    DefaultMinClassProcessCount 3
    DefaultMaxClassProcessCount 3
    IPCConnectTimeout 8
    IPCCommTimeout 48
</IfModule>

ScriptAlias /myapp "cgi-bin/myapp.fcgi/"

myapp.fcgi:

#!/opt/local/bin/python
# -*- coding: utf-8 -*-
import sys, os

# Add a custom Python path.
sys.path.insert(0, "/opt/local/www")

# Switch to the directory of your project. (Optional.)
os.chdir("/opt/local/www/myapp")

# Set the DJANGO_SETTINGS_MODULE environment variable.
os.environ['DJANGO_SETTINGS_MODULE'] = "myapp.settings"

from django.core.servers.fastcgi import runfastcgi
runfastcgi(["method=threaded", "daemonize=false"])

comment:56 by bahamut@…, 18 years ago

Cc: bahamut@… added

comment:57 by bahamut@…, 18 years ago

I get the following error, again with my mod_fcgid setup, when using the admin interface with the progress status monitor js.

Unhandled exception in thread started by <bound method ThreadPool._worker of <flup.server.threadpool.ThreadPool object at 0x715270>>Traceback (most recent call last):  File "/opt/local/www/data/flup/server/threadpool.py", line 103, in _worker    job.run()  File "/opt/local/www/data/flup/server/fcgi_base.py", line 642, in run    self.process_input()  File "/opt/local/www/data/flup/server/fcgi_base.py", line 678, in process_input    self._do_params(rec)  File "/opt/local/www/data/flup/server/fcgi_base.py", line 777, in _do_params    self._start_request(req)  File "/opt/local/www/data/flup/server/fcgi_base.py", line 761, in _start_request    req.run()  File "/opt/local/www/data/flup/server/fcgi_base.py", line 563, in run    self.server.error(self)  File "/opt/local/www/data/flup/server/fcgi_base.py", line 1157, in error    req.stdout.write('Content-Type: text/html\r\n\r\n' +  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/cgitb.py", line 158, in html    dump.append('%s&nbsp;= %s' % (name, pydoc.html.repr(value)))  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 360, in repr    return Repr.repr(self, object)  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/repr.py", line 24, in repr    return self.repr1(x, self.maxlevel)  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 367, in repr1    return self.escape(cram(stripid(repr(x)), self.maxother))  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 65, in __repr__    return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 101, in _get_post    self._load_post_and_files()  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 79, in _load_post_and_files    self._post, self._files = http.parse_file_upload(self)  File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 82, in parse_file_upload    return req.parse_file_upload(req)  File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 83, in parse_streaming_file_upload    upload_state = req.upload_state(req)  File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 126, in __init__    self.state = {'size': int(req.header_dict.get('content-length')),TypeError: int() argument must be a string or a number

comment:58 by bahamut@…, 18 years ago

Something ate line breaks, apparently.

Unhandled exception in thread started by <bound method ThreadPool._worker of <flup.server.threadpool.ThreadPool object at 0x715270>>
Traceback (most recent call last):
  File "/opt/local/www/data/flup/server/threadpool.py", line 103, in _worker
    job.run()
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 642, in run
    self.process_input()
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 678, in process_input
    self._do_params(rec)
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 777, in _do_params
    self._start_request(req)
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 761, in _start_request
    req.run()
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 563, in run
    self.server.error(self)
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 1157, in error
    req.stdout.write('Content-Type: text/html\r\n\r\n' +
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/cgitb.py", line 158, in html
    dump.append('%s&nbsp;= %s' % (name, pydoc.html.repr(value)))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 360, in repr
    return Repr.repr(self, object)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/repr.py", line 24, in repr
    return self.repr1(x, self.maxlevel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 367, in repr1
    return self.escape(cram(stripid(repr(x)), self.maxother))
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 65, in __repr__
    return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 101, in _get_post
    self._load_post_and_files()
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 79, in _load_post_and_files
    self._post, self._files = http.parse_file_upload(self)
  File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 82, in parse_file_upload
    return req.parse_file_upload(req)
  File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 83, in parse_streaming_file_upload
    upload_state = req.upload_state(req)
  File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 126, in __init__
    self.state = {'size': int(req.header_dict.get('content-length')),
TypeError: int() argument must be a string or a number

comment:59 by bahamut@…, 18 years ago

I have just tested the patch with Apache 2 and mod_python. It works correctly, however I had to make 2 small modifications because my Django application isn't running at the root context of my server.

Basically, the admin JavaScript code hardcodes a request to "/progress/". This needs to be changed appropriately for each install, e.g. in my case "/django/progress/".

Similarly, the UploadStateMiddleware checks that the request path is exactly "/progress/", which needs to be changed to match.

comment:60 by anonymous, 18 years ago

I svn up'ed, applied 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id.diff, deleted the Django directory under site-packages, reinstalled Django with 'python setup.py install' and restarted apache/modpython but Django appears to be behaving as before. My memory usage still climbs beyond what I'd expect and I don't see any files in /tmp like I'd expect from tempfile.NamedTemporaryFile().

I added these lines to my settings.py file:

STREAMING_UPLOADS=True

UPLOAD_BUFFER_SIZE=2048

Am I missing something?

Thanks!

comment:61 by oyvind@…, 18 years ago

It is 2 middleware now.

django.middleware.upload.UploadStateMiddleware and django.middleware.upload.StreamingUploadMiddleware

Order is important

comment:62 by anonymous, 18 years ago

Thanks for the info!

I changed middleware_classes to look like this:

MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.doc.XViewMiddleware',
    'django.middleware.upload.UploadStateMiddleware',
    'django.middleware.upload.StreamingUploadMiddleware',
)

I was able to upload a couple large test files and tempfiles were showing up as expected in /tmp, but I'm intermittently receiving this exception when uploading:

Traceback (most recent call last):
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/base.py" in get_response
  74. response = callback(request, *callback_args, **callback_kwargs)
File "/home/misjxm/dp1/file/app/views.py" in createmessage
  275. if request.POST:
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _get_post
  51. self._load_post_and_files()
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _load_post_and_files
  32. self._post, self._files = http.parse_file_upload(self)
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/http/__init__.py" in parse_file_upload
  81. return req.parse_file_upload(req)
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/middleware/upload.py" in parse_streaming_file_upload
  104. FILES.appendlist(key, FileDict({
File "/usr/lib/python2.4/cgi.py" in __getattr__
  540. raise AttributeError, name

  AttributeError at /file/compose/
  tmp_name

Have you seen this before?

Thanks!

comment:63 by anonymous, 18 years ago

And I only seem to get that exception when uploading text files such as M3U playlists, bashrc files, etc.

comment:64 by anonymous, 18 years ago

Text files in general only cause the problem intermittently. I have one M3U file that consistantly causes the exception to be thrown. This is what local vars looks like when I upload that file:

name      'tmp_name'
self      FieldStorage('file1', 'klipschtest.m3u', "MP3\\2pac\\Greatest Hits (Disc 1)\\1-05 Hail Mary.mp3\r\nFLAC\\James Blunt\\Back to Bedlam\\03-Wisemen.flac\r\nMP3\\Jimi_Hendrix\\UNKNOWN\\Little wing (extended instrumental).mp3\r\nFLAC\\Led Zeppelin\\Remasters (disc II)\\04-No Quarter.flac\r\nFLAC\\Pink Floyd\\The Wall - Disc 1\\03-Another Brick in the Wall. Part 1.flac\r\nFLAC\\Pink Floyd\\The Wall - Disc 1\\04-The Happiest Days of Our Lives.flac\r\nFLAC\\Natalie Merchant\\Live In Concert\\05-Carnival.flac\r\nMP3\\Metallica\\...And Justice For All\\08-To Live Is To Die.mp3\r\nMP3\\Red Hot Chili Peppers\\Blood Sugar Sex Magik\\10-Blood Sugar Sex Magik.mp3\r\nFLAC\\Sade\\Lovers Rock\\03-King Of Sorrow.flac\r\nFLAC\\Stone, Joss\\The Soul Sessions\\05-Dirty Man.flac\r\nFLAC\\U.G.K\\Ridin' Dirty\\03-Murder.flac\r\nMP3\\Weezer\\Weezer (Green Album)\\04-Weezer _ Island In The Sun.mp3\r\nMP3\\Weezer\\Weezer (Green Album)\\03-Weezer _ Hash Pipe.mp3\r\n")

comment:65 by anonymous, 18 years ago

I'm getting an error about 'read only accepts 1 arguement (2 given)", from the following line in cgi.py:

self.fp.read(self.length)

comment:66 by anonymous, 18 years ago

This is the error I'm getting:

C:\web\fxedit>python manage.py runserver
Validating models...
0 errors found.

Django version 0.95, using settings 'fxedit.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CTRL-BREAK.
[16/Sep/2006 02:55:31] "GET /file/ HTTP/1.1" 200 1188
[16/Sep/2006 02:55:31] "GET /media/stylesheets/style.css HTTP/1.1" 304 0
[16/Sep/2006 02:55:31] "GET /media/stylesheets/print.css HTTP/1.1" 304 0
Traceback (most recent call last):

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\servers\

basehttp.py", line 272, in run

self.result = application(self.environ, self.start_response)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\servers\

basehttp.py", line 615, in call

return self.application(environ, start_response)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\wsgi.py", line 156, in call

response = self.get_response(request.path, request)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\base.py", line 102, in get_response

return self.get_technical_error_response(request)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\base.py", line 134, in get_technical_error_response

return debug.technical_500_response(request, *sys.exc_info())

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\views\debug.p

y", line 131, in technical_500_response

return HttpResponseServerError(t.render(c), mimetype='text/html')

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 155, in render

return self.nodelist.render(context)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 688, in render

bits.append(self.render_node(node, context))

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 716, in render_node

raise wrapped

TemplateSyntaxError: Caught an exception while rendering: read() takes exactly 1

argument (2 given)

Original Traceback (most recent call last):

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 706, in render_node

result = node.render(context)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\defa

ulttags.py", line 189, in render

value = bool_expr.resolve(context, True)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 548, in resolve

obj = resolve_variable(self.var, context)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 657, in resolve_variable

raise VariableDoesNotExist, "Failed lookup for key [%s] in %r" % (bits[0], c

urrent) # missing attribute

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\wsgi.py", line 61, in repr

return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\wsgi.py", line 97, in _get_post

self._load_post_and_files()

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\wsgi.py", line 75, in _load_post_and_files

self._post, self._files = http.parse_file_upload(self)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\http\init

.py", line 80, in parse_file_upload

return req.parse_file_upload(req)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\middleware\up

load.py", line 87, in parse_streaming_file_upload

fs = FieldStorage(req.raw_request, environ=req.META, headers=req.header_dict

, upload_state=upload_state )

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\middleware\up

load.py", line 71, in init

keep_blank_values=keep_blank_values, strict_parsing=strict_parsing)

File "C:\Python24\lib\cgi.py", line 524, in init

self.read_urlencoded()

File "C:\Python24\lib\cgi.py", line 629, in read_urlencoded

qs = self.fp.read(self.length)

TypeError: read() takes exactly 1 argument (2 given)

[16/Sep/2006 02:55:53] "POST /file/stats/ HTTP/1.1" 500 3721

by [Cha0S], 18 years ago

windows temporary file locking fix

comment:67 by Chris Beaven, 18 years ago

I'm getting the same error as anonymous above (and am also using Windows).

comment:68 by oyvind@…, 17 years ago

Idea for fixing the m3u upload problem. May need a copy of this util.py to make it work with wsgi.

But I have almost given up on wsgi for this patch, uless anyone are able to shed som light on why wsgi does not like chunked reading from wsgi.input.

http://svn.apache.org/viewvc/httpd/mod_python/trunk/lib/python/mod_python/util.py?view=markup&pathrev=451861

comment:69 by [Cha0S], 17 years ago

There is a problem using this patch with Nginx (and maybe some other webservers). Nginx doesn't pass content-type and content-length headers and the result of this is that FieldStorage can't parse request correctly and nothing is uploaded. A workaround I've found is modification of cgi.FieldStorage to use HTTP_CONTENT_TYPE and HTTP_CONTENT_LENGTH. But I don't think this is elegant solution of this problem. Does anyone have other suggestions on how to solve it ?

comment:70 by Sung-Jin Hong <serialx.net@…>, 17 years ago

Worked for me(Apache mod_python). Working like charm. :) But heres some instructions..

  1. Do the patch (cd django; patch -Np0 < file.diff)
  2. Add two middlewares, modify the model like below:
    class FileList(models.Model):
        name = models.CharField(maxlength=255)
        email = models.EmailField()
    
        class Admin:
            js = ['js/UploadProgress.js']
    
    class AFile(models.Model):
        descr = models.CharField(maxlength=255,core=True)
        file = models.FileField(upload_to='files')
        inlist = models.ForeignKey(FileList,edit_inline=models.STACKED)
    
  1. You have progressbar in admin. :)

Submitting patch to the latest trunk.

comment:71 by Sung-Jin Hong <serialx.net@…>, 17 years ago

Cc: serialx.net@… added

comment:72 by [Cha0S], 17 years ago

There are some issues with this patch. I have it working on my local Linux and Windows boxes. But i can't get to work it completely correct on my server with the same setup as my local linux box (python 2.5, latest Django).
The issue is that when uploading a file WSGIRequest . _load_post_and_files is called only after all data is uploaded. So no streaming occurs and no progress is displayed and I didn't find a solution for this yet..

comment:73 by Øyvind Saltvik <oyvind.saltvik@…>, 17 years ago

[Cha0S] using lighttpd or apache, same lighttpd/apache version on both servers?
I found out lighttpd pre 1.4.0 supported streaming, so that has to be fixed before this patch will work on lighttpd.

comment:74 by anonymous, 17 years ago

This worked great on my system [Apache/2.0.58 (Win32) mod_python/3.2.8 Python/2.4.3]

Following up on Sung-Jin Hong's good instructions with a few tips for newbies like me:

  1. When you apply the patch, make sure you get django/middleware/upload.py. I found that when I did an SVN update to Rev 4065 and applied 4065-streaming_uploads_and_uploadprogress_middleware_x_progress_id_windowsfix.diff I did not get the file. I had to apply 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id_windowsfix.diff to get it.
  1. Make sure js/UploadProgress.js in accessible. The Admin page's path to it will be in the media directory, e.g. http://yoursite.com/media/js/UploadProgress.js. I had to copy UploadProgress.js from django/contrib/admin/media/js to my local media/js directory.
  1. Make sure the Apache httpd.conf file section for your site allows paths starting with "/progress", i.e. has a line like '<Location ~ "/(admin|shop|progress)">' in it, because the JS code is doing a "req.open("GET", "/progress/", 1)" repeatedly to track progress.

I find that the Upload progress box displays fine in Firefox and Opera, but it is at the very bottom left of the absolute page in IE6 (i.e. you need to scroll to the very bottom to see it). I'm still working on this problem.

Thank you to all who contributed this!

comment:75 by [Cha0S], 17 years ago

[Cha0S] using lighttpd or apache, same lighttpd/apache version on both servers? I found out lighttpd pre 1.4.0 supported streaming, so that has to be fixed before this patch will work on lighttpd.

I've tested with lighttpd and nginx web-servers. same versions on both servers. with apache it works fine on windows.

comment:76 by jacobm3 A T gmail.com, 17 years ago

Component: Core frameworkContrib apps
Type: enhancementdefect

Has anyone come up with a fix for the AttributeError exceptions when uploading text files? I'm still getting exceptions like this:

AttributeError at /file/compose/
tmp_name
Request Method: POST
Request URL: https://myserver/file/compose/
Exception Type: AttributeError
Exception Value: tmp_name
Exception Location: /usr/lib/python2.4/cgi.py in getattr, line 540

Traceback (most recent call last):
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/base.py" in get_response

  1. response = callback(request, *callback_args, callback_kwargs)

File "/files/lib/file/app/views.py" in createmessage

  1. if request.POST:

File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _get_post

  1. self._load_post_and_files()

File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _load_post_and_files

  1. self._post, self._files = http.parse_file_upload(self)

File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/http/init.py" in parse_file_upload

  1. return req.parse_file_upload(req)

File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/middleware/upload.py" in parse_streaming_file_upload

  1. FILES.appendlist(key, FileDict({

File "/usr/lib/python2.4/cgi.py" in getattr

  1. raise AttributeError, name

AttributeError at /file/compose/
tmp_name

comment:77 by anonymous, 17 years ago

Type: defectenhancement

oops, didn't mean to change the type on this ticket... changing back to enhancement.

comment:78 by anonymous, 17 years ago

Component: Contrib appsCore framework

comment:79 by wonlay@…, 17 years ago

Cc: wonlay@… added

Hi, jacob
The 'AttributeError' occurred, because FieldStorge does not make_temp_file for small files, and the tmp_name is not defined.
The progress middleware has problem too. You will notice that your upload speed will down to a significant lower rate, if you use this middleware. The performance bottleneck is that, every single 'read' from request reads only a few bytes, even if you set a large buffer_size, and the serialization and save to disc of 'UploadState' is significant.

The attachments is my working copy of code.

comment:80 by anonymous, 17 years ago

Cc: lurker86@… added

comment:81 by Simon G. <dev@…>, 17 years ago

Needs documentation: set
Patch needs improvement: set
Triage Stage: UnreviewedAccepted

Can someone who has used this give us a progress report on this - which of the 3000 patches above are needed? can these be merged into one patch that works?

comment:82 by anonymous, 17 years ago

The one that i can vouch for is 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id.diff

But it has problems with m3u uploads's for some reason i can't figure out.

Was thinking about the possibility of using email.FeedParser instead of rfc822 but that is not available in python 2.3

This patch does not work with lighttpd, since lighttpd can't stream requests.

comment:83 by Ivan Sagalaev <Maniac@…>, 17 years ago

This patch does not work with lighttpd, since lighttpd can't stream requests.

This is not true. When Django works with Lighttpd over FastCGI lighttpd immediately offloads received data to Django process and it can stream it wherever it wants to. In fact I use one of my services that uploads zipped mp3 albums under lighttpd and streaming works (with my patch in #1484).

comment:84 by Øyvind Saltvik <oyvind.saltvik@…>, 17 years ago

I think you are correct. I guess the problem with this patch and fcgi has to do with not doing blocking reads with a buffer size. But a buffer size is needed if the browser sends a big file without eol's, and for the upload progress.

comment:85 by Ivan Sagalaev <Maniac@…>, 17 years ago

I'm not sure I understand what you're saying... I was replying solely to a claim that lighttpd can't do streaming. It certainly does streaming to a FCGI app. Whatever the app does then (streaming by blocks, by eols or no streaming at all) is a whole another question.

comment:86 by shaunc <shaun@…>, 17 years ago

Cc: shaun@… added

comment:87 by Øyvind Saltvik <oyvind.saltvik@…>, 17 years ago

Making a attempt at using email.FeedParser

http://fivethreeo.dynalias.org/repos/streaming
login: public, password: public

Trac http://fivethreeo.dynalias.org/trac/browser/streaming

comment:88 by Øyvind Saltvik <oyvind@…>, 17 years ago

Need to override FeedParser._parsegen to write the payload directly to a file instead of a in-memory string.

Probably involves copying FeedParser._parsegen and do some modificatioms.

comment:89 by Øyvind Saltvik <oyvind.saltvik@…>, 17 years ago

I am pleased to announce, its alive!!

Just some minor work left ( validators, refactoring, bugfixing ) volunteers?

FeedParser should work with python2.3 if it included with django.

comment:90 by anonymous, 17 years ago

Cc: bahamut@… removed

comment:91 by Øyvind Saltvik <oyvind.saltvik@…>, 17 years ago

To get the most recent diff, now with validator

svn co http://fivethreeo.dynalias.org/repos/streaming

then

cd streaming

svn diff -r371

by Øyvind Saltvik <oyvind.saltvik@…>, 17 years ago

did not work without middleware, fixed

comment:92 by Øyvind Saltvik <oyvind.saltvik@…>, 17 years ago

Anyone able to help out on fastcgi testing?

comment:93 by anonymous, 17 years ago

Cc: antonio.mele@… added

will this be added to django 1.0?

comment:94 by Øyvind Saltvik <oyvind.saltvik@…>, 17 years ago

Tested on lighttpd/fastcgi without errors or memory hogging.

Should feedparser be included with django so python2.3 users can use this feature too?

The way the feedparser works there is no way to not duplicate the entire _parsegen function as I see it.

comment:95 by anonymous, 17 years ago

Cc: herbert.poul@… added

comment:96 by Joakim Sernbrant <serbaut@…>, 17 years ago

Streaming file uploads, take #3

I would really like to see something like this committed to
trunk. However I think the previous patches has been too complicated
and tried to do too much. Lack of documentation and tests has probably
been a factor as well.

I have rewritten the file upload to make it simpler and added some
documentation and tests.

  • Simplified parsing by using a specialized RFC 2388 parser. The

previous patches used different RFC 2822 parsers which were not suited
for the task. The new parser is very efficient both cpu and memory
wise. It uses mx.TextTools for boudary searches if available.

  • Removed AJAX code and progress meter. While I think it is a good

thing I think it hinders the commitment of this patch. We should open
a new ticket for a progress meter in the Admin interface. The
X-Progress-ID header is still supported and a pickled progress file is
written if supplied.

  • Added error handling.
  • Added Documentation.
  • Added tests.

Tested with mod_python and lighttpd.

You need to specify the FILE_UPLOAD_DIR setting to enable streaming (read forms doc).

Patch attached (4459-streaming-file-upload.diff)

by Joakim Sernbrant <serbaut@…>, 17 years ago

Simplified streaming uploads

comment:97 by Øyvind Saltvik <oyvind.saltvik@…>, 17 years ago

Nice work. Agree with making ajax progressmeter a own ticket.

comment:98 by Ivan Sagalaev <Maniac@…>, 17 years ago

A couple of comments from quick reading of 4459-streaming-file-upload.diff

  • It streams uploads to files only if FILE_UPLOAD_DIR is set. But it looks that it stores files in temp directory anyway. What's the point of FILE_UPLOAD_DIR then?
  • Instead of FILE_UPLOAD_DIR it would be nice to have a setting like MAX_FILE_UPLOAD_BUFFER with the maximum number of bytes stored in memory during an upload. And only if the file is bigger it is streamed to disc. This will have a nice effect of not touching disc when dealing with small files which is faster
  • It looks like any Exception during the parsing is swallowed and returned as a dict of errors that is then stored in a POST data. This actually breaks the whole exceptions paradigm :-).
  • All the save_FIELD_file now accept raw_field. This is backward-incompatible but worse it limits this method to work only with uploaded files while now save_FIELD_file can accept any content not matter how it was created in a program. I understand that just renaming streamed upload is a fast feature but I believe it deserves a new method. Like 'save_FIELD_from_file' or something...

in reply to:  98 ; comment:99 by Joakim Sernbrant <serbaut@…>, 17 years ago

Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:

A couple of comments from quick reading of 4459-streaming-file-upload.diff

  • It streams uploads to files only if FILE_UPLOAD_DIR is set. But it looks that it stores files in temp directory anyway. What's the point of FILE_UPLOAD_DIR then?

A StringIO() buffer is used in that case so no files are used.

  • Instead of FILE_UPLOAD_DIR it would be nice to have a setting like MAX_FILE_UPLOAD_BUFFER with the maximum number of bytes stored in memory during an upload. And only if the file is bigger it is streamed to disc. This will have a nice effect of not touching disc when dealing with small files which is faster

The problem is that you do not know in advance how large a file is. However you know how large the POST is in total so you could enable file streaming based on that. I did not want to create too many settings variables but I agree that it might be a good idea. Another option would be to have a setting where you name the form fields what you want to have streamed.

  • It looks like any Exception during the parsing is swallowed and returned as a dict of errors that is then stored in a POST data. This actually breaks the whole exceptions paradigm :-).

Yes, I did that to avoid too many changes in wsgi.py/modpython.py since you have to take care not to try to read the input stream again in case of an exception. Meaning that I always wanted to return something. Bad choice maybe but I tried to make as few changes as possible so that the powers that be could start looking at the patch without too much confusion ;)

  • All the save_FIELD_file now accept raw_field. This is backward-incompatible but worse it limits this method to work only with uploaded files while now save_FIELD_file can accept any content not matter how it was created in a program. I understand that just renaming streamed upload is a fast feature but I believe it deserves a new method. Like 'save_FIELD_from_file' or something...

Aren't those methods invoked automatically for you when you do .save()? I didnt think they were public but I dont understand exacly how they work so I take your word for it...

by Joakim Sernbrant <serbaut@…>, 17 years ago

Added FILE_UPLOAD_MIN_SIZE (default 100kb) to define minimum request size for streaming to disk. Propage exeptions. I'm not too happy with the names of the settings anymore :/

comment:100 by Adrian Holovaty, 17 years ago

(I've removed many attachments, as per a request by oyvind.saltvik@… on django-developers.)

in reply to:  99 comment:101 by Ivan Sagalaev <Maniac@…>, 17 years ago

The problem is that you do not know in advance how large a file is. However you know how large the POST is in total so you could enable file streaming based on that.

Yeah that what I meant... The whole point is to not make Django eat everything in memory so a settings topping this amount seems most logical to me.

I saw you made this setting. Here are couple of nits:

file_upload_min_size = getattr(settings, 'FILE_UPLOAD_MIN_SIZE', 100000)

  1. There is a convention in Django to put a setting with its default value into django/conf/global_settings.py. This keeps all defaults in one place and ensures that settings are always present so you don't have to fail-safe them with getattr, just use settings.SETTING_NAME
  1. FILE_UPLOAD_MIN_SIZE sounds confusing, like we don't allow users to upload small files :-). I still like an idea of calling it a buffer size somehow (and actually use it as a buffer size when reading chunks from input). What do you (and everyone) think?
  1. 100 KB is more like 100 * 1024. But I'd make it about 512 * 1024 or 1024 * 1024. This is small enough to not even show on memory stats but big enough to handle most profile photos uploaded in practice without touching disk most of the times.

Yes, I did that to avoid too many changes in wsgi.py/modpython.py since you have to take care not to try to read the input stream again in case of an exception.

Well... An exception ensures just this: the program won't run normally in case of errors, return values on the other hand are easy to forget to check. I see that you have removed swallowing the Exception. Why not MultiPartParserError then? As I understand this will happen only when the input is really malformed (like not created by a browser) so it's very little a user application can do about it. And I think it shouldn't. I think we should catch this exception in handlers/base.py and unconditionally return '400 Bad Request' on it.

Aren't those methods invoked automatically for you when you do .save()? I didnt think they were public but I dont understand exacly how they work so I take your word for it...

They are called from save(), yes. But they are also intended as a public API and documented (it's in model's API doc, too lazy to get a link :-) ).

comment:102 by Gábor Farkas <gabor@…>, 17 years ago

Cc: gabor@… added

comment:103 by David Cramer <dcramer@…>, 17 years ago

Cc: dcramer@… added

comment:104 by Nebojša Đorđević <nesh@…>, 17 years ago

Cc: nesh@… added; nesh@… removed

in reply to:  102 comment:105 by michell.jo@…, 17 years ago

Replying to Gábor Farkas <gabor@nekomancer.net>:

Sorry to clutter the thread.

Can this patch be used with the newforms framework, or does it only work with oldforms?

comment:106 by Øyvind Saltvik <oyvind.saltvik@…>, 17 years ago

Can be used in newforms if you handle the incoming data, form_for_model has no automatic filefields yet.

in reply to:  11 comment:107 by michell.jo@…, 17 years ago

Replying to adrian:

In reference to bnewbold's comment above, it's the shutil module, and it looks like it's been around since at least Python 2.3. (Django supports Python 2.3 and up.)

os.rename('old', 'new') does result in an 'invalid cross-link device' exception when 'old' and 'new' are on different filesystems. It doesn't make sense to do this, as once a large file has been streamed to disk, copying to another filesystem is a pointless additional overhead. However, the code could catch the error, and do a copy and then delete operation (and generate a warning that the system can be configured more efficiently).

Or it could use shutil as suggested by a previous poster.

comment:108 by michell.jo@…, 17 years ago

Is it possible that this patch could be updated so that it works with the current source version? At the moment if one checks out revision 4459, patches it and then executes svn update, clashes arise at several points in the code.

comment:109 by Brian Koebbe <koebbe@…>, 17 years ago

Cc: koebbe@… added

by axiak@…, 17 years ago

I've updated the patch to include the shutils command and to work with [5065]. Please check to see if it works.

by axiak@…, 17 years ago

Works in [5065], renamed settings variable, uses global settings, defaults STREAMING_MIN_POST_SIZE to .5MB (please test though!)

comment:110 by Michael Axiak <axiak@…>, 17 years ago

Cc: axiak@… added

by Øyvind Saltvik <oyvind@…>, 17 years ago

Updated to trunk, without changes

comment:111 by Øyvind Saltvik <oyvind@…>, 17 years ago

Axiak

But does this not limit streaming to gnu os'es by using shutils. Should it not try shutils first, dont import shutils at the top of the .py Then try rename, and finally just write from filecontent if that fails.

by Michael Axiak <axiak@…>, 17 years ago

Usese shutils, but falls back.

by Michael Axiak <axiak@…>, 17 years ago

Usese shutils, but falls back, this time it deletes the tmp file even when it falls back.

in reply to:  111 comment:112 by Michael Axiak <axiak@…>, 17 years ago

Replying to Øyvind Saltvik <oyvind@saltvik.no>:

Axiak

But does this not limit streaming to gnu os'es by using shutils. Should it not try shutils first, dont import shutils at the top of the .py Then try rename, and finally just write from filecontent if that fails.

I did not know it only worked in GNU os's. Also, AFAIK, there's no OS-independent way to copy files in python, and if file streaming was enabled, file_fieldcontent will not have the desired content. Thus, the only sure-fire way to make it work is to do a file.write file.read combination. Luckily it probably won't have to do this, since it will try both shutils and os.rename before it gets to that point.

comment:113 by Øyvind Saltvik <oyvind@…>, 17 years ago

One more thing, the last fallback should read and write incrementally, to avoid reading the whole file before it is written to another.

by Michael Axiak <axiak@…>, 17 years ago

This file uses python fallbacks, but using chunks to avoid loading the entire file into memory.

by Michael Axiak <axiak@…>, 17 years ago

Cleaned it up a bit. Moved file_move_safe into django.utils in case it should be used in future endeavors.

comment:114 by Michael Axiak <axiak@…>, 17 years ago

Needs documentation: unset
Summary: [patch] Large streaming uploadsLarge streaming uploads
Version: SVN

This last patch seems clean. If people wanted to start testing, now would be a good time :). It works in [5079] and should work on all platforms while considering all the ideas posted here.
It also has documentation. I will not remove the `Patch needs improvement' flag to let others review this patch with a finer comb than I have.

Please post here if you have additional concerns, comments, or enhancements.

comment:115 by Joakim Sernbrant <serbaut@…>, 17 years ago

The raise at http/init__.py:114 looks broken, else: missing.

comment:116 by anonymous, 17 years ago

How to use x-progress-bar ??

by Øyvind Saltvik <oyvind@…>, 17 years ago

Added missing else

by Øyvind Saltvik <oyvind@…>, 17 years ago

Forgot to add django/utils/file.py

comment:117 by anonymous, 17 years ago

Hi,

Is x-progress-bar is a header that come with apache only or it is a start header?

Is it work on fcgi, if not, can I change x-progress-bar to hidden form?

Thank.

comment:118 by Øyvind Saltvik <oyvind@…>, 17 years ago

It is just a header that has to be sent along with the request.

X-Progress-ID with a value

then FILE_UPLOAD_DIR + the value of the header X-Progress-ID is a pickled python object.

That can be used by any view and displayed using javascript.

comment:119 by anonymous, 17 years ago

Thank for your replying,

I have tested this code by modified old javascript (middleware enable). I never seen X-Progress-ID in header, so, a temp file with that id never created.

Can I have example usage?

Thank again

comment:120 by Øyvind Saltvik <oyvind@…>, 17 years ago

Actually, this patch does this wrong, the header is only used for getting progress status, the id should be the querystring.

comment:121 by Michael Axiak <axiak@…>, 17 years ago

As this site states:
http://search.cpan.org/~ceeshek/Apache2-UploadProgress-0.2/lib/Apache2/UploadProgress.pm#HANDLERS

We should be using multiple mechanisms to get this progress ID.
After I mull over this some more and understand it better, I'll make a new patch.

by Michael Axiak <axiak@…>, 17 years ago

Uses multiple mechanisms for determining the progress id.

comment:122 by Michael Axiak <axiak@…>, 17 years ago

The patch has now been merged into [5099] and I added a new property to request: progress_id. This may simplify JavaScript Middleware (we should make a new ticket for that) as it only has to do request.progress_id. If it's None, there was no progress ID sent. Otherwise, there was.

comment:123 by Michael Axiak <axiak@…>, 17 years ago

This ticket is only for making sure file uploads work. I have created ticket #4165 which s specifically for the progress bar enhancements.

in reply to:  117 comment:124 by anonymous, 17 years ago

Replying to anonymous:

Hi,

Is x-progress-bar is a header that come with apache only or it is a start header?

Is it work on fcgi, if not, can I change x-progress-bar to hidden form?

Thank.

X-Progress-ID is used to tell the server between loads "This is the same post". I'm actually not sure how the headers are supposed to be populated, only that we're supposed to "support them". After reading a perl module for quite some time, I've determined that they use javascript to append a random 32-digit hex at the end of their form action ("?progress_id=abcdef0123456789abcdef0123456789") right before the form submits.

If you do this, it will work.

NB: Using a hidden POST field will NOT work. Why? Because it's the POST upload we're trying to track!

by Michael Axiak <axiak@…>, 17 years ago

This isn't important enough to get its own property in request...request.META now contains 'UPLOAD_PROGRESS_ID'

by Michael Axiak <axiak@…>, 17 years ago

I included a bit much in that last patch.

by Michael Axiak <axiak@…>, 17 years ago

Attachment: 5100_file_upload_core.diff added

Meant to be cleaner, especially in light of #4165

by Michael Axiak <axiak@…>, 17 years ago

Added middleware hooks

by Michael Axiak <axiak@…>, 17 years ago

Added middleware hooks...this is better.

in reply to:  27 comment:125 by anonymous, 17 years ago

Replying to Maniac <Maniac@SoftwareManiacs.Org>:

My idea of a hook is not fit into current middleware, I just wanted to retain the style. So to clarify I see it like this.

This can be a separate set of hooks similar to MIDDLEWARE_CLASSES or it can be done in middleware classes themselves. To do the latter we will specifiy that in addition to process_request, _response, _view and _exception methods a middleware can define a new kind of method: process_upload(request, data).

parse_file_upload then would search for these middleware methods just as http requests do in modpython.py and wsgi.py. The list of methods is passed to a FieldStorage decendant that will call them each time it's about to write a chunk of data to the target stream.

This is it... Sorry I can't come with the actual code: pretty busy rolling out my current project.

After reviewing all the comments, this one struck me in particular. I have added a process_upload(UploadException) method to MiddleWare.
I'm sure I might get heat for this recent addition, but I think it's the cleanest way to allow extensibility.

Essentially, you can create a descriptor
for the file_progress attribute of request. It gets treated like a dictionary so it will be passed things like {'received': 1414,'size': 100000} and is expected to return it when its asked for. You are given an exception to raise if something should cancel the file upload, so I could imagine someone writing a descriptor that would compare received and some target size and cancel the upload if it's more than that.

For more details, please look at django/http/file_descriptor.py for the fallback descriptor. If you named that exact class process_upload and stuck it in a middleware, it would be used (and would work). I'll come up with another example soon, and stick it in #4165.

by Michael Axiak <axiak@…>, 17 years ago

Apparently I don't know how not to lose files.

by Michael Axiak <axiak@…>, 17 years ago

Apparently I don't know how not to lose files.

by Michael Axiak <axiak@…>, 17 years ago

Fixes WSGI bug.

by Michael Axiak <axiak@…>, 17 years ago

Fixed bug where .append(0, func) was called.

by Michael Axiak <axiak@…>, 17 years ago

Fixed bug where .append(0, func) was called with the files this time.

comment:126 by anonymous, 17 years ago

Break!

Traceback (most recent call last):

File "C:\Python25\lib\site-packages\django\core\servers\basehttp.py", line 272, in run

self.result = application(self.environ, self.start_response)

File "C:\Python25\lib\site-packages\django\core\servers\basehttp.py", line 614, in call

return self.application(environ, start_response)

File "C:\Python25\lib\site-packages\django\core\handlers\wsgi.py", line 207, in call

request = WSGIRequest(environ)

File "C:\Python25\lib\site-packages\django\core\handlers\wsgi.py", line 78, in init

self.METAUPLOAD_PROGRESS_ID = self._get_file_progress_id()

File "C:\Python25\lib\site-packages\django\core\handlers\wsgi.py", line 187, in _get_file_progress_id

self._req.args)

AttributeError: 'WSGIRequest' object has no attribute '_req'

by Michael Axiak <axiak@…>, 17 years ago

Tested with WSGI and made a few changes.

comment:127 by Michael Axiak <axiak@…>, 17 years ago

Alright, I tested/tweaked the patch on a few different platforms. Here are my notes:

  • Apache2/mod_python: Works well. Works with #4165 and generates pretty progress bar
  • Apache2/mod_fastcgi: Works well. Works with #4165 and generates pretty progress bar
  • Apache2/mod_scgi: Untested.
  • LighTTPD/mod_fastcgi: Does NOT work. However, version 1.5 will have mod_uploadprogress which will provide a JSON interface just like the one in #4165. Thus, using mod_upgradeprogress to generate a progress bar should be a breeze. I don't have lighttpd 1.5 to test, though.
  • manage.py development server: World well. Progress bar does not work AFAIK, the development server is singlethreaded and it will stay that way. The progress bar is an inherently multithreaded process (one to serve progress information, one to receive the uploaded information).

Let me know if you have anymore problems.

by Michael Axiak <axiak@…>, 17 years ago

Added 'state':'starting' to be more like mod_uploadprogress.

comment:128 by anonymous, 17 years ago

Hi,

I have tested under window xp, apache 2.2.4, mod_python 3.3. When I upload a large file via admin interface.

A temp file with named by session generated by javascript client is created. However, a file with random name and '.upload' extension is also created. File stream is append to '.upload' version instead of session id one.

Any idea??

in reply to:  128 comment:129 by Michael Axiak <axiak@…>, 17 years ago

Replying to anonymous:

Hi,

I have tested under window xp, apache 2.2.4, mod_python 3.3. When I upload a large file via admin interface.

A temp file with named by session generated by javascript client is created. However, a file with random name and '.upload' extension is also created. File stream is append to '.upload' version instead of session id one.

Any idea??

Yes, this is how it is supposed to work, which is why there are hooks for middleware (see illustration middleware in #4165).

You see, we don't need to keep track of the actual file that's being uploaded, because it's always being uploaded (by the same thread, same loop, etc.). We do need to be able to look for the 'progress' of that file upload, which has to be stored in some persistent manner, keyed by the session id. Since the file upload code has no model, can't rely on cache, and does have a safe place to put temp files, it by default puts that state information in that location.

If you install the patch 5126_file_upload_contrib_app.diff and add 'django.contrib.uploadprogress.middleware.FileProgressCached' to your MIDDLEWARE_CLASSES setting, the cache framework will be used, and no session_id file will be created.

Alternatively, you can use that patch and add 'django.contrib.uploadprogress.middleware.FileProgressDB' to your MIDDLEWARE_CLASSES and add 'django.contrib.uploadprogress' to your INSTALLED_APPS (and run a syncdb...).

Or, you are welcome to look at the code in django.contrib.uploadprogress.middleware.FileProgressCached and write your own method of storing this file progress information (it's in a dictionary format).

Sounds like it's working though, thanks for the feedback.

comment:130 by Michael Axiak <axiak@…>, 17 years ago

I should also explain the rationale for the naming of the files:

We use the X-Progress-ID value for the name of the file that stores the progress because in subsequent requests we're going to need to get the file by that same value.

We use a (python) temporary file name for the actual file that's being uploaded because it guarantees uniqueness. Since X-Progress-ID is provided by the client, we can't guarantee someone won't start auto-generating requests and cause a collision on the files themselves. I have thought of putting the X-Progress-ID in the session. This can be done but it relies on the session app -- which is not a prereq for Django to work -- being activated and file uploading is a core feature.

It doesn't seem like too much of an issue if the progress tracking has a small chance of getting munged. The actual files, OTOH, would be a bit worrisome.

That being said, if you really don't like the naming, and have a better suggestion, by all means suggest it.

in reply to:  127 ; comment:131 by anonymous, 17 years ago

Replying to Michael Axiak <axiak@mit.edu>:

Alright, I tested/tweaked the patch on a few different platforms. Here are my notes:
...

My new report:

  • Apache2/mod_python: Works well. Works with #4165 and generates pretty progress bar
  • Apache2/mod_fastcgi: Works well. Works with #4165 and generates pretty progress bar
  • Apache2/mod_scgi: Untested.
  • LighTTPD/mod_fastcgi: I stand corrected (Thanks SmileyChris). It does help with LighTTPD inasmuch as it will make the interaction between lighttpd and python not eat all the memory and CPU. However, it will not show a progress bar (nor are other features in #4165 supported), as the streamed 'upload' only exists between lighttpd and python, not between the web client and the web server.
  • manage.py development server: World well. Progress bar does not work AFAIK, the development server is singlethreaded and it will stay that way. The progress bar is an inherently multithreaded process (one to serve progress information, one to receive the uploaded information).

Also, when uploading a 4.5 GB file, the content length was reported incorrectly. This happened with both FastCGI and mod_python on apache2.

in reply to:  131 comment:132 by anonymous, 17 years ago

Replying to anonymous:

...

Also, when uploading a 4.5 GB file, the content length was reported incorrectly. This happened with both FastCGI and mod_python on apache2.

It turns out this happens when you don't have Large File support (on either the client or the server?).

by Chris Beaven, 17 years ago

There was an error uploading large files with streaming turned off

comment:133 by Chris Beaven, 17 years ago

I also tidied up the code a bit (just white space tweaking mostly).

I noticed that currently, you need to set an upload directory even if you are using a middleware which overrides the FileProgressDescriptor of request because of calls like this in http.__init__:

            if self._progress_filename:
                self._request.file_progress = {'state': 'done'}

I'm not sure if relying on _progress_filename is the best way to do this.

comment:134 by Malcolm Tredinnick, 17 years ago

Can we have a moratorium on uploading yet more changes to this ticket for a few days, please?

I've started reading through a few of the patches carefully with an eye to working out what parts look like they're ready to be committed. However, this isn't going to work if every day there is yet another slight tweak to something. It's going to take a few days to review all this carefully and think about all the issues raised in the comments, so everybody just take a deep breath and back away from the keyboards for a bit.

comment:135 by lukeman@…, 17 years ago

I've yet to test large files with the patch applied, but I've noticed my files are being created with 600 permissions instead of the 644 they received before patching. Not sure if it's just my setup, but I can't recall any other changes over the past day or so that would have affected it.

in reply to:  135 comment:136 by Michael Axiak <axiak@…>, 17 years ago

Replying to lukeman@gmail.com:

I've yet to test large files with the patch applied, but I've noticed my files are being created with 600 permissions instead of the 644 they received before patching. Not sure if it's just my setup, but I can't recall any other changes over the past day or so that would have affected it.

It should inherit the permissions from the temporary directory. I can't think of a better setup than this, but if people think this is unexpected/problematic maybe we can change it.

If people understand streaming files go to the temp folder then to the other folder, then I think inheriting the permissions from the temp folder would be expected behavior.

-Mike

comment:137 by anonymous, 17 years ago

I have tested multiple upload at the same time (difference browser instance). If there are more than two upload request, when a first request
is sent, it work properly. If I make a new call, an old stop immediately and a new one not create a session and temp file anymore.

Django is great in advance, please keep it simple and work in common task.

in reply to:  137 comment:138 by Michael Axiak <axiak@…>, 17 years ago

Replying to anonymous:

I have tested multiple upload at the same time (difference browser instance). If there are more than two upload request, when a first request
is sent, it work properly. If I make a new call, an old stop immediately and a new one not create a session and temp file anymore.

Django is great in advance, please keep it simple and work in common task.

Hi,

I have tested it with simultaneous file uploads and it worked fine. Could you please include more details? For instance, I don't think there would be any chance it would work in the development server. It may break in some FastCGI configurations, too. (The reason being that accepting multiple file uploads requires some multithreadedness.)

-Mike

comment:139 by anonymous, 17 years ago

Cc: rudolph.froger@… added

comment:140 by anonymous, 17 years ago

Hi

If I stop upload process by clicking a "Stop" button. It look like sever see a same upload request and never generate process id.

How to solve this problem?

Thank

comment:141 by anonymous, 17 years ago

Cc: jm.bugtracking@… added

in reply to:  140 comment:142 by Michael Axiak <axiak@…>, 17 years ago

Replying to anonymous:

Hi

If I stop upload process by clicking a "Stop" button. It look like sever see a same upload request and never generate process id.

How to solve this problem?

Thank

I'm not sure why you'd be seeing this. If you're using #4165 then the javascript should create a new progress id on *every* upload. If you're not using #4165, then progress id is irrelevant. While testing I was able to stop and re-upload in FF 2, 1.5, IE 7, and IE 6 without a problem. Could you please describe your environment with more detail?

-Mike

by Michael Axiak <axiak@…>, 17 years ago

Removed some unneeded things. No file progress tracking by default.

comment:143 by Michael Axiak <axiak@…>, 17 years ago

I've cleaned the FileProgressDescriptor architecture a little bit. Three changes for this patch:

  • using request.__class__ = descriptor rather than using a weird descriptor wrapper.
  • defaulting to a NullFileProgressDescriptor (which does nothing) versus one that stored the information in temp upload directory.
  • making safe_file_move do nothing if the source and destination are the same string.

If you want to track file upload progress you need to install the stuff from #4165. This ticket just contains the hooks to get #4165 there with just a minimal of contrib. This patch makes that even clearer by removing the last remnants of functional file progress tracking.

I *think* this is probably where we want to leave this ticket (in terms of what should be cut out of it, what should be put in). Although, I can see several other separation (take care of form stuff in another ticket, for instance).

It is worth noting (because someone had issue with this in IRC), that if you try to use this save_FIELD_file(name, content) should now be given a request.FILE item, rather than the actual content of the file.

For example:

    object.save_video_file(request.FILE['video']['filename'], request.FILE['video']['content'])

is bad, since the mere invocation request.FILE['video']['content'] will load all the content into memory, exactly what we're trying to avoid.

The new version looks like:

    object.save_video_file(request.FILE['video']['filename'], request.FILE['video'])

In reply to SmileyChris, we have to have an upload directory anyway, since the FileProgressDescriptor only changes how file *progress* is tracked, not the actual uploading of files (that is, it has to go into a directory, or else this entire thing is disabled). We could rename and clean up if people prefer it changed.

in reply to:  143 ; comment:144 by anonymous, 17 years ago

Replying to Michael Axiak <axiak@mit.edu>:

The new version looks like:

    object.save_video_file(request.FILE['video']['filename'], request.FILE['video'])

Why not skip request.FILEvideofilename at all? This is provided by request.FILEvideo anyway.
IMHO having filename as an separate parameter makes only sense if this would be used somehow directly, but currently the filename/path where the file gets saved is changed by the field (field.get_directory_name() and field.get_filename(filename)).

in reply to:  144 comment:145 by Michael Axiak <axiak@…>, 17 years ago

Replying to anonymous:

Why not skip request.FILEvideofilename at all? This is provided by request.FILEvideo anyway.
IMHO having filename as an separate parameter makes only sense if this would be used somehow directly, but currently the filename/path where the file gets saved is changed by the field (field.get_directory_name() and field.get_filename(filename)).

Well, save_video_file() is meant to allow someone to save a file. Unless that file is specifically provided in a post (it's not necessarily, and in many of my applications it's not...generated images come to mind).

We *can* have it accept both sets of arguments, but then it's too nonorthogonal. Perhaps we should just create a new one? move_video_file()??

by Michael Axiak <axiak@…>, 17 years ago

Newer, cleaner version of file upload script

by Michael Axiak <axiak@…>, 17 years ago

Newer, cleaner version of file upload script (doh! no random commented assert)

by Michael Axiak <axiak@…>, 17 years ago

Sorry about that -- this one is the better one.

comment:146 by Michael Axiak <axiak@…>, 17 years ago

A list of changes in this new version:

  • I removed the file_progress descriptor handling and middleware -- if you need to track file progress, you can use process_request to achieve what you need.
  • I cleaned up a lot of things, removed many catch-all excepts and replaced with except OSError etc.
  • Included cross-platform locking for saving/moving files manually (django/utils/file_locks.py)
  • Made move_file_safe(src,dest) to care if a file already exists and throw an error if one does by default.
  • Moved stuff out of http/__init__.py into various locations.
  • In reply to SmileyChris' comment earlier, no longer uses self._progress_file to determine if a file is being tracked. Now using self._track_progress (a boolean)
  • save_field_file can now optionally not take a filename (if filename = None). To avoid confusion, there is an additional function, instance.move_field_file which takes one parameter: the request.FILES value. Code examples should use move_field_file to distinguish this new functionality.
  • Stay tuned for updates to #4165 to reflect these and other changes!

Example of new move_video_file Model instance function:

#from this
foo.save_video_file(request.FILES['video']['filename'], request.FILES['video'])

#to this
foo.move_video_file(request.FILES['video'])

by Michael Axiak <axiak@…>, 17 years ago

It's amazing what Trac helps you see.

by Michael Axiak <axiak@…>, 17 years ago

Made a subtle fix after testing with #4165.

comment:147 by anonymous, 17 years ago

def init(self, headers, input, request, file_upload_dir=None, streaming_min_post_size=None, chunk_size=1024*64):

parser = MultiPartParser(headers, input, request, file_upload_dir, streaming_min_post_size)

Is chunk_size is configurable? Do you hard code, aren't you.

Thank

in reply to:  147 comment:148 by Michael Axiak <axiak@…>, 17 years ago

Replying to anonymous:

def init(self, headers, input, request, file_upload_dir=None, streaming_min_post_size=None, chunk_size=1024*64):

parser = MultiPartParser(headers, input, request, file_upload_dir, streaming_min_post_size)

Is chunk_size is configurable? Do you hard code, aren't you.

Thank

Actually, that's a great point. I am going to change a few things (shift around save_FIELD_file and move_FIELD_file to make things more explicit) and on my list for the next upload would be to allow chunk sizes to be a settings variable. (I was actually discussing this with Øyvind in IRC, http://simon.bofh.ms/logger/django/2007/05/26/15/25 .)

Thanks for the comment, though. Feel free to post any other suggestions!

comment:149 by bricetebbs@…, 17 years ago

Works for me. Windows/Apache2.2.4/ModPython.
My page actually uploads 4 large files at once so I added 'filename' : self._filename to the file_progress dict created in multparser.py then I can display the current file in my javascript.

comment:150 by anonymous, 17 years ago

Cc: oliver@… added

comment:151 by djangoproject.com@…, 17 years ago

5343_cleaned_streaming_file_upload_tweaked.diff works perfect for me (Linux/Apache 2.0.59/modPython). Uploading large zip files (10 to 100 MB each) to my photo gallery with no problems ... memory usage is dead straight at 65MB ... where as it explored to 600 MB per apache process before. Good work Michael!

comment:152 by anonymous, 17 years ago

111 def parse(self):
112 try:
113 self._parse()
114 finally:
115 if self._track_progress:
116 self._request.file_progress = {'state': 'done'}
117 return self._post, self._files

When user click 'stop' while uploading file, server return {'state':'done'} even file is not completely uploaded.
This cause unexpected behaviors under different browser. I think we should return actual file size and received size to make
client to be able to check if file is completely uploaded.

Thank

comment:153 by anonymous, 17 years ago

def parse(self):

try:

self._parse()

finally:

if self._progress_filename:

if self._received == self._size:

self._request.file_progress = {'received':self._received,

'size':self._size,
'state': 'done'}

else:

self._request.file_progress = {'received':self._received,

'size':self._size,
'state': 'abort'}

How about this?

comment:154 by anonymous, 17 years ago

Cc: moritz.angermann@… added

by klaus.blindert@…, 17 years ago

Fixed microscopic bug in an error path

comment:155 by simonbun <simonbun@…>, 17 years ago

Cc: django@… added

comment:156 by anonymous, 17 years ago

Cc: paltman@… added

by simonbun <simonbun@…>, 17 years ago

Attachment: 5722.diff added

Updated patch against r5722

by simonbun <simonbun@…>, 17 years ago

Attachment: 5722.2.diff added

Sorry, that last one patch also included #4165. This is the correct one.

by Brian Rosner <brosner@…>, 17 years ago

updated to r5724 (previous patch was missing files too)

comment:157 by Brian Rosner <brosner@…>, 17 years ago

Cc: brosner@… added

comment:158 by Simon G. <dev@…>, 17 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

Reading over this, I think the latest version here looks good (5724_streaming_file_upload.diff). Is everyone happy with it?

comment:159 by simonbun <simonbun@…>, 17 years ago

Yes, I've been using it intensively for the last couple of days and it works beautifully.

comment:160 by Brian Rosner <brosner@…>, 17 years ago

I have been using this patch for a little while now and it does work very well. Haven't seen anything wrong with it.

comment:161 by eibaan@…, 17 years ago

I manually applied patch 5724... to the latest trunk version and noticed that there's a syntax error in line 254 of multipartparser.py. Instead of

except OSError, IOError:

it must be

except (OSError, IOError), e:

Besides that, the patch seems to work quite nicely and I hope it will be added to trunk ASAP.

comment:162 by simonbun <simonbun@…>, 17 years ago

Ran into a very small issue:
Files uploads that are streamed end up having 0600 permissions, while files that are uploaded normally have 0644 permissions. I can't put my finger on where this happens at a glance, will look into it tomorrow.

(In case anyone's wondering, I need read access to be able to scan newly uploaded files with clamav.)

in reply to:  162 ; comment:163 by anonymous, 17 years ago

Please read 136. This is standard Unix permissions. For example, if someone (heaven forbid) used PHP to upload, the files would inherit the permissions from the temporary upload directory.

-Mike

Replying to simonbun <simonbun@versea.be>:

Ran into a very small issue:
Files uploads that are streamed end up having 0600 permissions, while files that are uploaded normally have 0644 permissions. I can't put my finger on where this happens at a glance, will look into it tomorrow.

(In case anyone's wondering, I need read access to be able to scan newly uploaded files with clamav.)

comment:164 by simonbun <simonbun@…>, 17 years ago

Well the strange thing is my temporary dir and my uploads dir both have 777 permissions. I just checked again and the temporary .upload file has 600 permissions.

I'm using apache2 + mod_python + python2.5 if that would make a difference

comment:165 by anonymous, 17 years ago

Cc: matthias@… added

in reply to:  163 comment:166 by Arthur Debert, 17 years ago

Replying to anonymous:

I'll second that. This one has bitten me too.
While setting permissions to 600 is sound, maybe a two liner on the documentation would help make that more obvious. It took me a while to understand what as going on.

Asides from that, thanks for the patch. I've been using it for a while (mod_python at webfaction's setup, post unicode merge) and it works well.

Cheers
Arthur

comment:167 by anonymous, 17 years ago

Cc: root.lastnode@… added

comment:168 by akaihola <antti.kaihola@…>, 17 years ago

Cc: akaihola+djtick@… added

by Brian Rosner <brosner@…>, 17 years ago

updated to r5820

comment:169 by anonymous, 17 years ago

Cc: sam@… added

comment:170 by Trey <trey@…>, 17 years ago

Cc: trey@… added

comment:171 by anonymous, 17 years ago

Cc: klaus.blindert@… added

comment:172 by Øyvind Saltvik <oyvind@…>, 17 years ago

Patch for r5820 does not work with forms.FileField and forms.ImageField, raw_field is now a StrAndUnicode object not a dict containing the raw data.

comment:173 by Øyvind Saltvik <oyvind@…>, 17 years ago

comment:174 by Michael Axiak, 17 years ago

Keywords: sprint added
Owner: changed from nobody to Michael Axiak
Status: assignednew
Triage Stage: Ready for checkinAccepted

I will take this on to as full extent as I can over the upcoming sprint.
If anyone has any more issues, please post them on here before Friday.

comment:175 by anonymous, 17 years ago

I would like to see the filename be part of the progress returned for when a form has multiple files:

self._request.file_progress = {'received': self._received,
                                'size':     self._size,
                                'filename' : self._filename,
                                'state':    'uploading'}

comment:176 by Matthias Urlichs <smurf@…>, 17 years ago

The checks after an image upload (as seen in http://dpaste.com/hold/18468/) can be improved somewhat; at first glance it is entirely unclear what's supposed to happen, and the error message is repeated.

See http://dpaste.com/hold/19212/ (untested but looks OK).

comment:177 by klaus.blindert@…, 17 years ago

Uploading empty files is treated as an error, why?
I would also like to see a file propgress tracker example implementation with unit tests and a stress test for uploading
large files.

On request I can provide an example of such a stress test programm.

comment:178 by Matthias Urlichs <smurf@…>, 17 years ago

I frankly fail to see the use case for uploading an empty file (as opposed to simply not uploading it in the first place).

If you have one, feel free to (a) tell us why you think it makes sense, (b) provide a patch which optionally allows that.

in reply to:  178 comment:179 by anonymous, 17 years ago

Replying to Matthias Urlichs <smurf@smurf.noris.de>:

I frankly fail to see the use case for uploading an empty file (as opposed to simply not uploading it in the first place).

There are several situations where one might want to upload empty files.

  1. It could be that one does not know the file is empty
  2. Maybe one actually needs to have an empty file, for example a ini.py in python indicates that a directory is a package

There is nothing special about an empty file, it is a file like any other file, it only contains a single character, the End Of File character. Why would you want to treat this file in any different way? If you do so it only means that your code is not generic enough.

One more note:

The reason that this ticket is languishing like this is that it is filled with tacit assumptions, unusual behavior (like the one I'm commenting on) and extra functionality that is not relevant to the actual problem it needs to solve first - streaming the uploads.

comment:180 by Matthias Urlichs <smurf@…>, 17 years ago

I'd argue that case 1 is a prime candidate for keeping the test, while case 2 can be handled by uploading a file with a comment in it.

In any case: if you think that's the reason, then please post either a list of (other) problems you have with this patch, or an improved patch.

Anythig else is not helpful. IMHO.

comment:181 by Philippe Raoult, 17 years ago

Keywords: sprintsept14 feature added; sprint removed

comment:182 by Ville Säävuori, 17 years ago

Cc: ville@… added

in reply to:  180 comment:183 by anonymous, 17 years ago

Replying to Matthias Urlichs <smurf@smurf.noris.de>:

while case 2 can be handled by uploading a file with a comment in it.

I would be willing to elaborate but this commment of yours indicates to me you have your mind set and don't actually want to understand the point that I was making: That is sometimes a file have to be empty.

You can't manually start adding bytes into files just because someone decided that uploading an empty file is an error. All you need is think about it for a second. As I said before an empty file is still a valid file, it has a name, a modification date ... even a size!. Maybe it is used to indicate a state, maybe it is log file that has yet to be written to etc. there are countless usecases that require the presence of a file that may not contain anything yet.

In your design if one were to simultaneously upload 10 files one of which is empty, the upload should fail once processing reaches the empty file ...

comment:184 by Matthias Urlichs, 17 years ago

I do admit that you have a point; I admit that my response was a bit knee-jerk.

My core point, now that I think about it, is not that I'm dead set against uploading empty files. That test is easily removed, or made optional.

The real point is: What's holding up the change? Did axiak just not have enough time?

This is not the only patch in this bug tracker which fixes a real-world problem. We're carrying ~30 of these in our own repository and, frankly, I'd like that number to go down, one of these days. :-/

comment:185 by oliver@…, 17 years ago

I know this is maybe a little late to chime in, but as opinion on this seems to be heavily divided as to whether empty files should be allowed, would it not just be best all-round to add a way to define whether empty files should be allowed or not?

Surely this wouldn't be too hard (I'd be willing to work on this if I get a few moments here-and-there).

Just a thought'''

comment:186 by Trey <trey@…>, 17 years ago

I am going to have to go with +1 for not distinguishing between an empty file or a non-empty file.

Technically there is no difference between and empty and non-empty file so I don't see why django should distinguish between the two. It's definitely a user defined preference at that point, which should be possible after the file has been uploaded. Not wanting and empty file is an assumption, a popular one yes, but still an assumption about the upload.

comment:187 by Michael Axiak <axiak@…>, 17 years ago

Hey,

As well as Trey, I am also +1 for not making this distinction. If people need the validation, it belongs in another layer than the core http processing. (Perhaps the newforms filefield)

-Mike

comment:188 by Sung-jin Hong, 17 years ago

I too +1 for not making that distinction.

-SJ

comment:189 by Trey <trey@…>, 16 years ago

What do we need to do to get some movement on this patch?

comment:190 by Øyvind Saltvik <oyvind@…>, 16 years ago

Make it work with current trunk first

comment:191 by Reza Mohammadi, 16 years ago

Cc: reza@… added

comment:192 by Faheem Mitha <faheem@…>, 16 years ago

I'm getting

Ran 155 tests in 55.010s

FAILED (failures=1, errors=1)

when I run the tests on 5820 with 5820_streaming_file_upload.diff applied.

They look like

1) Traceback (most recent call last):

File "/usr/local/src/django/django-trunk.hg.5820+streaming/tests/modeltests/test_client/models.py", line 91, in test_post_file_view

response = self.client.post('/test_client/post_file_view/', post_data)

File "/usr/local/lib/python2.4/site-packages/django/test/client.py", line 222, in post

return self.request(r)

File "/usr/local/lib/python2.4/site-packages/django/core/handlers/base.py", line 77, in get_response

response = callback(request, *callback_args, callback_kwargs)

File "/usr/local/src/django/django-trunk.hg.5820+streaming/tests/modeltests/test_client/views.py", line 52, in post_file_view

c = Context({'file': request.FILESfile_file})

File "/usr/local/lib/python2.4/site-packages/django/utils/datastructures.py", line 143, in getitem

raise MultiValueDictKeyError, "Key %r not found in %r" % (key, self)

MultiValueDictKeyError: 'Key \'file_file\' not found in <MultiValueDict: {\'file\': [{\'content\': \'# coding: utf-8
n"""
n38. Testing usi
[...]

2) FAIL: test_simple_upload (regressiontests.test_client_regress.models.AssertFileUploadTests)

Traceback (most recent call last):

File "/usr/local/src/django/django-trunk.hg.5820+streaming/tests/regressiontests/test_client_regress/models.py", line 215, in test_simple_upload

self.assertEqual(response.status_code, 200)

AssertionError: 500 != 200

Am I doing something wrong here?

by Faheem Mitha <faheem@…>, 16 years ago

comment:193 by Faheem Mitha <faheem@…>, 16 years ago

Uploaded new diff 6469_streaming_file_upload.diff, applying against trunk.

Currently one unit test is failing, traceback follows. There were two tests failing in the earlier version (5820), one of which was this one, but [560] helped me fix one of them on IRC.

[560] thinks this is an unicode error. If this version works for people, I'd like to hear about it. It would really nice to get this merged into trunk.

Thanks, Faheem.

*
FAIL: test_simple_upload (regressiontests.test_client_regress.models.FileUploadTests)


Traceback (most recent call last):

File "/usr/local/src/django/django-trunk.hg+mystreaming/tests/regressiontests/test_client_regress/models.py", line 244, in test_simple_upload

self.assertEqual(response.status_code, 200)

AssertionError: 500 != 200


Ran 205 tests in 85.939s

FAILED (failures=1)

comment:194 by anonymous, 16 years ago

Okay, so if I want to test this with a clean trunk, which of the above do I use? Trying to figure out if I need the middleware (if so which one), or just 6469 or what? I am needed to upload a 500M+ file on a weekly basis and would love to know HOW to use this.

in reply to:  194 ; comment:195 by Faheem Mitha <faheem@…>, 16 years ago

Replying to anonymous:

Okay, so if I want to test this with a clean trunk, which of the above do I use? Trying to > figure out if I need the middleware (if so which one), or just 6469 or what? I am needed to > upload a 500M+ file on a weekly basis and would love to know HOW to use this.

6469_streaming_file_upload.diff should apply cleanly against trunk (hasn't been that long). Just tested it (patched against 6469 trunk) and it appears to work fine. You don't need anything else if you just want the streaming upload functionality. Note that apparently there are some issues with FileField/ImageField in forms/newforms, but I have no direct experience of this.

Would be interested to hear how it works for you.

Faheem.

in reply to:  195 comment:196 by anonymous, 16 years ago

Isn't that just the way. I wrote this post, re-re-rechecked out the trunk and applied the patch. Works marvelously. Combined with a crontab for file conversion and everything is working wonderfully.

Thanks!

Replying to Faheem Mitha <faheem@email.unc.edu>:

Replying to anonymous:

Okay, so if I want to test this with a clean trunk, which of the above do I use? Trying to > figure out if I need the middleware (if so which one), or just 6469 or what? I am needed to > upload a 500M+ file on a weekly basis and would love to know HOW to use this.

6469_streaming_file_upload.diff should apply cleanly against trunk (hasn't been that long). Just tested it (patched against 6469 trunk) and it appears to work fine. You don't need anything else if you just want the streaming upload functionality. Note that apparently there are some issues with FileField/ImageField in forms/newforms, but I have no direct experience of this.

Would be interested to hear how it works for you.

Faheem.

comment:197 by eibaan, 16 years ago

I noticed a problem with an older patch that seems to be still in 6469. Uploading a FORM with both a file and text INPUT and the input text containing umlauts or other non-ascii characters, you get an exception. I fixed it by replacing line 272 of multipartparser.py with self._post.appendlist(self._fieldname, unicode(self._field.getvalue(), "utf-8")).

by Øyvind Saltvik <oyvind@…>, 16 years ago

Attachment: 6525_all_tests_pass.diff added

Fixed unicode in POST issues, added django.http.utils, moved str_to_unicode there

comment:198 by Øyvind Saltvik <oyvind@…>, 16 years ago

Added a new patch, fixed unicode in POST, moved str_to_unicode

by Øyvind Saltvik <oyvind@…>, 16 years ago

Fixed a problem with UploadedFile wrapper and making sure content is not read in Fie/ImageField

comment:199 by Øyvind Saltvik <oyvind@…>, 16 years ago

Needs tests: set

Everything seems to work now, just needs some form tests.

comment:200 by Evgenii Morozov, 16 years ago

I've installed the latest trunk, applied this patch (6603_all_tests_pass_uploadedfile_wrapper_fixed.diff) and discovered two problems. Under mod_python 3.3.1 uploading 12Mb image results in apache consuming all available memory and dying. This is probably a mod_python error as it doesn't occur when using development server or mod_wsgi.

When I'm trying to upload 600Mb file under mod_wsgi, though, I get another problem:

Traceback (most recent call last):
File "/var/lib/python-support/python2.5/django/core/handlers/base.py" in _real_get_response
  81. response = callback(request, *callback_args, **callback_kwargs)
File "/var/lib/python-support/python2.5/django/contrib/admin/views/decorators.py" in _checklogin
  55. return view_func(request, *args, **kwargs)
File "/var/lib/python-support/python2.5/django/views/decorators/cache.py" in _wrapped_view_func
  39. response = view_func(request, *args, **kwargs)
File "/var/lib/python-support/python2.5/django/contrib/admin/views/main.py" in change_stage
  332. errors = manipulator.get_validation_errors(new_data)
File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in get_validation_errors
  61. errors.update(field.get_validation_errors(new_data))
File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in get_validation_errors
  378. self.run_validator(new_data, validator)
File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in run_validator
  368. validator(new_data.get(self.field_name, ''), new_data)
File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in isValidImage
  713. validators.isValidImage(field_data, all_data)
File "/var/lib/python-support/python2.5/django/core/validators.py" in isValidImage
  180. content = field_data['content']

  KeyError at /admin/account/pokerroom/5/
  'content'

Seems that patch is already outdated. Where can I get a newer version?

Eugene

comment:201 by Evgenii Morozov, 16 years ago

Cc: registrations@… added

comment:202 by Evgenii Morozov, 16 years ago

BTW, it seems that problem was caused by insufficient space on /tmp partition I'm using for uploads. Here's additional info:

POST
Variable 	Value
_file_upload_error 	u'Failed to write to temporary file.'

It's a bug anyways.

by Øyvind Saltvik <oyvind@…>, 16 years ago

made the isValidImage validator try the tempfile before reading content

comment:203 by Faheem Mitha <faheem@…>, 16 years ago

Hi Oyvind,

Just wondering what problem your most recent patch fixes. I tried using newforms and the patch breaks with it. (See traceback below), Does your patch fix this problem? I've heard of some problems related to newforms. Is this what it is?

Thanks, Faheem.

*
from models import FileUpload, FolderUpload
from django.newforms import form_for_model

FileUploadForm = form_for_model(FileUpload)
FolderUploadForm = form_for_model(FolderUpload)
*

*

Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/core/handlers/base.py" in _real_get_response

  1. response = callback(request, *callback_args, callback_kwargs)

File "/usr/local/lib/python2.4/site-packages/django/contrib/auth/decorators.py" in _checklogin

  1. return view_func(request, *args, kwargs)

File "/usr/local/lib/python2.4/site-packages/bixfile_dev/views.py" in file_upload

  1. success = form.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py" in save

  1. return save_instance(self, model(), fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py" in save_instance

  1. f.save_form_data(instance, cleaned_data[f.name])

File "/usr/local/lib/python2.4/site-packages/django/db/models/fields/init.py" in save_form_data

  1. getattr(instance, "save_%s_file" % self.name)(data.filename, data.content, save=False)

File "/usr/local/lib/python2.4/site-packages/django/db/models/fields/init.py" in

  1. setattr(cls, 'save_%s_file' % self.name, lambda instance, filename, raw_field, save=True: instance._save_FIELD_file(self,

filename, raw_field, save))
File "/usr/local/lib/python2.4/site-packages/django/db/models/base.py" in _save_FIELD_file

  1. if raw_field.has_key('tmpfilename'):

AttributeError at /proto_dev/bixfile_dev/
'str' object has no attribute 'has_key'

comment:204 by Øyvind Saltvik <oyvind@…>, 16 years ago

Just changed the oldforms image validator to use the tempfile. The problem you have was fixed in the 6603 patch.

comment:205 by Faheem Mitha <faheem@…>, 16 years ago

Hi oyvind,

I'm getting some test breakage with your patch.

Ran 211 tests in 90.369s

FAILED (failures=3)

The most common error is:

AttributeError: 'UploadedFile' object has no attribute 'content'.

by Øyvind Saltvik <oyvind@…>, 16 years ago

diff did not apply cleanly, fixed

comment:206 by Øyvind Saltvik <oyvind@…>, 16 years ago

All tests run fine now

comment:207 by Øyvind Saltvik <oyvind@…>, 16 years ago

Django patched with the latest streaming patch can now be checked out using mercurial.

hg clone http://streaming.cylon.no/streaming django_src

comment:208 by Bastian Kleineidam <calvin@…>, 16 years ago

Cc: calvin@… added

comment:209 by anonymous, 16 years ago

will this get into trunk any time soon?

comment:210 by anonymous, 16 years ago

6654_fixed_tests_and_file_clean.diff fails against 6671.

comment:211 by anonymous, 16 years ago

Cc: rudolph.froger@… removed

comment:212 by anonymous, 16 years ago

Cc: works@… added

comment:213 by Andrew Badr <andrewbadr.etc@…>, 16 years ago

Cc: andrewbadr.etc@… added

comment:214 by Joshua Works, 16 years ago

Could someone post updated directions to get the latest incarnation of this patch working? I'm using Oyvind's mercurial source, but I'm not sure if I still need to modify the middleware, change my models at all, add settings to settings.py, etc.

Also, is the progress meter still included in this patch, and what do I need to see it in the Admin?

There hasn't been much activity here lately -- is this thing close to making it into the trunk and getting some proper documentation?

in reply to:  214 comment:215 by anonymous, 16 years ago

Replying to Joshua Works:

Could someone post updated directions to get the latest incarnation of this patch working? I'm using Oyvind's mercurial source, but I'm not sure if I still need to modify the middleware, change my models at all, add settings to settings.py, etc.

It works out of the box for me, at least it does for the last version I used it for, which corresponds to Oyvinds last patch. However, apparently this patch does not apply cleanly to current trunk. Oyvind asked me to do a merge, but I've been busy with other things. I'll do my best to get to this in the next day or two.

Also, is the progress meter still included in this patch, and what do I need to see it in the Admin?

No, that is in another patch http://code.djangoproject.com/ticket/4165, and is buggy.
See my comments there. Apart from a backend problem with the Python code, the JS code has
problems too. It should be easy to rewrite for someone who knows JS, though. I may end up doing it myself if nobody else does, but I'll probably use a library like PrototypeJS or Jquery.

There hasn't been much activity here lately -- is this thing close to making it into the trunk and getting some proper documentation?

I think it will need a bunch of people to give it a push. I've been intending to try and coordinate something, but I've been busy with other things.

If you don't hear anything, feel free to ping me via private email. I'm quite keen that the patch continue working till it is merged into trunk. I need it for some stuff myself.

Faheem.

comment:216 by Malcolm Tredinnick, 16 years ago

It absolutely doesn't need "a bunch of people to give it a push". We know this ticket is here and at some point one of the maintainers will sit down and review it again to see if all the problems from the last review have been fixed and no new ones introduced.

by Faheem Mitha <faheem@…>, 16 years ago

Attachment: streaming.6710.patch added

Updated streaming patch to trunk rev 6710.

comment:217 by Faheem Mitha <faheem@…>, 16 years ago

Added streaming.6710.patch against trunk rev 6710. This passes all Django tests. I haven't tested it extensively, but it appears to work. Some quick notes on how I generated this patch in case anyone is interested. I used mercurial and kdiff3 for merging (OS Debian etch).

Note that the only file that required manual intervention was django/http/utils.py. This file could probably use a little cleanup.
Everything else was handled by kdiff3.

See Section 12.8 of the hgbook for a recipe for doing this http://hgbook.red-bean.com/hgbookch12.html#x16-28700012.8 and also this wiki page,
http://www.selenic.com/mercurial/wiki/index.cgi/MqMerge

Note that the method described in the wiki is currently less likely to give problems than the method in the hgbook (otherwise similar).

See this issue for further discussion, http://www.selenic.com/mercurial/bts/issue844

comment:218 by anonymous, 16 years ago

Successfully patched trunk rev 7018 with streaming.6710.patch. Works well.

What are the main issues blocking trunk integration? This functionality is of high importance to many, as one could argue that without this patch the current FileField and ImageField implementation is both buggy and exposes a security risk (denial of service by uploading large files as Apache is eating up memory and peaking CPU usage + large file upload plainly fails in the end without the patch).

Thus strong +1 for integration ASAP.

comment:219 by ipse <ipse.reg@…>, 16 years ago

Cc: ipse.reg@… added

comment:220 by Soeren Sonnenburg <bugreports@…>, 16 years ago

I also urgently need that feature ... any chance to see this merged soon?

comment:221 by James Bennett, 16 years ago

Please do not ask "when will this be merged" questions in the ticket tracker; all it does is generate noise. If you're genuinely interested in helping to get a specific patch merged, consider asking on the developers list about why it hasn't yet been committed or, better yet, searching the archive of that list to see if there's a reason given.

comment:222 by anonymous, 16 years ago

Cc: uptimebox@… added

by egon, 16 years ago

Attachment: streaming.7092.patch added

Made it apply cleanly to rev. 7092

comment:223 by Faheem Mitha, 16 years ago

streaming.7092.patch gives some errors. As usual, I don't understand
what is going on well enough to fix them. See below.

*
======================================================================
FAIL: Doctest: modeltests.model_forms.models.test.API_TESTS


Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest

raise self.failureException(self.format_failure(new.getvalue()))

AssertionError: Failed doctest test for modeltests.model_forms.models.test.API_TESTS

File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line unknown line number, in API_TESTS


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

type(f.cleaned_datafile)

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[198]>", line 1, in ?

type(f.cleaned_datafile)

AttributeError: 'TextFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[199]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.file

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[200]>", line 1, in ?

instance.file

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f = TextFileForm(data={'description': u'Assistance'}, instance=instance)

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[201]>", line 1, in ?

f = TextFileForm(data={'description': u'Assistance'}, instance=instance)

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.cleaned_datafile

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[203]>", line 1, in ?

f.cleaned_datafile

AttributeError: 'TextFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[204]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.file

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[205]>", line 1, in ?

instance.file

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

os.unlink(instance.get_file_filename())

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[206]>", line 1, in ?

os.unlink(instance.get_file_filename())

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f = TextFileForm(data={'description': u'Assistance'}, files={'file': {'filename': 'test2.txt', 'content': 'hello world'}}, instance=instance)

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[207]>", line 1, in ?

f = TextFileForm(data={'description': u'Assistance'}, files={'file': {'filename': 'test2.txt', 'content': 'hello world'}}, instance=instance)

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[209]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.file

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[210]>", line 1, in ?

instance.file

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.delete()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[211]>", line 1, in ?

instance.delete()

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[219]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be changed because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.file

Expected:

u'.../test3.txt'

Got:



File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

type(f.cleaned_dataimage)

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[226]>", line 1, in ?

type(f.cleaned_dataimage)

AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[227]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The ImageFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[228]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.cleaned_dataimage

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[231]>", line 1, in ?

f.cleaned_dataimage

AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[232]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[233]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

os.unlink(instance.get_image_filename())

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[234]>", line 1, in ?

os.unlink(instance.get_image_filename())

AttributeError: 'TextFile' object has no attribute 'get_image_filename'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[237]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[238]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.delete()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[239]>", line 1, in ?

instance.delete()

File "/usr/local/lib/python2.4/site-packages/django/db/models/base.py", line 324, in delete

assert self._get_pk_val() is not None, "%s object can't be deleted because its %s attribute is set to None." % (self._meta.object_name, self._meta.pk.attname)

AssertionError: TextFile object can't be deleted because its id attribute is set to None.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[247]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The ImageFile could not be changed because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Expected:

u'.../test3.png'

Got:


======================================================================
FAIL: Doctest: regressiontests.forms.tests.test.fields_tests


Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest

raise self.failureException(self.format_failure(new.getvalue()))

AssertionError: Failed doctest test for regressiontests.forms.tests.test.fields_tests

File "/usr/local/src/django/django-trunk.patched/tests/regressiontests/forms/tests.py", line unknown line number, in fields_tests


File "/usr/local/src/django/django-trunk.patched/tests/regressiontests/forms/tests.py", line ?, in regressiontests.forms.tests.test.fields_tests
Failed example:

type(f.clean({'filename': 'name', 'content': 'Some File Content'}, 'files/test4.pdf'))

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest regressiontests.forms.tests.test.fields_tests[278]>", line 1, in ?

type(f.clean({'filename': 'name', 'content': 'Some File Content'}, 'files/test4.pdf'))

File "/usr/local/lib/python2.4/site-packages/django/newforms/fields.py", line 453, in clean

raise ValidationError(self.error_messagesempty)

ValidationError: [u'The submitted file is empty.']


Ran 234 tests in 63.274s

FAILED (failures=2)

by Faheem Mitha, 16 years ago

Slightly modified version of streaming.7092.patch with more tests passing.

comment:225 by Faheem Mitha, 16 years ago

With the patch I just uploaded (basically the changes are liberal use of 'content-length' as suggested by Oyvind, the errors are now reduced
to
======================================================================
FAIL: Doctest: modeltests.model_forms.models.test.API_TESTS


Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest

raise self.failureException(self.format_failure(new.getvalue()))

AssertionError: Failed doctest test for modeltests.model_forms.models.test.API_TESTS

File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line unknown line number, in API_TESTS


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

type(f.cleaned_dataimage)

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[226]>", line 1, in ?

type(f.cleaned_dataimage)

AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[227]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The ImageFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[228]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.cleaned_dataimage

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[231]>", line 1, in ?

f.cleaned_dataimage

AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[232]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[233]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

os.unlink(instance.get_image_filename())

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[234]>", line 1, in ?

os.unlink(instance.get_image_filename())

AttributeError: 'TextFile' object has no attribute 'get_image_filename'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[237]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[238]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.delete()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[239]>", line 1, in ?

instance.delete()

File "/usr/local/lib/python2.4/site-packages/django/db/models/base.py", line 324, in delete

assert self._get_pk_val() is not None, "%s object can't be deleted because its %s attribute is set to None." % (self._meta.object_name, self._meta.pk.attname)

AssertionError: TextFile object can't be deleted because its id attribute is set to None.


Ran 234 tests in 63.014s

FAILED (failures=1)

comment:226 by anonymous, 16 years ago

content-length is in the wrong dict in ImageField tests and probably some others too.

>>> f = ImageFileForm(data={'description': u'An image'}, files={'image': {'filename': 'test.png', 'content': image_data}, 'content-length':len(image_data)})

should be

>>> f = ImageFileForm(data={'description': u'An image'}, files={'image': {'filename': 'test.png', 'content': image_data, 'content-length':len(image_data)}})

by egon, 16 years ago

Attachment: streaming.7106.patch added

Passes al tests.

comment:227 by anonymous, 16 years ago

Cc: drfarina@… added

comment:228 by beau, 16 years ago

Cc: beau@… added

comment:229 by anonymous, 16 years ago

Cc: giuliani.v@… added

comment:230 by fdr, 16 years ago

I've been rewriting the multipart parsing to be lazier and found something that seems to me like a DDOS exploit in this code.

Consider line 231-233 in streaming.7106.patch. What happens here is:

  1. A multipart boundary has been seen recently, so we are in the HEADER state
  2. We are looking to match the end of of the header
  3. If there's not enough data we simply break and add more bytes to our unmatched bytes from the network and try matching again later

Is there any reason why a "bad" client could not send a multipart request that *never* properly sends the header termination bytes and thus forces the multipart parser to load as much of the stream as possible (up to Content-Length bytes) into memory? As the amount of bytes increases towards the maximum Content-Length the call to .find() itself will become incredibly expensive. A few malformed requests might also cause the server to swap into oblivion or run out of address space on 32 bit systems.

comment:231 by fdr, 16 years ago

Bad wording in the prior comment.

First sentence should be read as follows:

This patch appears to introduce a DOS attack vector into Django.

comment:232 by egon, 16 years ago

I've tested this with:

  • Lighttpd 1.4 with fastcgi/scgi: doesn't work. As somebody else said above it looks like it isn't streaming the upload at all, or maybe lighttpd waits for the POST to be over before streaming it to the fcgi app.
  • Apache 2.2 with scgi: works fine so far.

comment:233 by avain <avainitri@…>, 16 years ago

Component: Core frameworkHTTP handling
Has patch: unset
Needs tests: unset
Patch needs improvement: set

I've tested streaming.7106.patch. It works for the upload file with 860MB. However I tested it with 1.5G. It fails. The message is below.

Request Method:  	POST
Request URL: 	http://192.168.1.4/imagecity/upload/
Exception Type: 	MemoryError
Exception Value: 	out of memory
Exception Location: 	/usr/lib/python2.5/site-packages/django/http/multipartparser.py in _read, line 285
/usr/lib/python2.5/site-packages/django/http/multipartparser.py in _read
 285. self._file.write(data[start:end]) ...

comment:234 by Roman Tolkachyov <rt@…>, 16 years ago

save_FOO_file() works wrong with streaming.7106.patch
If raises

Traceback (most recent call last):
  ...
  File "/usr/lib/python2.5/site-packages/django/db/models/fields/__init__.py", line 783, in <lambda>
    setattr(cls, 'save_%s_file' % self.name, lambda instance, filename, raw_field, save=True: instance._save_FIELD_file(self, filename, raw_field, save))
  File "/usr/lib/python2.5/site-packages/django/db/models/base.py", line 414, in _save_FIELD_file
    if raw_field.has_key('tmpfilename'):
AttributeError: 'str' object has no attribute 'has_key'

comment:235 by fdr, 16 years ago

I worked on a lazier (i.e. you can read from the stream without loading the entire thing onto disk) and hopefully better patch for this, in case anyone is getting antsy. I haven't been able to return to it for a week or two, but all the annoying parts are pretty much done.

It does, however, gut the current implementation. At that time I also tested mod_wsgi2's buffering behavior and found that it does not do aggressive buffering (a good thing), so sites running behind mod_wsgi2 can do flow control on clients. This was done by augmenting the current patch with .sleep() calls between blocks and then posting a large file.

This is particularly important if you have to send the material to another server and would prefer (or require) that it not be held on disk in full, if the stream is of indeterminate length, or if you wanted to hold the request in full to begin with (for example: filtering a dataset as it goes by and populating the database with a structured representation)

comment:236 by Michael Axiak, 16 years ago

Status: newassigned

Alright I'm going to be working on this ticket this week. Before I do any work, I'm uploading the current diff updated to work against HEAD largely untouched. After reviewing the comments, before we can consider this ready for checking or close to that, I think that the following is needed:

  1. Updating this to work with the new-ish FileField backends.
  2. Pull the MultiPartParser logic apart from the actual dealing with the chunked data. (I.e. a user should be able to do whatever they want with the data.)
  3. Make the MultiPartParser lazier (i.e. not look for boundaries, but let them come when available).
  4. Go through the bugs that people have reported (thanks for doing so!) and ensuring that they no longer exist.

-Mike

by Michael Axiak, 16 years ago

Attachment: ticket2070_rev7277.diff added

Same patch applied to @7277

comment:237 by Malcolm Tredinnick, 16 years ago

I think the general design also needs a review. Things like a get parameter that is acted upon before it gets to the view isn't really acceptable, since it means you can never use that parameter name in normal code. So can you please concentrate on a patch for solving the initial problem (large file uploads) and worry about stuff like upload progress meters after some design discussion on django-dev as a later thing. The last few patches seem to have been heading into feature-growth territory and not solving the legitimate problem that was the original bug report.

comment:238 by Michael Axiak, 16 years ago

Has patch: set
Needs documentation: set
Needs tests: set

Alright, this has been a long time waiting, but the ticket has been refactored to what I think has been a fairly reasonable state. It might undergo a few movements of code, but I think the overall design is here to stay. To get a grasp of the API this ticket enables, please visit the thread on this matter. [1] A great thanks goes to fdr--- he and I have been putting the *sprint* in this past Django sprint during the late hours (our hemisphere).

Please feel free to test this by running it on a platform and giving us the results!

An overview of the status of this ticket:

  • It has basic functionality. It fails fast...we'll probably silence a couple of the errors before we go to trunk, but it's good to have noise while people are debugging.
  • There is one test. The code for this test can be used to extend to more harsher tests (wrong content lengths, etc) to see how the parser handles.
  • There are no doc changes yet. I'll have basic ones by the end of the week (how to change the default temporary file location, etc), and the docs for describing the API...I'm not sure :-)
  • A few design things that could be resolved:
    • Supporting multiple upload handlers (right now it supports 1 in a list that could be extended, but we haven't how it deals with ordering).
    • Adding signals [doesn't have to be in this ticket, I think]
    • We send the charset to the handler, but we could take care of encoding for it if the mime starts with 'text/'...people might complain about getting unicode objects (like a CSV for example).
  • I separated the diffs to the one that deals with the uploading, and the one that deals with the saving of the uploaded data.
  • The newforms interface is completely untested as of now.

by Michael Axiak, 16 years ago

NEW Upload handling for revision 7339

by Michael Axiak, 16 years ago

NEW Form handling for uploaded files for revision 7339

comment:239 by fdr, 16 years ago

I'll probably be furnishing the enhanced docs before too long. One important debugging feature that isn't in yet is a way to detect if the parser starts to undergo an infinite loop, which can happen when LazyStream.read() and LazyStream.unget() are called in a way where the same bytes constantly get read out and pushed back in succession. It'd be much better to crash quickly and log well in this case so bugs can get reported.

by Michael Axiak, 16 years ago

Latest patch for 2070...includes everything (see comment)

comment:240 by Michael Axiak, 16 years ago

I just added a new diff, 2070_revision7351_latest.diff. Against my earlier ways, I've decided to just merge the two patch changes, since you really can't use one and not both. The new patch features the ability to work with multiple upload handlers, updating forms to work better, and a few bugfixes. The patch should now pass all tests.

What YOU can do to help:

  1. Download the patch.
  2. Run patch -p0 < 2070_revision7351_latest.diff on your Django installation.
  3. Run tests (django/tests/runtests.py --settings=...)
  4. Test with a project that involves uploading. (Big and small!)
    • Your memory should stay below 100 MB regardless of how big your upload size is (TODO: measure how big the typical process size is and see if we have to call gc.collect() at some point.)
    • Your temporary directory should contain partial files during the upload is progressing (ls -ltr /tmp on *nix systems).
    • The uploading should work with all of your apps and forms like they have.
  5. Report ANY errors to this ticket.

Things left from this patch:

  1. Documentation of things other than settings.FILE_UPLOAD_TEMP_DIR. (I'm not sure it's a blocker for this ticket...we'd probably have to have another doc. section for this API.)
  2. Some more mean tests could be written (probably have to circumvent the test client for a few).
  3. While the parser shouldn't enter an infinite loop, it can currently happen. There will be some accounting to make sure that this failure mode doesn't exist in a new patch.
  4. Review from the devs.

by Michael Axiak, 16 years ago

Attachment: 2070_latest7354.diff added

The latest #2070 patch. Small changes...see comment.

comment:241 by Michael Axiak, 16 years ago

I just uploaded a new patch 2070_latest7354.diff. This includes minor fixes including:

  • Works/tested on python 2.3.
  • Works/tested on windows.
  • Changes how multiple handlers behave.

Later tonight I'm going to shoot an email to django-developers with a revised API, since the implementation has of course given me more insight into how things should work.

comment:242 by Maniac <Maniac@…>, 16 years ago

Mike, thanks for the new patch! Leaving aside design issues I have some comments on the implementation part. Feel free to disagree with any!

  • You have many multi-line comments decorated with in boxes of #s. But the rest of the code doesn't do this, all comments are usually have just one # on the left of each line. And I personally prefer it in Django way, it's easier to read :-)
  • In MultipartParser.init you seem to half got rid of guessing content_length if they're not in HTTP headers. What was the intent? Isn't headers the only place where it should be and thus you should raise and exception right away if it's not available?
  • In MultipartParser.parse you have a common pattern of invoking a method on all handlers until it returns some non-false value. May be abstract it in a function like call_handlers(method_name, *args, kwargs) that will do the iteration?
  • line 200 of multipartparser.py: if content-transfer-encoding is available but not 'base64' you're feeding non-decoded data to handlers. I believe it's only going to work if encoding is '8bit' and in other cases it's better to raise an exception that encoding is unknown.
  • I couldn't figure out why FakePayload is needed. Can you clarify?
  • In HttpRequest you're taking special care of not allowing setting _upload_handlers in _set_upload_handlers but they can still be changed since you're giving out the whole list and it's mutable. I.e. request.upload_handlers.append(handler) will work regardless. I realize now that you set it to a tuple but it can be overwritten. May be it's better to just have request.add_upload_handler()? What's the point of reading them?
  • In HttpRequest.parse_file_upload you have a comment "Order here is *very* important." but never say how exactly it works. May be a short note like: "Handlers that go sooner may cancel remaining handlers by signaling that handling is complete".
  • line 420 of db.models.base: I think a comment "exclusive lock" before "filelocks.lock(fp, filelocks.LOCK_EX)" just asks to have this call to be wrapped up in a function "exclusive_lock(fp)" so you no longer have to explain it. Looks like it's the only lock you're using anyway.

I didn't look through uploadedfile.py and fileuploadhandler.py, will leave it till later.

in reply to:  242 ; comment:243 by Michael Axiak, 16 years ago

Replying to Maniac <Maniac@SoftwareManiacs.Org>:

Mike, thanks for the new patch! Leaving aside design issues I have some comments on the implementation part. Feel free to disagree with any!

Thank you for your comments! Before I begin, what issues do you have with the design?

Here's a response to each of your comments:

  • You have many multi-line comments decorated with in boxes of #s. But the rest of the code doesn't do this, all comments are usually have just one # on the left of each line. And I personally prefer it in Django way, it's easier to read :-)

This is style. I will update this in a future patch.

  • In MultipartParser.init you seem to half got rid of guessing content_length if they're not in HTTP headers. What was the intent? Isn't headers the only place where it should be and thus you should raise and exception right away if it's not available?

I'm not entirely sure what you mean by this. The current patch will not guess the content-length. Unfortunately, there's no way that I can see around this, since the content length is really necessary. Unfortunately raising an exception will result in a "Connection Reset" error, but I guess it's better than any alternative I can think of.

  • In MultipartParser.parse you have a common pattern of invoking a method on all handlers until it returns some non-false value. May be abstract it in a function like call_handlers(method_name, *args, kwargs) that will do the iteration?

If you pay close attention, you will see that it's not really common. (Especially in the latest patch.) I think the complexity involved to encapsulate each of these loops would just do more to obfuscate the code. You're welcome to prove me wrong, though. :)

  • line 200 of multipartparser.py: if content-transfer-encoding is available but not 'base64' you're feeding non-decoded data to handlers. I believe it's only going to work if encoding is '8bit' and in other cases it's better to raise an exception that encoding is unknown.

No. My interpretation of the RFCs is that we can accept any one of '8bit', '7bit', or 'binary' without doing anything. These 3 encodings are really telling us how we can interpret them (7bit being just ASCII, 8bit and binary being 8bit character sets and binary files) and don't actually need any decoding before being written to a file. My perusal of the email.message python module seems to agree with this assessment, and it also shows how to support a few other encodings I might wish to add support for.

  • I couldn't figure out why FakePayload is needed. Can you clarify?

I thought the docstring said it. It restricts what can be acted on in a test client request. The point is that (a) a StringIO object supports too many features and (b) we should complain when we try to read more than the available content, since this will result in poor performance when the file is a wrapper around a blocking network socket. Again, these are only for tests, as I feel just using StringIO doesn't nearly exhibit the same behavior as the network socket.

  • In HttpRequest you're taking special care of not allowing setting _upload_handlers in _set_upload_handlers but they can still be changed since you're giving out the whole list and it's mutable. I.e. request.upload_handlers.append(handler) will work regardless. I realize now that you set it to a tuple but it can be overwritten. May be it's better to just have request.add_upload_handler()? What's the point of reading them?

It can only override it by using the "non-public" API of _upload_handlers. By using the public API, you're ensuring yourself the safety of knowing that the upload handlers are being modified with expected behavior. I thought about providing a .add_upload_handler() but then I wanted a .prepend_upload_handler() and I wanted a .insert_upload_handler() and realized that we really have a list, and the ways to modify a list is well documented. There's no reason why we can't expect someone to want to read the upload handlers to decide where he wants to place something. (E.g.: My fictional AJAXUploadHandler Middleware will always place the AJAXUploadHandler after the InMemoryUploadHandler.)

  • In HttpRequest.parse_file_upload you have a comment "Order here is *very* important." but never say how exactly it works. May be a short note like: "Handlers that go sooner may cancel remaining handlers by signaling that handling is complete".

Good point. I'll have this in a future patch.

  • line 420 of db.models.base: I think a comment "exclusive lock" before "filelocks.lock(fp, filelocks.LOCK_EX)" just asks to have this call to be wrapped up in a function "exclusive_lock(fp)" so you no longer have to explain it. Looks like it's the only lock you're using anyway.

This is really a stylistic thing. The thing is there are a lot of ways to lock a file, so I just want to use a clean syntax that supports all those ways. I don't like the idea of adding a function wrapper for each action when a comment can make it just as clear. (And really, filelocks.lock(fp, filelocks.LOCK_EX) is probably clear enough for anyone who knows what an exclusive lock is to not even need that comment.)

I didn't look through uploadedfile.py and fileuploadhandler.py, will leave it till later.

Thanks again for your comments. Look forward to hearing more from you!

in reply to:  243 ; comment:244 by Ivan Sagalaev <Maniac@…>, 16 years ago

Thank you for your comments! Before I begin, what issues do you have with the design?

None, as of now :-)

  • In MultipartParser.init you seem to half got rid of guessing content_length if they're not in HTTP headers. What was the intent? Isn't headers the only place where it should be and thus you should raise and exception right away if it's not available?

I'm not entirely sure what you mean by this. The current patch will not guess the content-length. Unfortunately, there's no way that I can see around this, since the content length is really necessary. Unfortunately raising an exception will result in a "Connection Reset" error, but I guess it's better than any alternative I can think of.

By "half got rid" I mean this code in init:

try:
    content_length = int(META.get('HTTP_CONTENT_LENGTH',
                                  META.get('CONTENT_LENGTH',0)))
except (ValueError, TypeError):
    # For now set it to 0...we'll try again later on down.
    content_length = 0

# If we have better knowledge of how much
# data is remaining in the request stream,
# we should use that. (modpython for instance)
#try:
#    remaining = input_data.remaining
#    if remaining is not None and \
#            (content_length is None or remaining < content_length):
#        content_length = remaining
#except AttributeError:
#    pass

if not content_length:
    # This means we shouldn't continue...raise an error.
    raise MultiPartParserError("Invalid content length: %r" % content_length)

I think everything after first "except" can be safely deleted and in first "except" should be "raise MultiPartParserError(...)" instead of setting content_length to 0.

If you pay close attention, you will see that it's not really common. (Especially in the latest patch.) I think the complexity involved to encapsulate each of these loops would just do more to obfuscate the code. You're welcome to prove me wrong, though. :)

Oh... Now I see that read_data_chunk makes things differently requiring that non-None value to continue. It's not worth abstracting then.

A small nit... What actually made me think about this is that you have "handlers = self._upload_handlers" at the top of "parse" and I tried to figure out if it's really needed to shave 14 letters in a couple of cases at the expense of requiring a reader to make one more translations "handlers are those same handlers that lay in an object".

No. My interpretation of the RFCs is that we can accept any one of '8bit', '7bit', or 'binary' without doing anything. These 3 encodings are really telling us how we can interpret them (7bit being just ASCII, 8bit and binary being 8bit character sets and binary files) and don't actually need any decoding before being written to a file. My perusal of the email.message python module seems to agree with this assessment, and it also shows how to support a few other encodings I might wish to add support for.

What I'm worried about is 'quoted-printable'. Now it will go to the handler unchanged which is wrong. Updating my comment to your reply I think the logic should be like this:

if transfer_encoding == 'base64':
    # decode
elif transfer_encoding in ['8bit', '7bit', 'binary']:
    # just feed
else:
    raise ...

BTW, browsers don't use quoted-printable here but some hand-written user-agent could...

  • I couldn't figure out why FakePayload is needed. Can you clarify?

I thought the docstring said it. It restricts what can be acted on in a test client request. The point is that (a) a StringIO object supports too many features and (b) we should complain when we try to read more than the available content, since this will result in poor performance when the file is a wrapper around a blocking network socket. Again, these are only for tests, as I feel just using StringIO doesn't nearly exhibit the same behavior as the network socket.

I did understand what it does but didn't understand why it's needed. Using it for tests makes sense now, thanks!

It can only override it by using the "non-public" API of _upload_handlers. By using the public API, you're ensuring yourself the safety of knowing that the upload handlers are being modified with expected behavior. I thought about providing a .add_upload_handler() but then I wanted a .prepend_upload_handler() and I wanted a .insert_upload_handler() and realized that we really have a list, and the ways to modify a list is well documented. There's no reason why we can't expect someone to want to read the upload handlers to decide where he wants to place something. (E.g.: My fictional AJAXUploadHandler Middleware will always place the AJAXUploadHandler after the InMemoryUploadHandler.)

But each custom handler will know where to put itself only relative to standard handlers that it knows about... Middleware machinery just leaves this to a user to figure out. In this case I believe everyone can also place middleware, decorators etc in the right order without parsing this list in code.

(And really, filelocks.lock(fp, filelocks.LOCK_EX) is probably clear enough for anyone who knows what an exclusive lock is to not even need that comment.)

Indeed :-). That was my second option too :-)

in reply to:  244 comment:245 by Michael Axiak, 16 years ago

Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:
Alright, we're very close here!

  • By "half got rid" I mean this code in init ...

Yeah, actually the new patch that I have queued up changes this behavior a little bit. It's more in line with this thinking.

  • A small nit... What actually made me think about this is that you have "handlers = self._upload_handlers" at the top of "parse" and I tried to figure out if it's really needed to shave 14 letters in a couple of cases at the expense of requiring a reader to make one more translations "handlers are those same handlers that lay in an object".

I actually didn't do it to save 14 characters. I did it because this loop can potentially run millions of iterations and I didn't want to do a needless self.var lookup when var is slightly faster. By this point the loop is so complicated that I might as well change it, but still...CPU is CPU.

  • What I'm worried about is 'quoted-printable'. Now it will go to the handler unchanged which is wrong. Updating my comment to your reply I think the logic should be like this:

My hope was that someone could write a handler to deal with quoted-printable :).

  • But each custom handler will know where to put itself only relative to standard handlers that it knows about... Middleware machinery just leaves this to a user to figure out. In this case I believe everyone can also place middleware, decorators etc in the right order without parsing this list in code.

Yeah I understand the issue. I'm not sure of the best interface. My thinking was that a list is so well-known that people won't be surprised to learn to do: request.upload_handlers.append(upload_handler). I'm open to suggestions that will support the different operations.

Thanks again for your review. I'm writing a long-ish email to Django developers outlining a few of the issues and a revised API description.

comment:246 by Michael Axiak, 16 years ago

Needs documentation: unset

Okay, I'm posting a new diff. Not many changes. A few stylistic changes, a better handling of multiple handlers, and the new documentation for the upload handlers.

by Michael Axiak, 16 years ago

Attachment: 2070_revision7359.diff added

New diff...some style changes and new documentation.

comment:247 by Ivan Sagalaev <Maniac@…>, 16 years ago

Mike, here are comments & question about fileuploadhandler.py and uploadedfile.py. Apparently, only one question :-)

MemoryFileUploadHandler implements handle_raw_input and, thus, has to duplicate the logic of the multipart parser. Why not implement this handler just like TemporaryFileUploadHandler but using StringIO?

in reply to:  247 comment:248 by Michael Axiak, 16 years ago

Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:

Mike, here are comments & question about fileuploadhandler.py and uploadedfile.py. Apparently, only one question :-)

MemoryFileUploadHandler implements handle_raw_input and, thus, has to duplicate the logic of the multipart parser. Why not implement this handler just like TemporaryFileUploadHandler but using StringIO?

Yes, I originally used StringIO and did it the same way, but I realized that since it's all in memory, I can use the proven email parser. I thought it would be really slick if it's so flexible we can even revert back to our old ways of parsing for when we know we can put it all in memory.

That being said, we can change it back if you think it's unnecessary.

comment:249 by Ivan Sagalaev <Maniac@…>, 16 years ago

Yes, I think it is unnecessary. I believe we're going to debug the streaming parser to reliable state anyway, aren't we? :-) And after this we'll just have two different parsers for no purpose...

in reply to:  249 ; comment:250 by Michael Axiak, 16 years ago

Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:

Yes, I think it is unnecessary. I believe we're going to debug the streaming parser to reliable state anyway, aren't we? :-) And after this we'll just have two different parsers for no purpose...

Okay. That being said, let me outline what my new patch will have, before I finish it:

  1. A settings variable for when to use memory and when to use streaming (settings.FILE_UPLOAD_MAX_MEMORY_SIZE).
  2. A settings variable for the default upload handlers. (settings.FILE_UPLOAD_HANDLERS)
  3. One single parser.
  4. Forms support for dictionary (but emitting deprecation warnings).
  5. Updating forms docs to use the new File notation instead of a dictionary.

Let me know if there's anything to add to this list!

in reply to:  250 ; comment:251 by Maniac <Maniac@…>, 16 years ago

  1. A settings variable for when to use memory and when to use streaming (settings.FILE_UPLOAD_MAX_MEMORY_SIZE).

BTW when I was thinking about such thing in the past I had an idea that we can use the size of the buffer that we use to read data from input_stream to be this maximum value. I.e. if we can afford some amount of memory per file then there's no point of making buffer size smaller.

  1. A settings variable for the default upload handlers. (settings.FILE_UPLOAD_HANDLERS)
  2. One single parser.
  3. Forms support for dictionary (but emitting deprecation warnings).
  4. Updating forms docs to use the new File notation instead of a dictionary.

This is great! Looking forward to it.

in reply to:  251 ; comment:252 by Michael Axiak, 16 years ago

BTW when I was thinking about such thing in the past I had an idea that we can use the size of the buffer that we use to read data from input_stream to be this maximum value. I.e. if we can afford some amount of memory per file then there's no point of making buffer size smaller.

I had a similar idea, however this is entirely dependent on the garbage collection, no? That is, if we called gc.collect() on every iteration of the chunks, I think we'd be set.

-Mike

in reply to:  252 ; comment:253 by Maniac <Maniac@…>, 16 years ago

I had a similar idea, however this is entirely dependent on the garbage collection, no? That is, if we called gc.collect() on every iteration of the chunks, I think we'd be set.

Uhm... Looks like I wasn't clear enough, sorry... I just mean that if we have a setting FILE_UPLOAD_MAX_MEMORY_SIZE then we can use this value for chunk_size in FileUploadHandler. It has nothing to do with GC.

in reply to:  253 comment:254 by Michael Axiak, 16 years ago

Replying to Maniac <Maniac@SoftwareManiacs.Org>:

Uhm... Looks like I wasn't clear enough, sorry... I just mean that if we have a setting FILE_UPLOAD_MAX_MEMORY_SIZE then we can use this value for chunk_size in FileUploadHandler. It has nothing to do with GC.

I don't think so. These are two different parameters. As an example, suppose I set FILE_UPLOAD_MAX_MEMORY_SIZE to 2.5 MB. If this were the chunk size, then that would mean each iteration would create a value which held 2.5MB worth of data. If the GC always collected OR if the GC collected based on *size*, then it would mean that the upload shouldn't take appreciably more than 2.5MB total. However, if the GC collects based on number of allocations (which I believe is the default in standard CPython), then having a chunk size of 2.5MB would take appreciably more memory than having a lower chunk size; and certainly much more memory than an entire upload going into memory at once.

At least, this is my understanding of the garbage collection in python. If you look at your memory usage during an upload, you will see interesting memory usage.

comment:255 by Maniac <Maniac@…>, 16 years ago

Ah... Now I understand the relation with GC. But you seem to have explored this question much more than I even tried so I refrain with my idea.

comment:256 by Michael Axiak, 16 years ago

Okay! This patch includes all of the aforementioned improvements. It still passes all the tests. I've tested uploads on the following platforms:

  • Python 2.5/2.3
  • Apache mod_python, mod_fastcgi, mod_wsgi
  • Lighttpd mod_fastcgi
  • Development server

They all stream out of the box.

comment:257 by Michael Axiak, 16 years ago

Needs tests: unset
Patch needs improvement: unset

I like the patch as is. I think now is a good time to start testing and seeing if this is what people want.

by Michael Axiak, 16 years ago

Attachment: 2070_revision7363.diff added

Altered documentation to be more approachable.

by Michael Axiak, 16 years ago

Attachment: 2070_revision7363.2.diff added

Slightly newer...handles exhaustion a little bit differently.

comment:258 by Eric B <ebartels@…>, 16 years ago

There's a problem with the image validation in django.newforms.fields.ImageField (line 507)

If file is a StringIO object, the second call to Image.open will fail with an IOError unless there's a file.reset() call first, something like:

try:
    # load() is the only method that can spot a truncated JPEG,
    #  but it cannot be called sanely after verify()
    trial_image = Image.open(file)
    trial_image.load()
    # verify() is the only method that can spot a corrupt PNG,
    #  but it must be called immediately after the constructor
    if hasattr(file, 'reset'):
        file.reset()
    trial_image = Image.open(file)
    trial_image.verify()
except Exception: # Python Imaging Library doesn't recognize it as an image
    raise ValidationError(self.error_messages['invalid_image'])

in reply to:  258 ; comment:259 by Michael Axiak, 16 years ago

Replying to Eric B <ebartels@gmail.com>:

There's a problem with the image validation in django.newforms.fields.ImageField (line 507)
[...]

Hi Eric, thanks so much for the information. I just have a comment and a question.

Question: If this is a problem, how did this patch change anything? The current implementation just does StringIO(f.content). I was intending to change as little functionality as possible, and it passes the tests (which I believe include image validity checking in newforms)...

Comment: There's no reason to have the hasattr()}} check there, because there's an if statement 5 lines above that will know when its a {{{StringIO object and when it isn't.

Again, thanks for your feedback!

in reply to:  259 comment:260 by Michael Axiak, 16 years ago

Replying to axiak:

Replying to Eric B <ebartels@gmail.com>:

There's a problem with the image validation in django.newforms.fields.ImageField (line 507)
[...]

Hi Eric, thanks so much for the information. I just have a comment and a question.

Question: If this is a problem, how did this patch change anything? The current implementation just does StringIO(f.content). I was intending to change as little functionality as possible, and it passes the tests (which I believe include image validity checking in newforms)...

Doh. I see it now. Sorry about that. :-)

I have a few other changes queued up, so I'll include that in the next patch.

in reply to:  259 comment:261 by Michael Axiak, 16 years ago

Replying to axiak:

and it passes the tests (which I believe include image validity checking in newforms)...

As it turns out, the current tests have a model called ImageFile which actually doesn't test the image field. Is there any reason we cannot do this for our tests?

class ImageFile(models.Model):
    description = models.CharField(max_length=20)
    try:
        # If PIL is available, try testing PIL.
        # Otherwise, it's equivalent to TextFile above.
        import Image
        image = models.ImageField(upload_to=tempfile.gettempdir())
    except ImportError:
        image = models.FileField(upload_to=tempfile.gettempdir())

    def __unicode__(self):
        return self.description

After doing that, the test fails properly without Eric's suggested modification. (When PIL is installed.)

Cheers,
Mike

comment:262 by Andrii Kurinnyi, 16 years ago

Cc: prigun@… added

by Michael Axiak, 16 years ago

Attachment: 2070_revision7377.diff added

New 2070 patch...includes fixes to the parser in addition to the newforms fix caught by EricB as well as better docs.

comment:263 by Michael Axiak, 16 years ago

The new patch includes a couple improvements over the last. No change in design, just small buxfixes:

  1. The low-level API parser now correctly keeps track of where it is (LazyStream). (The default django api never used this feature.)
  2. The newforms problem with Image validation is fixed, along with modifying the tests to test that.
  3. The docs were reworded a little bit.

If it makes reviewing any easier, I'm thinking of coming up with a patch series that will represent a good flow of changes to follow...that's next on my list. If you are thinking about reviewing, here's the path I'd take:

  1. Read the high-level docs. http://orodruin.axiak.net/upload_handling.html Make sure you're fine with the design.
  2. Read the diff of stuff in /tests. Make sure you're fine with the test modifications.
  3. At this point it's probably best to start with the HTTP request and work your way in.

Thanks to everyone who's trying this patch out!

Cheers,
Mike

comment:264 by bbrriiccee@…, 16 years ago

This patch correct the bug for the ImageField. The problem come from StringIO(data.read()) that's become empty after the 1st read. By using a var to store str from data.read(), you can call 2 different StringIO on the constructor.

--- fields.py   2008-03-31 15:25:36.000000000 -0700
+++ fields.py   2008-03-31 17:04:23.000000000 -0700
@@ -494,17 +494,21 @@
             except ImportError:
                 from StringIO import StringIO
             if hasattr(data, 'read'):
-                file = StringIO(data.read())
+                data_buffer = data.read()
+                file_str_io_1 = StringIO(data_buffer)
+                file_str_io_2 = StringIO(data_buffer)
             else:
-                file = StringIO(data['content'])
+                data_buffer = data['content']
+                file_str_io_1 = StringIO(data_buffer)
+                file_str_io_2 = StringIO(data_buffer)
         try:
             # load() is the only method that can spot a truncated JPEG,
             #  but it cannot be called sanely after verify()
-            trial_image = Image.open(file)
+            trial_image = Image.open(file_str_io_1)
             trial_image.load()
             # verify() is the only method that can spot a corrupt PNG,
             #  but it must be called immediately after the constructor
-            trial_image = Image.open(file)
+            trial_image = Image.open(file_str_io_2)
             trial_image.verify()
         except Exception: # Python Imaging Library doesn't recognize it as an image
             raise ValidationError(self.error_messages['invalid_image'])

in reply to:  264 ; comment:265 by Michael Axiak, 16 years ago

Replying to bbrriiccee@gmail.com:

This patch correct the bug for the ImageField. The problem come from StringIO(data.read()) that's become empty after the 1st read. By using a var to store str from data.read(), you can call 2 different StringIO on the constructor.
...

Huh. I thought I had included the fix in the latest patch...I'll review and commit the patch with the latest code.

By the way, your patch has two inefficiencies:

  1. You shouldn't have to call .read() on the file object if it has a temporary file name. If it's a large file, we should let the OS do the seeking which is what happens when you give the filename to PIL.
  2. There's no reason to have two distinct StringIO objects when you can just call .reset().

I'll have the fixed patch up in a minute or two.

by Michael Axiak, 16 years ago

Attachment: 2070_revision7396.diff added

Better patch...not sure how the newforms code in the previous patch was the way it was.

in reply to:  265 comment:266 by Michael Axiak, 16 years ago

Replying to axiak:

[...]
I'll have the fixed patch up in a minute or two.

Patch uploaded. Again, I'm not certain how the newforms changes didn't get in. But I do know that I was really tired when I uploaded the last version. I gave this new patch a once-over at least...

comment:267 by Eric B <ebartels@…>, 16 years ago

Mike, I beleive the last patch introduced an error in fileuploadhandler.load_handler. We're now passing request to load_handler as a separate parameter and wrapping it with RestrictedRequest. On line 223 of fileuploadhandler, request needs to be passed separately to the FileUploadHandler constructor also. Otherwise, the handler won't have access to it.

return cls(*args, **kwargs)

needs to be:

return cls(request, *args, **kwargs)

With that change, everything seems to be working perfectly. Thanks.

comment:268 by Michael Axiak, 16 years ago

Hey Eric,
Thanks for keeping me honest :-). I will upload the patch as soon as I finish this comment.

Another question: Are you doing anything interesting with this patch? Just curious, since you wouldn't have hit this bug with either of the built-in upload handlers.

Cheers,
Mike

by Michael Axiak, 16 years ago

Attachment: 2070_revision7388.diff added

new patch that fixes error with instantiating the upload handlers, thanks Eric!

in reply to:  268 ; comment:269 by Eric B <ebartels@…>, 16 years ago

Replying to axiak:

Another question: Are you doing anything interesting with this patch? Just curious, since you wouldn't have hit this bug with either of the built-in upload handlers.

Since this patch makes it so easy, I decided to throw together an upload handler to display progress feedback. I know you had a patch in #4165, so you've probably got something similar in the works. I posted what I came up with to djangosnippets: http://www.djangosnippets.org/snippets/678/

in reply to:  269 comment:270 by Michael Axiak, 16 years ago

Replying to Eric B <ebartels@gmail.com>:

Since this patch makes it so easy, I decided to throw together an upload handler to display progress feedback. I know you had a patch in #4165, so you've probably got something similar in the works. I posted what I came up with to djangosnippets: http://www.djangosnippets.org/snippets/678/

Awesome job. Do you want to contribute to django-uploadutils? I've been to busy with other things at the moment so haven't had time to write stuff like what you just wrote. It's very easy for me to add you as a contributor...

comment:271 by Jacob, 16 years ago

My feedback on the patch (2070_revision7388.diff):

  1. Looks like some unrelated changes might have creapt in; this shouldn't be in the patch:
  • django/conf/global_settings.py

    a b  
    114114SEND_BROKEN_LINK_EMAILS = False
    115115
    116116# Database connection info.
    117 DATABASE_ENGINE = ''           # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
     117DATABASE_ENGINE = ''           # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'ado_mssql'.
    118118DATABASE_NAME = ''             # Or path to database file if using sqlite3.
    119119DATABASE_USER = ''             # Not used with sqlite3.
    120120DATABASE_PASSWORD = ''         # Not used with sqlite3.
  1. I have reservations about FILE_UPLOAD_HANDLERS; let's discuss this in the thread on django-developers.
  1. Please provide a link to the Python Cookbook recipe from which you took the file lock implementation. You also need to note that the file lock code is licensed under the Python Software License (as is all the cookbook code).
  1. Please double-check your code for PEP8/Django coding style. I see a number of places where you differ in style -- espcially in docstring usage, a few classes named things like FILE, FIELD, etc., and exception raising (use raise E(msg) istead of raise E, msg).
  1. Naming nitpick: django/core/files/fileuploadhandler.py -- "file" is redundant in there; you can name it (and other bits) things like django/core/files/uploadhandler.py.
  1. In general you've gone with the Java-ish style of one class per file. That's OK, and in this case I think it works relatively well. However, it can make flow hard to follow -- if a class/function is only used in a single place, you really should define it in the same place it's used. No "action item" here, just a note.
  1. Can you explain why you think you need RestrictedRequest? Can't we just insist that people write upload handlers correctly?
  1. Can you explain the save_FIELD_file/move_FIELD_file distinction to me? I don't see why you need both methods. While you're at it, can you explain the decision to change the method interface? That'll be a backwards-incompatible change, so let's not change it unless we have to.
  1. Nice job on the RFC2388 parser -- very clearly written and easy to read.
  1. Another style point: warnings code should look like this:
if something_is_wrong:
    import warnings
    warnings.warn(...)

(That is, instead of importing warnings at the module level). IIRC there's a minor overhead associated with initializing the warnings system the first time warnings is imported; our "house style" is thus to put the import in the guarded block. It also makes the future replacement of the warning with an error an easier change that doesn't leave behind an undeeded import.

  1. Similarly, in deprecating dict usage for uploaded files, keep all the backwards-compat code in the if clause. You've got this
  • django/newforms/fields.py

    a b  
    444445            return None
    445446        elif not data and initial:
    446447            return initial
     448
     449        if isinstance(data, dict):
     450            # We warn once, then support both ways below.
     451            warnings.warn("The dictionary usage for files is deprecated. Use the new object interface instead.", DeprecationWarning)
     452
    447453        try:
    448             f = UploadedFile(data['filename'], data['content'])
    449         except TypeError:
     454            file_name = data.file_name
     455            file_size = data.file_size
     456        except AttributeError:
     457            try:
     458                file_name = data.get('filename')
     459                file_size = bool(data['content'])
     460            except (AttributeError, KeyError):
     461                raise ValidationError(self.error_messages['invalid'])
     462
     463        if not file_name:

You shouldn't have both the if and the try/except; the if clause should convert the data into an uploaded file. Again, this makes removing the legacy support easier in the future.

  1. Another patch mixup: you've added an unescape_entities to django.utils.text. That's not part of this patch, is it?
  1. Again:
  • docs/db-api.txt

    a b  
    13061306Using raw strings (e.g., ``r'foo'`` instead of ``'foo'``) for passing in the
    13071307regular expression syntax is recommended.
    13081308
     1309Regular expression matching is not supported on the ``ado_mssql`` backend.
     1310It will raise a ``NotImplementedError`` at runtime.
     1311

Not part of file uploads.

  1. Don't comment out code; just remove it:
  • tests/regressiontests/bug639/tests.py

    a b  
    99from regressiontests.bug639.models import Photo
    1010from django.http import QueryDict
    1111from django.utils.datastructures import MultiValueDict
     12from django.core.files.uploadedfile import SimpleUploadedFile
    1213
    1314class Bug639Test(unittest.TestCase):
    1415       
     
    2122       
    2223        # Fake a request query dict with the file
    2324        qd = QueryDict("title=Testing&image=", mutable=True)
    24         qd["image_file"] = {
    25             "filename" : "test.jpg",
    26             "content-type" : "image/jpeg",
    27             "content" : img
    28         }
    29        
     25        #qd["image_file"] = {
     26        #    "filename" : "test.jpg",
     27        #    "content-type" : "image/jpeg",
     28        #    "content" : img
     29        #    }
     30        qd["image_file"] = SimpleUploadedFile('test.jpg', img, 'image/jpeg')
     31
    3032        manip = Photo.AddManipulator()
    3133        manip.do_html2python(qd)
    3234        p = manip.save(qd)

Overall, very well done!

comment:272 by fdr, 16 years ago

w.r.t. 4:

FILE, FIELD, et al are intended to just act as symbols, and thus I named them similarly to constants. Would you prefer a more regular-class names like File, Field, etc?

My rationale for the naming was:

  1. Their function is similar to that of a constant
  2. It looks more informative in a stack trace than "0" or "1" as you would see with something like "FILE = 0" declared in the module
  3. It's less ambiguous than a string

All in all: I can't believe you actually went through that monster patch in such fine detail. I thought that serious reviews would happen until axiak or I got around to breaking it up.

comment:273 by Camille Harang, 16 years ago

Hello,

after parsing the head of a file that is bieng uploaded (server-side parsing during the first receive_data_chunk), i would like to abort the transfert if the file is not acceptable. To do so i tried to raise a SkipFile or a StopUpload but i has no effect until the file is fully uploaded. Did i misunderstood the goal of these exceptions, or do i use it the wrong way ? Is it possible to something similar and retrieve a message from my custom FileUploadHandler in the view ?

Anyway, this patch is very nice and flexible, thank you for this :-)

Camille.

comment:274 by fdr, 16 years ago

SkipFile will continue to eat the remainder of stream, although it won't deliver it to you for handling. StopFile should stop the transfer cold and cause the client to see a connection reset error. Actually, I'm surprised that StopFile isn't causing some other issues...

--- a/django/http/multipartparser.py
+++ b/django/http/multipartparser.py
@@ -230,6 +230,8 @@ class MultiPartParser(object):
                         exhaust(stream)
                     elif isinstance(e, StopUpload):
                         # Abort the parsing and break
+                        # This doesn't look right, where do we define abort?
+                        # shouldn't this crash?
                         parser.abort()
                         break
                 else:

I'll have a more thorough look before too long, but I have some stuff going on. This chunk of the code is in Mike's expertise, so he will probably chime in first. It looks like some broken refactoring or something here....my quick grep of revision logs doesn't seem to suggest there ever was anything like a "def abort"....actually, mentions of "abort" are scant in general, with the exception of this line. Seems like something got forgotten...maybe it's time for a regression test.

Thanks for the report...I think we can have this fixed quickly, looks like an oversight that will require no particular design change.

in reply to:  274 comment:275 by Michael Axiak, 16 years ago

Replying to fdr:

SkipFile should eat the remainder of the stream until the next file, but StopUpload (as intended) should eat the stream until the end. The reasoning for this is that a connection reset error wouldn't allow the custom UploadHandler to specify a custom error message, and the end user would be without a doubt completely confused. I think perhaps we should give a keyword argument so a person can do either:

    raise StopUpload()

or

    raise StopUpload(connection_reset=True)

So that the user understands the consequences of aborting an upload.

parser.abort()

Yeah... abort was in some initial API that never really existed nor was documented. I was going to change it but must have forgotten about it. I would've caught it had I actually done work on django-uploadutils. Time for me to get over there...

comment:276 by Camille Harang, 16 years ago

Hi all,

my temporary files are correctly garbage collected in wsgi development mode, but not when running mod_python with apache2. After sutting down apache the files remain on the harddrive. Did anybody experienced that?

Camille.

comment:277 by Horst Gutmann <zerok@…>, 16 years ago

Cc: zerok@… added

in reply to:  276 ; comment:278 by Michael Axiak, 16 years ago

Replying to Camille Harang:

my temporary files are correctly garbage collected in wsgi development mode, but not when running mod_python with apache2. After sutting down apache the files remain on the harddrive. Did anybody experienced that?

Hey Camille. Thanks for the report! I'll check on this after I update the patch per Jacob's comments. The only way I can see this happening is if apache is somehow preventing it from garbage collecting. However, this is why we store temporary files in /tmp -- the OS can (and should) wipe out that directory periodically. :-) There's no guarantee that if a process gets a KILL signal or crashes hard that the temporary data will not be left on the disk -- I don't think we can make this guarantee. (Again though, I will look into this.)

Cheers,
Mike

in reply to:  278 comment:279 by Maniac <Maniac@…>, 16 years ago

There's no guarantee that if a process gets a KILL signal or crashes hard that the temporary data will not be left on the disk -- I don't think we can make this guarantee.

From my experience it really cannot be done since when Apache restarts it kills it's processes hard and they don't have a chance to cleanup. So leaving sometime tmp files is a necessary evil.

comment:280 by Johannes Beigel, 16 years ago

Cc: johannes.beigel@… added

comment:281 by Camille Harang, 16 years ago

Hi all,

yes i dreaded that there must be temporary file problems with apache, unfortunately i use a separate tmp directory because i some cases i need to keep the files so i want to avoid cross-filesystem copying when renaming it. I use a cron job to delete unused files (checking if it is accessed with fuser), if somebody is intersted in just ask me. BTW i added a 'keep' parameter to my TemporaryFile subclass, setting it to True at any time prevents the file to be deleted after closing, then just use os.rename() to store it smoothely elsewhere. I don't know if it is worthed to add this feature in the main code, anyway here it is bellow. I also experienced problems when checking files content in file_complete() because the file remains unclosed, i added self.file.flush() before self.file.seek(0) in my TemporaryFileUploadHandler subclass, once again i don't know if it needs to be included in the main code, it's a suggestion.

Camille.

class MaybeTemporaryFile(TemporaryFile):

    keep = False

    def __del__(self):

        if not self.keep:

            try: os.unlink(self.name)
            except OSError: pass

comment:282 by David Clymer <david@…>, 16 years ago

Cc: david@… added

comment:283 by Ivan Sagalaev <Maniac@…>, 16 years ago

Mike, can you clarify a bit more... I now have a code that does save_FIELD_file, giving it file contents as a string. It fails with your patch since this method expects an UploadedFile instance instead. Are you going to leave it as a documented incompatibility or it can be dealt with?

comment:284 by Mads Sülau Jørgensen, 16 years ago

A little bug in the current version of the patch. In the method chunk of the class UploadedFile there is a call to the method file_size which is not a method, as it is defined as a property, and hence overwritten by this property. The property should be used insted of the method and the method file_size should be removed, as it serves no purpose. The call to file_size generates an exception TypeError: 'int' object is not callable, which again makes chunk unusable on TemporaryUploadedFile.

by Michael Axiak, 16 years ago

Attachment: 2070_revision7484.diff added

Updated diff per Jacob's requests and useful feedback.

comment:285 by Michael Axiak, 16 years ago

Hi everyone,

Thank you so much for the tremendous feedback!

A few things to note about the new patch:

  1. save_FIELD_file was supposed to be backwards compatible. It is now, only emitting DeprecationWarning warnings when you do so.
  2. The file_size attribute is no longer a function nor referenced as a function.
  3. The StopUpload interface now works, with the optional connection_reset=True parameter to allow the connection to reset.
  4. I think I've gone over all of Jacob's suggestions and fixed each of the problems.

I have a few things I want to do still that don't have to do with programming:

  1. Outline where I have tested this (which platforms, what testing cases).
  2. Enumerate every single backwards incompatibility (even DeprecationWarning emissions), and explain why they are there.
  3. Respond to each of Jacob's comments.

Thanks again for your feedback!

in reply to:  285 ; comment:286 by John Hensley, 16 years ago

Replying to axiak:

  1. The file_size attribute is no longer a function nor referenced as a function.


I'm new to this patch, having just needed it today, but I think this may be causing problems with InMemoryUploadedFile. With no file_size, FileField.clean() fell through to the dictionary usage, which failed at file_name = data.get('filename'). Changing that to file_name = data['filename'] avoided the failed validation, and adding a file_size attribute to {{InMemoryUploadedFile}}} (with a quick len/seek at the end of the constructor) seems to have fixed the attribute validation.

by Michael Axiak, 16 years ago

Attachment: 2070_revision7484.2.diff added

Fixed file_size bug with memory handling.

in reply to:  286 ; comment:287 by Michael Axiak, 16 years ago

Replying to john:

(with a quick len/seek at the end of the constructor) seems to have fixed the attribute validation.

Doh. The regression is fixed in the latest patch (of the same name). Although the solution is not to do another len/seek but to update the handler to use the file_size counter.

Anyway, this teaches me that I have to think about clever ways of leveraging the TestClient to catch these types of errors.

Cheers,
Mike

comment:288 by Ivan Sagalaev <Maniac@…>, 16 years ago

Just checked -- save_FIELD_file now works.

in reply to:  287 comment:289 by John Hensley, 16 years ago

Replying to axiak:

Doh. The regression is fixed in the latest patch (of the same name). Although the solution is not to do another len/seek but to update the handler to use the file_size counter.

Yep, I just tested a clean checkout and it's working fine now without any of my cheesy hacks. ;) Thanks.

comment:290 by Collin Grady <cgrady@…>, 16 years ago

Cc: dcramer@… removed

comment:291 by anonymous, 16 years ago

Cc: django@… added

comment:292 by MockSoul <mocksoul@…>, 16 years ago

Cc: mocksoul@… added

comment:293 by anonymous, 16 years ago

Cc: schlaber@… added

comment:294 by Mads Sülau Jørgensen, 16 years ago

I would find it very useful to have the UploadedFile cary a tell() method. It basiclly just needs to propagate the call forward to file and would help to complete the file-likeness of the objects.

comment:295 by Djoume Salvetti, 16 years ago

Cc: dsalvetti@… added

This patch is awesome, many thanks to all the developers involved in building it.
I just wanted to mention that I had to replace :

if content_length <= 0:

by

if content_length < 0:

in django/http/multipartparser.py line 94 as a workaround to a Safari bug.

For some reason, Safari 3.x on Mac (not on PC) set the Content-Type to multipart/form-data on a redirect (even if it's a GET request...) but doesn't set the Content-Length.

comment:296 by bruno@…, 16 years ago

Great patch, but why limit it to uploads which have multipart/form-data as their content-type?
Why not extend the mechanism to any kind of binary data upload (e.g. handling file upload using the Atom Publishing Protocol) ?

Also, http://orodruin.axiak.net/upload_handling.html doesn't seem to be up anymore. Is there another location for the API docs? Thanks, Bruno.

by Leah Culver, 16 years ago

Attachment: 2070_revision7599.diff added

Updated patch to r7599

comment:297 by anonymous, 16 years ago

Cc: danpoe@… added

by Jacob, 16 years ago

Attachment: 2070-r7695.patch added

Streaming uploads patch updated to [7695] and reviewed.

comment:298 by Jacob, 16 years ago

I've completed a comprehensive review of all the code in the latest patch and updated it to trunk; the new patch is attached. I still need to edit and review the docs, but that's on me. Other than that, the TODO list is getting quite a bit shorter. Here's what's left before we can call for testing and merge this sucker:

The big stuff

  • We need unit tests for all the various data structures introduced in django/core/files (UploadedFile, the handlers, etc.). Most of this is sorta-tested by the holistic upload tests, but unit tests are particularly needed here. In most cases the needed bits are easy to mock test.


  • The file upload regression tests (currently in regressiontests/test_client_regress) ought to be moved to a seperate test module (file_uploads, probably) so it can be run seperately. More tests are also needed here; coverage seems pretty low.


  • InMemoryUploadedFile.chunk() uses return but its parent method UploadedFile.chunk() is a generator. This seems very dangerous -- the API should be a function or a generator, but not either -- and should be fixed if there's not a good reason for it.


  • FileUPloadHandler.new_file() something (i.e. "Stop") to stop handling. That's nasty; it should use some sort of exception to do that.


  • django/http/multipartparser.py:374 this should absolutely not be here. How, when and why would this happen? This absolutely needs to Just Work.



Small stuff and questions

  • django/db/models/base.py:519: is hardcoding 64k OK here, or should this be some sort of setting?


  • Look at the use of filterwarnings('ignore') in tests/modeltests/model_forms/models.py} is that intentional or just papering over internal uses of the "old" APIs? Ditto tests/regressiontests/forms/tests.py.


  • Dunno how I feel about the FieldType object -- it doesn't feel very Pythonic. Why not just assign string literals to those constants? That's how its done elsewhere in Django.


  • Idiom: ParseBoundaryStream should be parse_boundary_stream since it's a function. Unless there's a reason you called it this?

in reply to:  298 ; comment:299 by fdr, 16 years ago

Replying to jacob:

  • django/http/multipartparser.py:374 this should absolutely not be here. How, when and why would this happen? This absolutely needs to Just Work.

I authored this check. It's actually not to target any regression in particular, but I decided it would be a better option than -- in event of a parser bug -- a busy loop. The rationale is that the parser is conceptually a recursive descent parser with backtracking and that if written naively that would mean that parser bugs would manifest as a stack overflow error. The old parser had a bug like this where it would just be content to buffer as much as the stream as it wanted in memory given malformed requests (look for some of my previous posts above).

Since we want to take advantage of incremental computation and Python's iterators we lose this, as we can happily move back and forth in parser states forever in the event a bug is encountered. This 500 is a inexact proxy for the maximum stack depth before we decide we are never going to be able to make a reduction.

So far it has never been triggered (it was put in after we debugged some bugs that WOULD have caused it to trigger). You can remove it if you decide maybe-busy-looping is better than an exception.

Also, it's possible to defeat this heuristic by having a ring-oscillation of sorts where different values are popped off and put back onto the LazyStream. This is a much more complex class of bug and also less likely given the architecture of the parser. A more sound way to accomplish a recursion-depth-limit would be to count the depth of our parser state, and perhaps we should do that instead, even if it is somewhat more complicated.

So in conclusion: this check is not there because "there are some known issues we haven't ironed out and we need a catch-all," but more like "we don't have any known issues, but would rather not occupy 100% cpu if there are any regressions following a similar pattern of ones we've seen previously"

in reply to:  299 ; comment:300 by Jacob, 16 years ago

Replying to fdr:

I authored this check. It's actually not to target any regression in particular, but I decided it would be a better option than -- in event of a parser bug -- a busy loop. The rationale is that the parser is conceptually a recursive descent parser with backtracking and that if written naively that would mean that parser bugs would manifest as a stack overflow error. The old parser had a bug like this where it would just be content to buffer as much as the stream as it wanted in memory given malformed requests (look for some of my previous posts above).

Ah, OK -- makes sense. So the only way this would get triggered would be a maliciously malformed request package, right?

I may just change the error message, then -- "malformed multipart request"?

comment:301 by Ronkay János Péter, 16 years ago

Cc: django@… added

in reply to:  300 comment:302 by fdr, 16 years ago

Replying to jacob:

Ah, OK -- makes sense. So the only way this would get triggered would be a maliciously malformed request package, right?

That's right. The malformed request has to be strange in a way not seen or thought of. It could also potentially come up as a very obscure correct payload.

I may just change the error message, then -- "malformed multipart request"?

Sure, that's would seem perfectly fine. But pay in mind the following: because it is triggered in a very exceptional case (we do advertise handling bad inputs in a graceful way) I'd suggest something more strongly worded to report a bug is called for so it can be squashed.

This check is not a fallback. It a last resort, and in an ideal world would never be triggered. It's just a hopefully better alternative that doing something very wrong forever.

comment:303 by Jacob, 16 years ago

Since uploading patches is getting unwheidly, I've started working on this in a git branch: http://code.djangoproject.com/git/?p=django;a=shortlog;h=refs/heads/2070-streaming-uploads (check out from git://djangoproject.com/django and use the "2070-streaming-uploads" branch).

comment:305 by anonymous, 16 years ago

Cc: django-2070@… added
Keywords: sprstreaming intsept14 added; streaming sprintsept14 removed

comment:306 by anonymous, 16 years ago

Keywords: streaming sprintsept14 added; sprstreaming intsept14 removed

comment:307 by ipse, 16 years ago

Cc: ipse.reg@… removed

by Jacob, 16 years ago

Attachment: 2070-r7728.patch added

"release candidate" patch

comment:308 by Jacob, 16 years ago

Owner: changed from Michael Axiak to Jacob
Status: assignednew

I've uploaded what I think is a finished patch. It's also available from git: http://code.djangoproject.com/git/?p=django;a=shortlog;h=refs/heads/2070-streaming-uploads (clone from git://djangoproject.com/django and checkout the 2070-streaming-uploads branch).

Public testing is needed, but once its been kicked around some this'll get merged to trunk.

comment:309 by clint, 16 years ago

Cc: clintecker@… added

comment:310 by paltman, 16 years ago

Cc: paltman@… removed

comment:311 by Jacob, 16 years ago

Resolution: fixed
Status: newclosed

(In [7814]) Fixed #2070: refactored Django's file upload capabilities.

A description of the new features can be found in the new upload handling documentation; the executive summary is that Django will now happily handle uploads of large files without issues.

This changes the representation of uploaded files from dictionaries to bona fide objects; see BackwardsIncompatibleChanges for details.

comment:312 by Jacob, 12 years ago

milestone: 1.0 beta

Milestone 1.0 beta deleted

comment:313 by anonymous, 11 years ago

Easy pickings: unset
UI/UX: unset
Note: See TracTickets for help on using tickets.
Back to Top