Opened 9 years ago

Closed 7 years ago

Last modified 3 years ago

#2070 closed enhancement (fixed)

Large streaming uploads

Reported by: oyvind.saltvik@… Owned by: jacob
Component: HTTP handling Version: master
Severity: normal Keywords: streaming, upload, large, sprintsept14, feature
Cc: dsalvetti@…, matthias@…, oliver@…, Maniac@…, nesh@…, aenor.realm@…, gary.wilson@…, serialx.net@…, wonlay@…, lurker86@…, shaun@…, antonio.mele@…, herbert.poul@…, gabor@…, koebbe@…, axiak@…, jm.bugtracking@…, moritz.angermann@…, django@…, brosner@…, root.lastnode@…, akaihola+djtick@…, sam@…, trey@…, klaus.blindert@…, ville@…, reza@…, registrations@…, calvin@…, works@…, andrewbadr.etc@…, uptimebox@…, drfarina@…, beau@…, giuliani.v@…, prigun@…, zerok@…, johannes.beigel@…, david@…, django@…, mocksoul@…, schlaber@…, danpoe@…, django@…, django-2070@…, clintecker@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Based on code from ticket 1448.

Test for large file upload

from django.db import models

# Create your models here.
class FileList(models.Model):
    name = models.CharField(maxlength=255)
    email = models.EmailField()

    class Admin:
        pass

class AFile(models.Model):
    descr = models.CharField(maxlength=255,core=True)
    file = models.FileField(upload_to='files')
    inlist = models.ForeignKey(FileList,edit_inline=models.STACKED)

And two 500 megabyte files for upload.

Works with the patch with +20 mb above average memory and minimal cpu load usage by the httpd thread, file upload successful.

Without the patch httpd rages to +140 mb above average memory usage and over 60% cpu usage and fails miserably in the end.

Attachments (64)

3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id.diff (16.7 KB) - added by [530] 9 years ago.
Now using X_PROGRESS_ID instead of X-Progress-Id, accepts any lower/uppercase/ - / _ /prefix variant
3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id_windowsfix.diff (16.9 KB) - added by [Cha0S] 9 years ago.
windows temporary file locking fix
modpyton-ok-needs-fcgi-testing.3.diff (28.6 KB) - added by Øyvind Saltvik <oyvind.saltvik@…> 9 years ago.
did not work without middleware, fixed
4459-streaming-file-upload.diff (25.9 KB) - added by Joakim Sernbrant <serbaut@…> 9 years ago.
Simplified streaming uploads
4459-streaming-file-upload.2.diff (26.7 KB) - added by Joakim Sernbrant <serbaut@…> 9 years ago.
Added FILE_UPLOAD_MIN_SIZE (default 100kb) to define minimum request size for streaming to disk. Propage exeptions. I'm not too happy with the names of the settings anymore :/
5065-streaming_file_upload_with_shutils.diff (26.6 KB) - added by axiak@… 8 years ago.
I've updated the patch to include the shutils command and to work with [5065]. Please check to see if it works.
5065-streaming_file_upload_with_shutils_2.diff (27.5 KB) - added by axiak@… 8 years ago.
Works in [5065], renamed settings variable, uses global settings, defaults STREAMING_MIN_POST_SIZE to .5MB (please test though!)
5070-streaming-file-upload.diff (26.7 KB) - added by Øyvind Saltvik <oyvind@…> 8 years ago.
Updated to trunk, without changes
5078_streaming_file_upload_with_shutils_and_fallbacks.diff (28.1 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Usese shutils, but falls back.
5078_streaming_file_upload_with_shutils_and_fallbacks_2.diff (28.2 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Usese shutils, but falls back, this time it deletes the tmp file even when it falls back.
5078-streaming_file_upload_with_shutils_and_chunked_fallbacks.diff (28.4 KB) - added by Michael Axiak <axiak@…> 8 years ago.
This file uses python fallbacks, but using chunks to avoid loading the entire file into memory.
5079-streaming_file_upload_with_safe_file_move.diff (29.1 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Cleaned it up a bit. Moved file_move_safe into django.utils in case it should be used in future endeavors.
5089-streaming_file_upload_with_safe_file_move.diff (28.0 KB) - added by Øyvind Saltvik <oyvind@…> 8 years ago.
Added missing else
5089-streaming_file_upload_with_safe_file_move.2.diff (29.1 KB) - added by Øyvind Saltvik <oyvind@…> 8 years ago.
Forgot to add django/utils/file.py
5099_patch_for_streaming_uploads.diff (30.8 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Uses multiple mechanisms for determining the progress id.
5100_upload_request_meta.diff (33.8 KB) - added by Michael Axiak <axiak@…> 8 years ago.
This isn't important enough to get its own property in request...request.META now contains 'UPLOAD_PROGRESS_ID'
5100_forgot_to_revert_base.diff (32.5 KB) - added by Michael Axiak <axiak@…> 8 years ago.
I included a bit much in that last patch.
5100_file_upload_core.diff (34.1 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Meant to be cleaner, especially in light of #4165
5100_file_upload_core_with_middleware_hooks.diff (37.7 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Added middleware hooks
5100_file_upload_core_with_middleware_hooks_2.diff (37.7 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Added middleware hooks...this is better.
5100_file_upload_core_with_middleware_hooks_3.diff (41.8 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Apparently I don't know how not to lose files.
5100_file_upload_core_with_middleware_hooks_4.diff (38.9 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Apparently I don't know how not to lose files.
5113_file_upload_core_with_middleware_hooks.diff (39.1 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Fixes WSGI bug.
5116_streaming_upload_fixed_middleware_append.diff (35.4 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Fixed bug where .append(0, func) was called.
5116_streaming_upload_fixed_middleware_append_2.diff (42.4 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Fixed bug where .append(0, func) was called with the files this time.
5126_file_upload_wsgi_tested.diff (41.3 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Tested with WSGI and made a few changes.
5126_file_uploads_latest_patch.diff (39.4 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Added 'state':'starting' to be more like mod_uploadprogress.
5127_file_uploads_no_streaming_fixed.diff (39.2 KB) - added by SmileyChris 8 years ago.
There was an error uploading large files with streaming turned off
5313_updated_file_progress.diff (36.7 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Removed some unneeded things. No file progress tracking by default.
5343_1_streaming_file_upload.diff (38.4 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Newer, cleaner version of file upload script
5343_streaming_file_upload_no_assert.diff (39.3 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Newer, cleaner version of file upload script (doh! no random commented assert)
5343_streaming_file_upload_best.diff (38.4 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Sorry about that -- this one is the better one.
5343_cleaned_streaming_file_upload.diff (37.4 KB) - added by Michael Axiak <axiak@…> 8 years ago.
It's amazing what Trac helps you see.
5343_cleaned_streaming_file_upload_tweaked.diff (37.3 KB) - added by Michael Axiak <axiak@…> 8 years ago.
Made a subtle fix after testing with #4165.
5343_cleaned_streaming_file_upload_tweaked2.diff (37.3 KB) - added by klaus.blindert@… 8 years ago.
Fixed microscopic bug in an error path
5722.diff (26.4 KB) - added by simonbun <simonbun@…> 8 years ago.
Updated patch against r5722
5722.2.diff (22.2 KB) - added by simonbun <simonbun@…> 8 years ago.
Sorry, that last one patch also included #4165. This is the correct one.
5724_streaming_file_upload.diff (37.8 KB) - added by Brian Rosner <brosner@…> 8 years ago.
updated to r5724 (previous patch was missing files too)
5820_streaming_file_upload.diff (37.8 KB) - added by Brian Rosner <brosner@…> 8 years ago.
updated to r5820
6469_streaming_file_upload.diff (38.5 KB) - added by Faheem Mitha <faheem@…> 8 years ago.
6525_all_tests_pass.diff (39.6 KB) - added by Øyvind Saltvik <oyvind@…> 8 years ago.
Fixed unicode in POST issues, added django.http.utils, moved str_to_unicode there
6603_all_tests_pass_uploadedfile_wrapper_fixed.diff (42.1 KB) - added by Øyvind Saltvik <oyvind@…> 8 years ago.
Fixed a problem with UploadedFile wrapper and making sure content is not read in Fie/ImageField
6652_isValidImage_using_tmpfilename.diff (42.7 KB) - added by Øyvind Saltvik <oyvind@…> 8 years ago.
made the isValidImage validator try the tempfile before reading content
6654_fixed_tests_and_file_clean.diff (45.0 KB) - added by Øyvind Saltvik <oyvind@…> 8 years ago.
diff did not apply cleanly, fixed
streaming.6710.patch (45.5 KB) - added by Faheem Mitha <faheem@…> 8 years ago.
Updated streaming patch to trunk rev 6710.
streaming.7092.patch (44.8 KB) - added by egon 8 years ago.
Made it apply cleanly to rev. 7092
streaming.7092.patch.partial_tests_fix (48.6 KB) - added by faheem 8 years ago.
Slightly modified version of streaming.7092.patch with more tests passing.
streaming.7106.patch (48.0 KB) - added by egon 8 years ago.
Passes al tests.
ticket2070_rev7277.diff (48.1 KB) - added by axiak 7 years ago.
Same patch applied to @7277
2070_revision7339_uploadhandling.diff (38.8 KB) - added by axiak 7 years ago.
NEW Upload handling for revision 7339
2070_revision7339_formhandling.diff (11.4 KB) - added by axiak 7 years ago.
NEW Form handling for uploaded files for revision 7339
2070_revision7351_latest.diff (67.7 KB) - added by axiak 7 years ago.
Latest patch for 2070...includes everything (see comment)
2070_latest7354.diff (68.8 KB) - added by axiak 7 years ago.
The latest #2070 patch. Small changes...see comment.
2070_revision7359.diff (76.9 KB) - added by axiak 7 years ago.
New diff...some style changes and new documentation.
2070_revision7363.diff (93.2 KB) - added by axiak 7 years ago.
Altered documentation to be more approachable.
2070_revision7363.2.diff (94.0 KB) - added by axiak 7 years ago.
Slightly newer...handles exhaustion a little bit differently.
2070_revision7377.diff (95.3 KB) - added by axiak 7 years ago.
New 2070 patch...includes fixes to the parser in addition to the newforms fix caught by EricB as well as better docs.
2070_revision7396.diff (98.6 KB) - added by axiak 7 years ago.
Better patch...not sure how the newforms code in the previous patch was the way it was.
2070_revision7388.diff (98.6 KB) - added by axiak 7 years ago.
new patch that fixes error with instantiating the upload handlers, thanks Eric!
2070_revision7484.diff (99.2 KB) - added by axiak 7 years ago.
Updated diff per Jacob's requests and useful feedback.
2070_revision7484.2.diff (99.3 KB) - added by axiak 7 years ago.
Fixed file_size bug with memory handling.
2070_revision7599.diff (45.6 KB) - added by leahculver 7 years ago.
Updated patch to r7599
2070-r7695.patch (100.6 KB) - added by jacob 7 years ago.
Streaming uploads patch updated to [7695] and reviewed.
2070-r7728.patch (114.5 KB) - added by jacob 7 years ago.
"release candidate" patch

Change History (377)

comment:1 Changed 9 years ago by oyvind.saltvik@…

  • Severity changed from normal to major

Should there be a setting to enable/disable this?

Needs testing on python 2.3.

comment:2 Changed 9 years ago by ubernostrum

  • priority changed from high to normal
  • Resolution set to duplicate
  • Severity changed from major to normal
  • Status changed from new to closed

This is a duplicate of #1484.

comment:3 Changed 9 years ago by ubernostrum

  • Resolution duplicate deleted
  • Status changed from closed to reopened

Or is it? Leaving it open until someone who knows more about it can sort this out.

comment:4 Changed 9 years ago by [530]

Detailed description of patch.

Based of the #1448 patch that i experienced trouble with using edit inline and large files.

Changes from trunk:

  • Added STREAMING_UPLOADS=False to global_settings
  • Added parse_streaming_file_upload it http/__init__.py
  • Made changes to modpython and wsgi in handlers to use parse_streaming_file_upload
  • Subclassed cgi.FieldStorage
    • overriding of make_file to use NamedTemporaryFile
    • overriding of bouandry parsers to make readline read in 64k chunks to avoid reading extremely long lines into memory
  • Subclassed rfc822.Message
    • overriding of bouandry parsers to make readline read in 64k chunks to avoid reading extremely long lines into memory
  • Made FileField validate upload using tempfile size
  • Made save_file in db/models/fields/__init__.py pass temporary file name to _save_FIELD_file instead of file contents
  • Made _save_FIELD_file copy temporary file instead of write file from memory

comment:5 Changed 9 years ago by [530]

Error in previous post.

Based of the #1448 patch that i experienced trouble with using edit inline and large files.

Should be.

Based of the #1484 patch that i experienced trouble with using edit inline and large files.

comment:6 Changed 9 years ago by Matias Hermanrud Fjeld <mhf@…>

This patch uses the rfc822 module. The Python docs says the following about rfc822:

  Deprecated since release 2.3. The email package should be used in preference to the rfc822 module.
  This module is present only to maintain backward compatibility.

comment:7 Changed 9 years ago by Home

  • Type defect deleted

comment:8 Changed 9 years ago by [530]

  • Type set to defect

The problem with the email package is that it reads the entire file into memory. Django supports python 2.3 anyway.

comment:9 Changed 9 years ago by [530]

to use monitor from a view

def progress_view(request):
    
    import pickle

    sessionid = request.COOKIES.get('sessionid', None)
    progress = pickle.load(open('/tmp/'+sessionid, 'r'))

Accessible variables

progress.fullength - full request byte lenght
progress.at_status - current bytes read
progress.at_file - current file read
progress.file_status - current file bytes read


comment:10 Changed 9 years ago by bnewbold@…

Had a problem on FreeBSD with my /tmp being on a seperate filesystem: os.rename threw an "Invalid cross-device link" error.
The fix was to edit django/db/models/base.py, "import shutils" at the top and change the one instance of "os.rename" to "shutils.move". Works fine now... shutils came default with python2.4 I don't know about other releases, or if it works on other OSs

comment:11 follow-up: Changed 9 years ago by adrian

In reference to bnewbold's comment above, it's the shutil module, and it looks like it's been around since at least Python 2.3. (Django supports Python 2.3 and up.)

comment:12 Changed 9 years ago by [Cha0S]

to make it work on windows make the following changes in UploadStateKeeper class:

class UploadStateKeeper:

    def __init__(self, identifier, fullength=None):

        self.identifier = identifier

        if fullength:
            self.state = UploadState(self.identifier, fullength)

        else:
            import pickle
            self.state = pickle.load(self.get_status_path(), 'r')

    def get_status_path(self):
        from tempfile import gettempdir
        return os.path.join(gettempdir(), self.identifier)

    def set_status(self, filename, addlen):
        self.state.set_status(filename, addlen)
        import pickle
        pickle.dump(self.state,open(self.get_status_path(), 'w'))

    def get_state(self):
        return self.state

comment:13 Changed 9 years ago by jacob

Can anyone tell me the differences between the latest patch on this ticket and the latest one on #1484? Are these two tickets/patches duplicates or complements?

comment:14 Changed 9 years ago by bahamut@…

  • Type changed from defect to enhancement

This ticket is the continuation of #1484.

comment:15 Changed 9 years ago by jacob

Sorry if I'm being dense, but does that mean that #1484 should be closed in favor of this one? I'm trying to figure out which patch(es) need to be applied here.

comment:16 Changed 9 years ago by bahamut@…

  • Keywords timezone time zone UTC added

In my opinion, yes #1484 should be closed. The only patch you have to apply is the latest one in this ticket.

It may help to svn up -r 3358 before you apply the patch, or you can try at the current HEAD and see what goes.

comment:17 Changed 9 years ago by jacob

Thanks.

comment:18 Changed 9 years ago by jacob

  • Owner changed from adrian to jacob
  • Status changed from reopened to new

Right, I've finally had a chance to play with this patch. It's a big one, so I want to make sure I understand what's going on before I check it in. So apologies if I'm being dense, but I gotta understand this stuff:

  • I'm confused about why you needed to override Message.readline} just to change the argument to self.fp.readline() (django/http/init.py, line 105). Is the Python implementation broken in some way? What happens if you take that bit out?


  • Pretty much the same question for the FieldStorage subclass. This is a *lot* of duplicated code from the builtin cgi module, and it gives me pause. Can you explain why you need to do this (and perhaps think about if there's another way?)


  • The upload status info stored in /tmp is pretty awesome. However, it doesn't seem to work all the time for me; while uploading a big file if I keep cat'ing that /tmp file it's often just empty. Seems something keeps emptying it out? Also, using the session ID seems like it would cause issues if I tried to upload multiple files at the same time (a very real possibility if I'm uploading really big-ass things).


  • Related to that, setting os.envrion['SESSIONID'] (done in both the WSGI and the mod_python handler) seems very leaky. We've had bugs before with this type of env monkeybusiness... Could you use a threadlocal instead?


I should end, however, by saying that the behavior of this patch rocks. Being able to upload a 2 GB file without locking my server is super-nice, and the status stuff absolutely kicks ass (I think I already said that). I'm pretty sure I can clean up all the stuff I just bitched about, but I've got a pretty long todo.txt and if you can take care of these problems and/or answer my questions, I'll check this in PDQ.

comment:19 Changed 9 years ago by jacob

  • Status changed from new to assigned

comment:20 Changed 9 years ago by [530]

  • If that bit is taken out memory usage soars because most browsers do not use newlines when sending files so it actually reads to EOF. Is it possible to override self.fp.readline() to call self.fp.readline(64000)? Then most of this code would be removed.
  • The upload status is a bit tricy, I need to override FieldStorage to store the status. But since FieldStorage is not a extended class of object there is no super(FieldStorage, self).read_lines_to_outerboundary() causing repetition of code.
  • Don't know how multiple uploads can be solved since it would require multiple id's, maybe use sessions instead if possible. Never experienced the emptiying if the temp file. Does this happen with both mod_python and wsgi?
  • About setting os.environ, the mod_python-handler does not set it, but wsgi does, need to fix that one instead of messing with threadlocal , this may be why the file is emptied if it only happens with wsgi.

comment:21 Changed 9 years ago by anonymous

  • Cc Maniac@… added

comment:22 Changed 9 years ago by anonymous

Jacob, after reading your questions I'd like to share here my knowledge as an author of original patch in #1484.

  • About readline(N). Actually browsers do send newlines when sending binary files because newlines (byte 0x0A) are present in binary files also. In theory 0x0A in a binary body can be rare enough for readline to read very large chunks in memory. However in practice 0x0A are present often enough (I'm talking from the experience of two services working mainly with big jpegs, mp3 and zips). This is, btw, why my original patch in #1484 didn't include the boundary in readline. The downside of replacing readlines in FieldStorage's code with readline(N) is that the base class doesn't provide this for overriding so you have to actually rewrite one-to-one pretty large chunks of code.
  • To my liking storing upload info in files is not very nice. I mentioned my idea in django-developers about this that it's better be a hook for the user code (in middleware style) that is called from the handler accepting uploaded data. This is more flexible because a user can store this progress not in /tmp but wherever he likes and do more interesting stuff then just measuring progress. For example closing the upload if the file is too big for a server.
  • One more thing that we've already discussed with Oyvind in an old ticket that his patch alters both HTTP handlers and admin app. The old patch was only for HTTP and my point is that these things should be separate because it's much simpler to review and apply.

Hope my comment was useful!

comment:23 Changed 9 years ago by anonymous

One more nit... Oyvin what was the reason to use rfc822? It's a new thing in your variant of the old patch and I just don't get why it's needed :-)

comment:24 Changed 9 years ago by [530]

  • When uploading two 500mb iso's, it broke down constantly with a memoryerror when not setting a buffer size, everything worked fine when a bufsize was added.
  • About upload storage I do agree about using a hook instead.
  • Rfc822 is still used by FieldStorage, now hidden from view. I just don not think of setting bufsize in make_file instead of passing it to readline (if that works). Have not had time to test yet.
  • Agree that reviewing is easier. But it is nice for easy testing via the admin to see why django is the right tool for the massive job. :)

comment:25 Changed 9 years ago by anonymous

Ignore my patch 3358-streaming-file_with_settings_and_progressmonitor.3.diff, adding bufsize does not affect readline. Also it seems chunked reads are required for storing the state. Don't see any easy way to make this less code yet. I'll add a new patch with just the WSGI fix.

comment:26 Changed 9 years ago by [530]

I was planning to make a middleware for a progressmonitor hook, but not quite sure when middleware is applied and how to pass this hook from the middleware to FieldStorage. Any help to clarify/fix this would be great.

Also if somenoe could test this successfully using WSGI it would be great.

comment:27 follow-up: Changed 9 years ago by Maniac <Maniac@…>

My idea of a hook is not fit into current middleware, I just wanted to retain the style. So to clarify I see it like this.

This can be a separate set of hooks similar to MIDDLEWARE_CLASSES or it can be done in middleware classes themselves. To do the latter we will specifiy that in addition to process_request, _response, _view and _exception methods a middleware can define a new kind of method: process_upload(request, data).

parse_file_upload then would search for these middleware methods just as http requests do in modpython.py and wsgi.py. The list of methods is passed to a FieldStorage decendant that will call them each time it's about to write a chunk of data to the target stream.

This is it... Sorry I can't come with the actual code: pretty busy rolling out my current project.

comment:28 Changed 9 years ago by Todd O'Bryan

I figured the last attachment was supposed to be a diff and tried to upload it with the right extension. Unfortunately, '.py' got added and now I don't know how to get rid of the attachment.

Sorry for the clutter.

comment:29 Changed 9 years ago by [530]

Seems wsgi can't do streaming uploads this way, but ligthttpd 1.5 will chunk the uploads and has a mod_uploadprogress that serializes to JSON. Will try adapt this patch to serialize the same way so it wont matter if you use this streaming uploads patch or mod_uploadprogress.

comment:30 Changed 9 years ago by Maniac@…

Why do you say wsgi can't do this? It gives environwsgi.input? to the handler that is a file-like object with uploading data. Handler then can read it in chunks (it actually does this in parse_file_upload).

comment:31 Changed 9 years ago by [530]

WSGI may, but it seems like lighthttpd does not pass anything to wsgi before the request is done. I may be wrong, but I have no way of debugging properly.

comment:32 Changed 9 years ago by [530]

Bug in last patch, did not manage to override readline to use a buffer size. If this cannot be done, code must be repeated.

comment:33 Changed 9 years ago by [530]

Wsgi users using lighttpd should use mod_uploadprogress instead. There is some kind of delay before the upload state is updated using wsgi, the same does not happen on mod_python. Is this a issue with my code, the fastcgi/scgi spec or lighttpd's implementation? Can someone test this with fastcgi/scgi on apache?

comment:34 Changed 9 years ago by [530]

GET identifer for upload state.

http://foo.com?bar saves json data in /tmp/bar

For javascript, making a /progress/ url and view see http://blog.lighttpd.net/articles/2006/08/01/mod_uploadprogress-is-back

Makes it easy to add a admin js option to a model to enable upload state in admin.

comment:35 Changed 9 years ago by toddobryan

I just provided a patch which steals liberally from Oyvind, but decouples the progress monitor from the rest of the file upload behavior.

In a nutshell, if you don't want to read uploads into memory, you set request.save_file_class to the class of the kind of file-like object you'd like uploads routed to. A sample middleware class is provided which uses a temp file.

I haven't looked at Oyvind's latest code, but my guess is that we could easily move the upload monitor from the guts of Django by providing a file-like object class that keeps track of how much stuff it has received from the POST data stream.

Comments welcome. (But be nice; it's my first patch.)

comment:36 Changed 9 years ago by [530]

To toddobryan, I made some fatal mistakes before 3358-streaming-file_with_settings_and_progressmonitor_fixed.diff + the wsgi fix, see this patch, you may have fixed it but i'm not sure. the latest patch I would say is really important for multiple uploads. There are some more upload progress issues, the json encoder gives two value sets but i can't find out why. Posting my latest diff now.

Btw I like one more developer helping me.

comment:37 Changed 9 years ago by toddobryan

OK, you don't have to be very nice. It turns out that I missed perhaps the most important thing, which is wrapping the stream from the POST in some kind of object which prevents lines from being too long and filling up memory.

Oyvind's PostStream seems like the kind of thing to do, but I'd suggest wrapping the stream before we pass it into the FieldStorage object, not inside. I was misled by the fact that the read_binary() method inside there has a fixed 8k buffer.

comment:38 Changed 9 years ago by [530]

Toddobryan I agree. Making it more modular is great.

comment:39 Changed 9 years ago by toddobryan

If you could concentrate on getting your progress monitor to work from a file-like object instead of FieldStorage, I'll integrate the PostStream idea into my code so that the readline()s in the FieldStorage class don't read too much.

comment:40 Changed 9 years ago by [530]

Thinking of making the upload monitor as a hook in streaming_upload_middleware.

Then make a upload monitor middleware that hooks onto streaming_upload_middleware by loading before streaming_upload_middleware.

Upload monitor will act exacly like mod_uploadprogress so wsgi users just load streaming_upload_middleware.

comment:41 Changed 9 years ago by mtredinnick

By the way, when you guys think you've addressed all the problems and simplified the code as much as you can, etc, just add a comment pointing to the right patch that needs another review (and mention that you think it's "done").

comment:42 Changed 9 years ago by [530]

To test 'svn up -r 3518', apply patch and add:

django.middleware.upload.UploadStateMiddleware and django.middleware.upload.StreamingUploadMiddleware to MIDDLEWARE_CLASSES.

Uploadstate test currently builtin to admin change form. 'tail /tmp/test123' for upload state as json.

comment:43 Changed 9 years ago by anonymous

Regular uploads seem to fail if the middlewares are disabled. Backtrace:

Traceback (most recent call last):
  File "/opt/local/lib/python2.4/site-packages/django/core/servers/basehttp.py", line 272, in run
    self.result = application(self.environ, self.start_response)
  File "/opt/local/lib/python2.4/site-packages/django/core/servers/basehttp.py", line 611, in __call__
    return self.application(environ, start_response)
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 158, in __call__
    response = self.get_response(request.path, request)
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/base.py", line 102, in get_response
    return self.get_technical_error_response(request)
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/base.py", line 134, in get_technical_error_response
    return debug.technical_500_response(request, *sys.exc_info())
  File "/opt/local/lib/python2.4/site-packages/django/views/debug.py", line 131, in technical_500_response
    return HttpResponseServerError(t.render(c), mimetype='text/html')
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 155, in render
    return self.nodelist.render(context)
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 688, in render
    bits.append(self.render_node(node, context))
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 716, in render_node
    raise wrapped
TemplateSyntaxError: Caught an exception while rendering: 'NoneType' object has no attribute 'items'

Original Traceback (most recent call last):
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 706, in render_node
    result = node.render(context)
  File "/opt/local/lib/python2.4/site-packages/django/template/defaulttags.py", line 189, in render
    value = bool_expr.resolve(context, True)
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 548, in resolve
    obj = resolve_variable(self.var, context)
  File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 657, in resolve_variable
    raise VariableDoesNotExist, "Failed lookup for key [%s] in %r" % (bits[0], current) # missing attribute
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 61, in __repr__
    return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 97, in _get_post
    self._load_post_and_files()
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 75, in _load_post_and_files
    self._post, self._files = http.parse_file_upload(self)
  File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 83, in parse_file_upload
    return default_parse_file_upload(req)
  File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 50, in default_parse_file_upload
    raw_message = '\r\n'.join(['%s:%s' % pair for pair in req.header_dict.items()])
AttributeError: 'NoneType' object has no attribute 'items'

comment:44 Changed 9 years ago by Djordjevic Nebojsa <nesh at studioquattro co yu>

  • Cc nesh@… added

comment:45 Changed 9 years ago by dp_wiz

  • Cc aenor.realm@… added

comment:46 Changed 9 years ago by bahamut@…

This is anecdotal evidence at best, but  rapid testing of this patch seems to indicate it does not work with the development server.

I basically applied it, updated to head, added the 2 middlewares, tried one of my upload forms, and it just sat there forever, never reaching the Django view code.

comment:47 Changed 9 years ago by [530]

I would not expect this patch to work with the dev server, since it is a single thread you cannot do multiple requests at the same time.

comment:48 Changed 9 years ago by anonymous

I may be mistaken about my last comment, but the dev server uses the wsgi handler. So it may be the delay problem, or that stdin is not passed before the request is done. So since wsgi using lighttpd works, the dev server probably handle things differently. Don't really know alot about the dev server.

comment:49 Changed 9 years ago by anonymous

  • Cc gary.wilson@… added

comment:50 Changed 9 years ago by hruske

This doesn't work for me at all. With Opera 9 and Firefox 1.5, I get KeyError on X-Progress-Id, which is supposed to be HTTP_X_PROGRESS_ID? (Webserver is Lighttpd, distro Debian unstable)

Also, isn't this line a bit insecure, given the get_temp_file does an "os.path.join":

content = open(get_temp_file(request.header_dict['X-Progress-Id']), 'r').read()

I've replaced my code to use cache framework instead. But mind you need server-wide cache, per process will probably not be enough.

And then, you probably don't want to open a new TCP connection every time you make a xml request, since this will effectively DOS your server.

I was also having problems with JS - the browser did not want to do a POST at all. It started polling server for progress, but it was not posting at all.

comment:51 Changed 9 years ago by [530]

Lighttpd has a delay before it saves data and progress info.

Supposed to be HTTP_X_PROGRESS_ID?

I did it this way to be compatible with mod_uploadprogress that you should use instead.

Is there another way than opening another tcp connection using xmlhttp?

comment:52 Changed 9 years ago by hruske

How much should that delay usually be? Since there is no POST for Django, progress bar does not work. This probably means I need to use mod_uploadprogress to get it working, but the new mod_uploadprogress has not been released yet officially.

Yes, HTTP_X_PROGRESS_ID. When I printed contents of request.header_dict I could see a key named HTTP_X_PROGRESS_ID which corresponds to X-Progress-Id.

Sure, one could open a TCP connection and server periodically sends info and client reads from it. But I see that you aim for mod_uploadprogress compatibility.

comment:53 Changed 9 years ago by hruske

Also, it's X-Progress-ID in lighttpd, with capital ID.

Changed 9 years ago by [530]

Now using X_PROGRESS_ID instead of X-Progress-Id, accepts any lower/uppercase/ - / _ /prefix variant

comment:54 Changed 9 years ago by bahamut@…

  • Keywords streaming upload large added; timezone time zone UTC removed

The patch does not work with Apache 2 + mod_fcgid.

I'm running on Mac OS X 10.4.7 with a pure MacPorts installation (save Django and flup -- both are a svn HEAD).

Apache 2 configuration:

{{{<IfModule mod_fcgid.c>

AddHandler fcgid-script .fcgi


IdleTimeout 600
ProcessLifeTime 3600
MaxProcessCount 8
DefaultMinClassProcessCount 3
DefaultMaxClassProcessCount 3
IPCConnectTimeout 8
IPCCommTimeout 48

</IfModule>

ScriptAlias /myapp "cgi-bin/myapp.fcgi/"}}}

myapp.fcgi:

{{{#!/opt/local/bin/python
# -*- coding: utf-8 -*-
import sys, os

# Add a custom Python path.
sys.path.insert(0, "/opt/local/www")

# Switch to the directory of your project. (Optional.)
os.chdir("/opt/local/www/myapp")

# Set the DJANGO_SETTINGS_MODULE environment variable.
os.environDJANGO_SETTINGS_MODULE? = "myapp.settings"

from django.core.servers.fastcgi import runfastcgi
runfastcgi(["method=threaded", "daemonize=false"])}}}

comment:55 Changed 9 years ago by anonymous

Repost with proper formatting :/

Apache 2 configuration:

<IfModule mod_fcgid.c>
    AddHandler fcgid-script .fcgi
    
    IdleTimeout 600
    ProcessLifeTime 3600
    MaxProcessCount 8
    DefaultMinClassProcessCount 3
    DefaultMaxClassProcessCount 3
    IPCConnectTimeout 8
    IPCCommTimeout 48
</IfModule>

ScriptAlias /myapp "cgi-bin/myapp.fcgi/"

myapp.fcgi:

#!/opt/local/bin/python
# -*- coding: utf-8 -*-
import sys, os

# Add a custom Python path.
sys.path.insert(0, "/opt/local/www")

# Switch to the directory of your project. (Optional.)
os.chdir("/opt/local/www/myapp")

# Set the DJANGO_SETTINGS_MODULE environment variable.
os.environ['DJANGO_SETTINGS_MODULE'] = "myapp.settings"

from django.core.servers.fastcgi import runfastcgi
runfastcgi(["method=threaded", "daemonize=false"])

comment:56 Changed 9 years ago by bahamut@…

  • Cc bahamut@… added

comment:57 Changed 9 years ago by bahamut@…

I get the following error, again with my mod_fcgid setup, when using the admin interface with the progress status monitor js.

Unhandled exception in thread started by <bound method ThreadPool._worker of <flup.server.threadpool.ThreadPool object at 0x715270>>Traceback (most recent call last):  File "/opt/local/www/data/flup/server/threadpool.py", line 103, in _worker    job.run()  File "/opt/local/www/data/flup/server/fcgi_base.py", line 642, in run    self.process_input()  File "/opt/local/www/data/flup/server/fcgi_base.py", line 678, in process_input    self._do_params(rec)  File "/opt/local/www/data/flup/server/fcgi_base.py", line 777, in _do_params    self._start_request(req)  File "/opt/local/www/data/flup/server/fcgi_base.py", line 761, in _start_request    req.run()  File "/opt/local/www/data/flup/server/fcgi_base.py", line 563, in run    self.server.error(self)  File "/opt/local/www/data/flup/server/fcgi_base.py", line 1157, in error    req.stdout.write('Content-Type: text/html\r\n\r\n' +  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/cgitb.py", line 158, in html    dump.append('%s&nbsp;= %s' % (name, pydoc.html.repr(value)))  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 360, in repr    return Repr.repr(self, object)  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/repr.py", line 24, in repr    return self.repr1(x, self.maxlevel)  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 367, in repr1    return self.escape(cram(stripid(repr(x)), self.maxother))  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 65, in __repr__    return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 101, in _get_post    self._load_post_and_files()  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 79, in _load_post_and_files    self._post, self._files = http.parse_file_upload(self)  File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 82, in parse_file_upload    return req.parse_file_upload(req)  File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 83, in parse_streaming_file_upload    upload_state = req.upload_state(req)  File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 126, in __init__    self.state = {'size': int(req.header_dict.get('content-length')),TypeError: int() argument must be a string or a number

comment:58 Changed 9 years ago by bahamut@…

Something ate line breaks, apparently.

Unhandled exception in thread started by <bound method ThreadPool._worker of <flup.server.threadpool.ThreadPool object at 0x715270>>
Traceback (most recent call last):
  File "/opt/local/www/data/flup/server/threadpool.py", line 103, in _worker
    job.run()
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 642, in run
    self.process_input()
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 678, in process_input
    self._do_params(rec)
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 777, in _do_params
    self._start_request(req)
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 761, in _start_request
    req.run()
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 563, in run
    self.server.error(self)
  File "/opt/local/www/data/flup/server/fcgi_base.py", line 1157, in error
    req.stdout.write('Content-Type: text/html\r\n\r\n' +
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/cgitb.py", line 158, in html
    dump.append('%s&nbsp;= %s' % (name, pydoc.html.repr(value)))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 360, in repr
    return Repr.repr(self, object)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/repr.py", line 24, in repr
    return self.repr1(x, self.maxlevel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 367, in repr1
    return self.escape(cram(stripid(repr(x)), self.maxother))
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 65, in __repr__
    return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 101, in _get_post
    self._load_post_and_files()
  File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 79, in _load_post_and_files
    self._post, self._files = http.parse_file_upload(self)
  File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 82, in parse_file_upload
    return req.parse_file_upload(req)
  File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 83, in parse_streaming_file_upload
    upload_state = req.upload_state(req)
  File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 126, in __init__
    self.state = {'size': int(req.header_dict.get('content-length')),
TypeError: int() argument must be a string or a number

comment:59 Changed 9 years ago by bahamut@…

I have just tested the patch with Apache 2 and mod_python. It works correctly, however I had to make 2 small modifications because my Django application isn't running at the root context of my server.

Basically, the admin JavaScript code hardcodes a request to "/progress/". This needs to be changed appropriately for each install, e.g. in my case "/django/progress/".

Similarly, the UploadStateMiddleware checks that the request path is exactly "/progress/", which needs to be changed to match.

comment:60 Changed 9 years ago by anonymous

I svn up'ed, applied 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id.diff, deleted the Django directory under site-packages, reinstalled Django with 'python setup.py install' and restarted apache/modpython but Django appears to be behaving as before. My memory usage still climbs beyond what I'd expect and I don't see any files in /tmp like I'd expect from tempfile.NamedTemporaryFile().

I added these lines to my settings.py file:

STREAMING_UPLOADS=True

UPLOAD_BUFFER_SIZE=2048

Am I missing something?

Thanks!

comment:61 Changed 9 years ago by oyvind@…

It is 2 middleware now.

django.middleware.upload.UploadStateMiddleware and django.middleware.upload.StreamingUploadMiddleware

Order is important

comment:62 Changed 9 years ago by anonymous

Thanks for the info!

I changed middleware_classes to look like this:

MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.doc.XViewMiddleware',
    'django.middleware.upload.UploadStateMiddleware',
    'django.middleware.upload.StreamingUploadMiddleware',
)

I was able to upload a couple large test files and tempfiles were showing up as expected in /tmp, but I'm intermittently receiving this exception when uploading:

Traceback (most recent call last):
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/base.py" in get_response
  74. response = callback(request, *callback_args, **callback_kwargs)
File "/home/misjxm/dp1/file/app/views.py" in createmessage
  275. if request.POST:
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _get_post
  51. self._load_post_and_files()
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _load_post_and_files
  32. self._post, self._files = http.parse_file_upload(self)
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/http/__init__.py" in parse_file_upload
  81. return req.parse_file_upload(req)
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/middleware/upload.py" in parse_streaming_file_upload
  104. FILES.appendlist(key, FileDict({
File "/usr/lib/python2.4/cgi.py" in __getattr__
  540. raise AttributeError, name

  AttributeError at /file/compose/
  tmp_name

Have you seen this before?

Thanks!

comment:63 Changed 9 years ago by anonymous

And I only seem to get that exception when uploading text files such as M3U playlists, bashrc files, etc.

comment:64 Changed 9 years ago by anonymous

Text files in general only cause the problem intermittently. I have one M3U file that consistantly causes the exception to be thrown. This is what local vars looks like when I upload that file:

name      'tmp_name'
self      FieldStorage('file1', 'klipschtest.m3u', "MP3\\2pac\\Greatest Hits (Disc 1)\\1-05 Hail Mary.mp3\r\nFLAC\\James Blunt\\Back to Bedlam\\03-Wisemen.flac\r\nMP3\\Jimi_Hendrix\\UNKNOWN\\Little wing (extended instrumental).mp3\r\nFLAC\\Led Zeppelin\\Remasters (disc II)\\04-No Quarter.flac\r\nFLAC\\Pink Floyd\\The Wall - Disc 1\\03-Another Brick in the Wall. Part 1.flac\r\nFLAC\\Pink Floyd\\The Wall - Disc 1\\04-The Happiest Days of Our Lives.flac\r\nFLAC\\Natalie Merchant\\Live In Concert\\05-Carnival.flac\r\nMP3\\Metallica\\...And Justice For All\\08-To Live Is To Die.mp3\r\nMP3\\Red Hot Chili Peppers\\Blood Sugar Sex Magik\\10-Blood Sugar Sex Magik.mp3\r\nFLAC\\Sade\\Lovers Rock\\03-King Of Sorrow.flac\r\nFLAC\\Stone, Joss\\The Soul Sessions\\05-Dirty Man.flac\r\nFLAC\\U.G.K\\Ridin' Dirty\\03-Murder.flac\r\nMP3\\Weezer\\Weezer (Green Album)\\04-Weezer _ Island In The Sun.mp3\r\nMP3\\Weezer\\Weezer (Green Album)\\03-Weezer _ Hash Pipe.mp3\r\n")

comment:65 Changed 9 years ago by anonymous

I'm getting an error about 'read only accepts 1 arguement (2 given)", from the following line in cgi.py:

self.fp.read(self.length)

comment:66 Changed 9 years ago by anonymous

This is the error I'm getting:

C:\web\fxedit>python manage.py runserver
Validating models...
0 errors found.

Django version 0.95, using settings 'fxedit.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CTRL-BREAK.
[16/Sep/2006 02:55:31] "GET /file/ HTTP/1.1" 200 1188
[16/Sep/2006 02:55:31] "GET /media/stylesheets/style.css HTTP/1.1" 304 0
[16/Sep/2006 02:55:31] "GET /media/stylesheets/print.css HTTP/1.1" 304 0
Traceback (most recent call last):

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\servers\

basehttp.py", line 272, in run

self.result = application(self.environ, self.start_response)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\servers\

basehttp.py", line 615, in call

return self.application(environ, start_response)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\wsgi.py", line 156, in call

response = self.get_response(request.path, request)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\base.py", line 102, in get_response

return self.get_technical_error_response(request)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\base.py", line 134, in get_technical_error_response

return debug.technical_500_response(request, *sys.exc_info())

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\views\debug.p

y", line 131, in technical_500_response

return HttpResponseServerError(t.render(c), mimetype='text/html')

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 155, in render

return self.nodelist.render(context)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 688, in render

bits.append(self.render_node(node, context))

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 716, in render_node

raise wrapped

TemplateSyntaxError: Caught an exception while rendering: read() takes exactly 1

argument (2 given)

Original Traceback (most recent call last):

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 706, in render_node

result = node.render(context)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\defa

ulttags.py", line 189, in render

value = bool_expr.resolve(context, True)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 548, in resolve

obj = resolve_variable(self.var, context)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in

it.py", line 657, in resolve_variable

raise VariableDoesNotExist, "Failed lookup for key [%s] in %r" % (bits[0], c

urrent) # missing attribute

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\wsgi.py", line 61, in repr

return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\wsgi.py", line 97, in _get_post

self._load_post_and_files()

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers

\wsgi.py", line 75, in _load_post_and_files

self._post, self._files = http.parse_file_upload(self)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\http\init

.py", line 80, in parse_file_upload

return req.parse_file_upload(req)

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\middleware\up

load.py", line 87, in parse_streaming_file_upload

fs = FieldStorage(req.raw_request, environ=req.META, headers=req.header_dict

, upload_state=upload_state )

File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\middleware\up

load.py", line 71, in init

keep_blank_values=keep_blank_values, strict_parsing=strict_parsing)

File "C:\Python24\lib\cgi.py", line 524, in init

self.read_urlencoded()

File "C:\Python24\lib\cgi.py", line 629, in read_urlencoded

qs = self.fp.read(self.length)

TypeError: read() takes exactly 1 argument (2 given)

[16/Sep/2006 02:55:53] "POST /file/stats/ HTTP/1.1" 500 3721

Changed 9 years ago by [Cha0S]

windows temporary file locking fix

comment:67 Changed 9 years ago by SmileyChris

I'm getting the same error as anonymous above (and am also using Windows).

comment:68 Changed 9 years ago by oyvind@…

Idea for fixing the m3u upload problem. May need a copy of this util.py to make it work with wsgi.

But I have almost given up on wsgi for this patch, uless anyone are able to shed som light on why wsgi does not like chunked reading from wsgi.input.

http://svn.apache.org/viewvc/httpd/mod_python/trunk/lib/python/mod_python/util.py?view=markup&pathrev=451861

comment:69 Changed 9 years ago by [Cha0S]

There is a problem using this patch with Nginx (and maybe some other webservers). Nginx doesn't pass content-type and content-length headers and the result of this is that FieldStorage can't parse request correctly and nothing is uploaded. A workaround I've found is modification of cgi.FieldStorage to use HTTP_CONTENT_TYPE and HTTP_CONTENT_LENGTH. But I don't think this is elegant solution of this problem. Does anyone have other suggestions on how to solve it ?

comment:70 Changed 9 years ago by Sung-Jin Hong <serialx.net@…>

Worked for me(Apache mod_python). Working like charm. :) But heres some instructions..

  1. Do the patch (cd django; patch -Np0 < file.diff)
  2. Add two middlewares, modify the model like below:
    class FileList(models.Model):
        name = models.CharField(maxlength=255)
        email = models.EmailField()
    
        class Admin:
            js = ['js/UploadProgress.js']
    
    class AFile(models.Model):
        descr = models.CharField(maxlength=255,core=True)
        file = models.FileField(upload_to='files')
        inlist = models.ForeignKey(FileList,edit_inline=models.STACKED)
    
  1. You have progressbar in admin. :)

Submitting patch to the latest trunk.

comment:71 Changed 9 years ago by Sung-Jin Hong <serialx.net@…>

  • Cc serialx.net@… added

comment:72 Changed 9 years ago by [Cha0S]

There are some issues with this patch. I have it working on my local Linux and Windows boxes. But i can't get to work it completely correct on my server with the same setup as my local linux box (python 2.5, latest Django).
The issue is that when uploading a file WSGIRequest . _load_post_and_files is called only after all data is uploaded. So no streaming occurs and no progress is displayed and I didn't find a solution for this yet..

comment:73 Changed 9 years ago by Øyvind Saltvik <oyvind.saltvik@…>

[Cha0S] using lighttpd or apache, same lighttpd/apache version on both servers?
I found out lighttpd pre 1.4.0 supported streaming, so that has to be fixed before this patch will work on lighttpd.

comment:74 Changed 9 years ago by anonymous

This worked great on my system [Apache/2.0.58 (Win32) mod_python/3.2.8 Python/2.4.3]

Following up on Sung-Jin Hong's good instructions with a few tips for newbies like me:

  1. When you apply the patch, make sure you get django/middleware/upload.py. I found that when I did an SVN update to Rev 4065 and applied 4065-streaming_uploads_and_uploadprogress_middleware_x_progress_id_windowsfix.diff I did not get the file. I had to apply 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id_windowsfix.diff to get it.
  1. Make sure js/UploadProgress.js in accessible. The Admin page's path to it will be in the media directory, e.g. http://yoursite.com/media/js/UploadProgress.js. I had to copy UploadProgress.js from django/contrib/admin/media/js to my local media/js directory.
  1. Make sure the Apache httpd.conf file section for your site allows paths starting with "/progress", i.e. has a line like '<Location ~ "/(admin|shop|progress)">' in it, because the JS code is doing a "req.open("GET", "/progress/", 1)" repeatedly to track progress.

I find that the Upload progress box displays fine in Firefox and Opera, but it is at the very bottom left of the absolute page in IE6 (i.e. you need to scroll to the very bottom to see it). I'm still working on this problem.

Thank you to all who contributed this!

comment:75 Changed 9 years ago by [Cha0S]

[Cha0S] using lighttpd or apache, same lighttpd/apache version on both servers? I found out lighttpd pre 1.4.0 supported streaming, so that has to be fixed before this patch will work on lighttpd.

I've tested with lighttpd and nginx web-servers. same versions on both servers. with apache it works fine on windows.

comment:76 Changed 9 years ago by jacobm3 A T gmail.com

  • Component changed from Core framework to Contrib apps
  • Type changed from enhancement to defect

Has anyone come up with a fix for the AttributeError exceptions when uploading text files? I'm still getting exceptions like this:

AttributeError at /file/compose/
tmp_name
Request Method: POST
Request URL: https://myserver/file/compose/
Exception Type: AttributeError
Exception Value: tmp_name
Exception Location: /usr/lib/python2.4/cgi.py in getattr, line 540

Traceback (most recent call last):
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/base.py" in get_response

  1. response = callback(request, *callback_args, callback_kwargs)

File "/files/lib/file/app/views.py" in createmessage

  1. if request.POST:

File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _get_post

  1. self._load_post_and_files()

File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _load_post_and_files

  1. self._post, self._files = http.parse_file_upload(self)

File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/http/init.py" in parse_file_upload

  1. return req.parse_file_upload(req)

File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/middleware/upload.py" in parse_streaming_file_upload

  1. FILES.appendlist(key, FileDict({

File "/usr/lib/python2.4/cgi.py" in getattr

  1. raise AttributeError, name

AttributeError at /file/compose/
tmp_name

comment:77 Changed 9 years ago by anonymous

  • Type changed from defect to enhancement

oops, didn't mean to change the type on this ticket... changing back to enhancement.

comment:78 Changed 9 years ago by anonymous

  • Component changed from Contrib apps to Core framework

comment:79 Changed 9 years ago by wonlay@…

  • Cc wonlay@… added

Hi, jacob
The 'AttributeError' occurred, because FieldStorge does not make_temp_file for small files, and the tmp_name is not defined.
The progress middleware has problem too. You will notice that your upload speed will down to a significant lower rate, if you use this middleware. The performance bottleneck is that, every single 'read' from request reads only a few bytes, even if you set a large buffer_size, and the serialization and save to disc of 'UploadState' is significant.

The attachments is my working copy of code.

comment:80 Changed 9 years ago by anonymous

  • Cc lurker86@… added

comment:81 Changed 9 years ago by Simon G. <dev@…>

  • Needs documentation set
  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted

Can someone who has used this give us a progress report on this - which of the 3000 patches above are needed? can these be merged into one patch that works?

comment:82 Changed 9 years ago by anonymous

The one that i can vouch for is 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id.diff

But it has problems with m3u uploads's for some reason i can't figure out.

Was thinking about the possibility of using email.FeedParser instead of rfc822 but that is not available in python 2.3

This patch does not work with lighttpd, since lighttpd can't stream requests.

comment:83 Changed 9 years ago by Ivan Sagalaev <Maniac@…>

This patch does not work with lighttpd, since lighttpd can't stream requests.

This is not true. When Django works with Lighttpd over FastCGI lighttpd immediately offloads received data to Django process and it can stream it wherever it wants to. In fact I use one of my services that uploads zipped mp3 albums under lighttpd and streaming works (with my patch in #1484).

comment:84 Changed 9 years ago by Øyvind Saltvik <oyvind.saltvik@…>

I think you are correct. I guess the problem with this patch and fcgi has to do with not doing blocking reads with a buffer size. But a buffer size is needed if the browser sends a big file without eol's, and for the upload progress.

comment:85 Changed 9 years ago by Ivan Sagalaev <Maniac@…>

I'm not sure I understand what you're saying... I was replying solely to a claim that lighttpd can't do streaming. It certainly does streaming to a FCGI app. Whatever the app does then (streaming by blocks, by eols or no streaming at all) is a whole another question.

comment:86 Changed 9 years ago by shaunc <shaun@…>

  • Cc shaun@… added

comment:87 Changed 9 years ago by Øyvind Saltvik <oyvind.saltvik@…>

Making a attempt at using email.FeedParser

http://fivethreeo.dynalias.org/repos/streaming
login: public, password: public

Trac http://fivethreeo.dynalias.org/trac/browser/streaming

comment:88 Changed 9 years ago by Øyvind Saltvik <oyvind@…>

Need to override FeedParser._parsegen to write the payload directly to a file instead of a in-memory string.

Probably involves copying FeedParser._parsegen and do some modificatioms.

comment:89 Changed 9 years ago by Øyvind Saltvik <oyvind.saltvik@…>

I am pleased to announce, its alive!!

Just some minor work left ( validators, refactoring, bugfixing ) volunteers?

FeedParser should work with python2.3 if it included with django.

comment:90 Changed 9 years ago by anonymous

  • Cc bahamut@… removed

comment:91 Changed 9 years ago by Øyvind Saltvik <oyvind.saltvik@…>

To get the most recent diff, now with validator

svn co http://fivethreeo.dynalias.org/repos/streaming

then

cd streaming

svn diff -r371

Changed 9 years ago by Øyvind Saltvik <oyvind.saltvik@…>

did not work without middleware, fixed

comment:92 Changed 9 years ago by Øyvind Saltvik <oyvind.saltvik@…>

Anyone able to help out on fastcgi testing?

comment:93 Changed 9 years ago by anonymous

  • Cc antonio.mele@… added

will this be added to django 1.0?

comment:94 Changed 9 years ago by Øyvind Saltvik <oyvind.saltvik@…>

Tested on lighttpd/fastcgi without errors or memory hogging.

Should feedparser be included with django so python2.3 users can use this feature too?

The way the feedparser works there is no way to not duplicate the entire _parsegen function as I see it.

comment:95 Changed 9 years ago by anonymous

  • Cc herbert.poul@… added

comment:96 Changed 9 years ago by Joakim Sernbrant <serbaut@…>

Streaming file uploads, take #3

I would really like to see something like this committed to
trunk. However I think the previous patches has been too complicated
and tried to do too much. Lack of documentation and tests has probably
been a factor as well.

I have rewritten the file upload to make it simpler and added some
documentation and tests.

  • Simplified parsing by using a specialized RFC 2388 parser. The

previous patches used different RFC 2822 parsers which were not suited
for the task. The new parser is very efficient both cpu and memory
wise. It uses mx.TextTools for boudary searches if available.

  • Removed AJAX code and progress meter. While I think it is a good

thing I think it hinders the commitment of this patch. We should open
a new ticket for a progress meter in the Admin interface. The
X-Progress-ID header is still supported and a pickled progress file is
written if supplied.

  • Added error handling.
  • Added Documentation.
  • Added tests.

Tested with mod_python and lighttpd.

You need to specify the FILE_UPLOAD_DIR setting to enable streaming (read forms doc).

Patch attached (4459-streaming-file-upload.diff)

Changed 9 years ago by Joakim Sernbrant <serbaut@…>

Simplified streaming uploads

comment:97 Changed 9 years ago by Øyvind Saltvik <oyvind.saltvik@…>

Nice work. Agree with making ajax progressmeter a own ticket.

comment:98 follow-up: Changed 9 years ago by Ivan Sagalaev <Maniac@…>

A couple of comments from quick reading of 4459-streaming-file-upload.diff

  • It streams uploads to files only if FILE_UPLOAD_DIR is set. But it looks that it stores files in temp directory anyway. What's the point of FILE_UPLOAD_DIR then?
  • Instead of FILE_UPLOAD_DIR it would be nice to have a setting like MAX_FILE_UPLOAD_BUFFER with the maximum number of bytes stored in memory during an upload. And only if the file is bigger it is streamed to disc. This will have a nice effect of not touching disc when dealing with small files which is faster
  • It looks like any Exception during the parsing is swallowed and returned as a dict of errors that is then stored in a POST data. This actually breaks the whole exceptions paradigm :-).
  • All the save_FIELD_file now accept raw_field. This is backward-incompatible but worse it limits this method to work only with uploaded files while now save_FIELD_file can accept any content not matter how it was created in a program. I understand that just renaming streamed upload is a fast feature but I believe it deserves a new method. Like 'save_FIELD_from_file' or something...

comment:99 in reply to: ↑ 98 ; follow-up: Changed 9 years ago by Joakim Sernbrant <serbaut@…>

Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:

A couple of comments from quick reading of 4459-streaming-file-upload.diff

  • It streams uploads to files only if FILE_UPLOAD_DIR is set. But it looks that it stores files in temp directory anyway. What's the point of FILE_UPLOAD_DIR then?

A StringIO() buffer is used in that case so no files are used.

  • Instead of FILE_UPLOAD_DIR it would be nice to have a setting like MAX_FILE_UPLOAD_BUFFER with the maximum number of bytes stored in memory during an upload. And only if the file is bigger it is streamed to disc. This will have a nice effect of not touching disc when dealing with small files which is faster

The problem is that you do not know in advance how large a file is. However you know how large the POST is in total so you could enable file streaming based on that. I did not want to create too many settings variables but I agree that it might be a good idea. Another option would be to have a setting where you name the form fields what you want to have streamed.

  • It looks like any Exception during the parsing is swallowed and returned as a dict of errors that is then stored in a POST data. This actually breaks the whole exceptions paradigm :-).

Yes, I did that to avoid too many changes in wsgi.py/modpython.py since you have to take care not to try to read the input stream again in case of an exception. Meaning that I always wanted to return something. Bad choice maybe but I tried to make as few changes as possible so that the powers that be could start looking at the patch without too much confusion ;)

  • All the save_FIELD_file now accept raw_field. This is backward-incompatible but worse it limits this method to work only with uploaded files while now save_FIELD_file can accept any content not matter how it was created in a program. I understand that just renaming streamed upload is a fast feature but I believe it deserves a new method. Like 'save_FIELD_from_file' or something...

Aren't those methods invoked automatically for you when you do .save()? I didnt think they were public but I dont understand exacly how they work so I take your word for it...

Changed 9 years ago by Joakim Sernbrant <serbaut@…>

Added FILE_UPLOAD_MIN_SIZE (default 100kb) to define minimum request size for streaming to disk. Propage exeptions. I'm not too happy with the names of the settings anymore :/

comment:100 Changed 9 years ago by adrian

(I've removed many attachments, as per a request by oyvind.saltvik@… on django-developers.)

comment:101 in reply to: ↑ 99 Changed 9 years ago by Ivan Sagalaev <Maniac@…>

The problem is that you do not know in advance how large a file is. However you know how large the POST is in total so you could enable file streaming based on that.

Yeah that what I meant... The whole point is to not make Django eat everything in memory so a settings topping this amount seems most logical to me.

I saw you made this setting. Here are couple of nits:

file_upload_min_size = getattr(settings, 'FILE_UPLOAD_MIN_SIZE', 100000)

  1. There is a convention in Django to put a setting with its default value into django/conf/global_settings.py. This keeps all defaults in one place and ensures that settings are always present so you don't have to fail-safe them with getattr, just use settings.SETTING_NAME
  1. FILE_UPLOAD_MIN_SIZE sounds confusing, like we don't allow users to upload small files :-). I still like an idea of calling it a buffer size somehow (and actually use it as a buffer size when reading chunks from input). What do you (and everyone) think?
  1. 100 KB is more like 100 * 1024. But I'd make it about 512 * 1024 or 1024 * 1024. This is small enough to not even show on memory stats but big enough to handle most profile photos uploaded in practice without touching disk most of the times.

Yes, I did that to avoid too many changes in wsgi.py/modpython.py since you have to take care not to try to read the input stream again in case of an exception.

Well... An exception ensures just this: the program won't run normally in case of errors, return values on the other hand are easy to forget to check. I see that you have removed swallowing the Exception. Why not MultiPartParserError then? As I understand this will happen only when the input is really malformed (like not created by a browser) so it's very little a user application can do about it. And I think it shouldn't. I think we should catch this exception in handlers/base.py and unconditionally return '400 Bad Request' on it.

Aren't those methods invoked automatically for you when you do .save()? I didnt think they were public but I dont understand exacly how they work so I take your word for it...

They are called from save(), yes. But they are also intended as a public API and documented (it's in model's API doc, too lazy to get a link :-) ).

comment:102 follow-up: Changed 9 years ago by Gábor Farkas <gabor@…>

  • Cc gabor@… added

comment:103 Changed 8 years ago by David Cramer <dcramer@…>

  • Cc dcramer@… added

comment:104 Changed 8 years ago by Nebojša Đorđević <nesh@…>

  • Cc nesh@… added; nesh@… removed

comment:105 in reply to: ↑ 102 Changed 8 years ago by michell.jo@…

Replying to Gábor Farkas <gabor@nekomancer.net>:

Sorry to clutter the thread.

Can this patch be used with the newforms framework, or does it only work with oldforms?

comment:106 Changed 8 years ago by Øyvind Saltvik <oyvind.saltvik@…>

Can be used in newforms if you handle the incoming data, form_for_model has no automatic filefields yet.

comment:107 in reply to: ↑ 11 Changed 8 years ago by michell.jo@…

Replying to adrian:

In reference to bnewbold's comment above, it's the shutil module, and it looks like it's been around since at least Python 2.3. (Django supports Python 2.3 and up.)

os.rename('old', 'new') does result in an 'invalid cross-link device' exception when 'old' and 'new' are on different filesystems. It doesn't make sense to do this, as once a large file has been streamed to disk, copying to another filesystem is a pointless additional overhead. However, the code could catch the error, and do a copy and then delete operation (and generate a warning that the system can be configured more efficiently).

Or it could use shutil as suggested by a previous poster.

comment:108 Changed 8 years ago by michell.jo@…

Is it possible that this patch could be updated so that it works with the current source version? At the moment if one checks out revision 4459, patches it and then executes svn update, clashes arise at several points in the code.

comment:109 Changed 8 years ago by Brian Koebbe <koebbe@…>

  • Cc koebbe@… added

Changed 8 years ago by axiak@…

I've updated the patch to include the shutils command and to work with [5065]. Please check to see if it works.

Changed 8 years ago by axiak@…

Works in [5065], renamed settings variable, uses global settings, defaults STREAMING_MIN_POST_SIZE to .5MB (please test though!)

comment:110 Changed 8 years ago by Michael Axiak <axiak@…>

  • Cc axiak@… added

Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Updated to trunk, without changes

comment:111 follow-up: Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Axiak

But does this not limit streaming to gnu os'es by using shutils. Should it not try shutils first, dont import shutils at the top of the .py Then try rename, and finally just write from filecontent? if that fails.

Changed 8 years ago by Michael Axiak <axiak@…>

Usese shutils, but falls back.

Changed 8 years ago by Michael Axiak <axiak@…>

Usese shutils, but falls back, this time it deletes the tmp file even when it falls back.

comment:112 in reply to: ↑ 111 Changed 8 years ago by Michael Axiak <axiak@…>

Replying to Øyvind Saltvik <oyvind@saltvik.no>:

Axiak

But does this not limit streaming to gnu os'es by using shutils. Should it not try shutils first, dont import shutils at the top of the .py Then try rename, and finally just write from filecontent? if that fails.

I did not know it only worked in GNU os's. Also, AFAIK, there's no OS-independent way to copy files in python, and if file streaming was enabled, file_fieldcontent? will not have the desired content. Thus, the only sure-fire way to make it work is to do a file.write file.read combination. Luckily it probably won't have to do this, since it will try both shutils and os.rename before it gets to that point.

comment:113 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

One more thing, the last fallback should read and write incrementally, to avoid reading the whole file before it is written to another.

Changed 8 years ago by Michael Axiak <axiak@…>

This file uses python fallbacks, but using chunks to avoid loading the entire file into memory.

Changed 8 years ago by Michael Axiak <axiak@…>

Cleaned it up a bit. Moved file_move_safe into django.utils in case it should be used in future endeavors.

comment:114 Changed 8 years ago by Michael Axiak <axiak@…>

  • Needs documentation unset
  • Summary changed from [patch] Large streaming uploads to Large streaming uploads
  • Version set to SVN

This last patch seems clean. If people wanted to start testing, now would be a good time :). It works in [5079] and should work on all platforms while considering all the ideas posted here.
It also has documentation. I will not remove the `Patch needs improvement' flag to let others review this patch with a finer comb than I have.

Please post here if you have additional concerns, comments, or enhancements.

comment:115 Changed 8 years ago by Joakim Sernbrant <serbaut@…>

The raise at http/init__.py:114 looks broken, else: missing.

comment:116 Changed 8 years ago by anonymous

How to use x-progress-bar ??

Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Added missing else

Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Forgot to add django/utils/file.py

comment:117 follow-up: Changed 8 years ago by anonymous

Hi,

Is x-progress-bar is a header that come with apache only or it is a start header?

Is it work on fcgi, if not, can I change x-progress-bar to hidden form?

Thank.

comment:118 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

It is just a header that has to be sent along with the request.

X-Progress-ID with a value

then FILE_UPLOAD_DIR + the value of the header X-Progress-ID is a pickled python object.

That can be used by any view and displayed using javascript.

comment:119 Changed 8 years ago by anonymous

Thank for your replying,

I have tested this code by modified old javascript (middleware enable). I never seen X-Progress-ID in header, so, a temp file with that id never created.

Can I have example usage?

Thank again

comment:120 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Actually, this patch does this wrong, the header is only used for getting progress status, the id should be the querystring.

comment:121 Changed 8 years ago by Michael Axiak <axiak@…>

As this site states:
http://search.cpan.org/~ceeshek/Apache2-UploadProgress-0.2/lib/Apache2/UploadProgress.pm#HANDLERS

We should be using multiple mechanisms to get this progress ID.
After I mull over this some more and understand it better, I'll make a new patch.

Changed 8 years ago by Michael Axiak <axiak@…>

Uses multiple mechanisms for determining the progress id.

comment:122 Changed 8 years ago by Michael Axiak <axiak@…>

The patch has now been merged into [5099] and I added a new property to request: progress_id. This may simplify JavaScript Middleware (we should make a new ticket for that) as it only has to do request.progress_id. If it's None, there was no progress ID sent. Otherwise, there was.

comment:123 Changed 8 years ago by Michael Axiak <axiak@…>

This ticket is only for making sure file uploads work. I have created ticket #4165 which s specifically for the progress bar enhancements.

comment:124 in reply to: ↑ 117 Changed 8 years ago by anonymous

Replying to anonymous:

Hi,

Is x-progress-bar is a header that come with apache only or it is a start header?

Is it work on fcgi, if not, can I change x-progress-bar to hidden form?

Thank.

X-Progress-ID is used to tell the server between loads "This is the same post". I'm actually not sure how the headers are supposed to be populated, only that we're supposed to "support them". After reading a perl module for quite some time, I've determined that they use javascript to append a random 32-digit hex at the end of their form action ("?progress_id=abcdef0123456789abcdef0123456789") right before the form submits.

If you do this, it will work.

NB: Using a hidden POST field will NOT work. Why? Because it's the POST upload we're trying to track!

Changed 8 years ago by Michael Axiak <axiak@…>

This isn't important enough to get its own property in request...request.META now contains 'UPLOAD_PROGRESS_ID'

Changed 8 years ago by Michael Axiak <axiak@…>

I included a bit much in that last patch.

Changed 8 years ago by Michael Axiak <axiak@…>

Meant to be cleaner, especially in light of #4165

Changed 8 years ago by Michael Axiak <axiak@…>

Added middleware hooks

Changed 8 years ago by Michael Axiak <axiak@…>

Added middleware hooks...this is better.

comment:125 in reply to: ↑ 27 Changed 8 years ago by anonymous

Replying to Maniac <Maniac@SoftwareManiacs.Org>:

My idea of a hook is not fit into current middleware, I just wanted to retain the style. So to clarify I see it like this.

This can be a separate set of hooks similar to MIDDLEWARE_CLASSES or it can be done in middleware classes themselves. To do the latter we will specifiy that in addition to process_request, _response, _view and _exception methods a middleware can define a new kind of method: process_upload(request, data).

parse_file_upload then would search for these middleware methods just as http requests do in modpython.py and wsgi.py. The list of methods is passed to a FieldStorage decendant that will call them each time it's about to write a chunk of data to the target stream.

This is it... Sorry I can't come with the actual code: pretty busy rolling out my current project.

After reviewing all the comments, this one struck me in particular. I have added a process_upload(UploadException) method to MiddleWare.
I'm sure I might get heat for this recent addition, but I think it's the cleanest way to allow extensibility.

Essentially, you can create a descriptor
for the file_progress attribute of request. It gets treated like a dictionary so it will be passed things like {'received': 1414,'size': 100000} and is expected to return it when its asked for. You are given an exception to raise if something should cancel the file upload, so I could imagine someone writing a descriptor that would compare received and some target size and cancel the upload if it's more than that.

For more details, please look at django/http/file_descriptor.py for the fallback descriptor. If you named that exact class process_upload and stuck it in a middleware, it would be used (and would work). I'll come up with another example soon, and stick it in #4165.

Changed 8 years ago by Michael Axiak <axiak@…>

Apparently I don't know how not to lose files.

Changed 8 years ago by Michael Axiak <axiak@…>

Apparently I don't know how not to lose files.

Changed 8 years ago by Michael Axiak <axiak@…>

Fixes WSGI bug.

Changed 8 years ago by Michael Axiak <axiak@…>

Fixed bug where .append(0, func) was called.

Changed 8 years ago by Michael Axiak <axiak@…>

Fixed bug where .append(0, func) was called with the files this time.

comment:126 Changed 8 years ago by anonymous

Break!

Traceback (most recent call last):

File "C:\Python25\lib\site-packages\django\core\servers\basehttp.py", line 272, in run

self.result = application(self.environ, self.start_response)

File "C:\Python25\lib\site-packages\django\core\servers\basehttp.py", line 614, in call

return self.application(environ, start_response)

File "C:\Python25\lib\site-packages\django\core\handlers\wsgi.py", line 207, in call

request = WSGIRequest(environ)

File "C:\Python25\lib\site-packages\django\core\handlers\wsgi.py", line 78, in init

self.METAUPLOAD_PROGRESS_ID? = self._get_file_progress_id()

File "C:\Python25\lib\site-packages\django\core\handlers\wsgi.py", line 187, in _get_file_progress_id

self._req.args)

AttributeError: 'WSGIRequest' object has no attribute '_req'

Changed 8 years ago by Michael Axiak <axiak@…>

Tested with WSGI and made a few changes.

comment:127 follow-up: Changed 8 years ago by Michael Axiak <axiak@…>

Alright, I tested/tweaked the patch on a few different platforms. Here are my notes:

  • Apache2/mod_python: Works well. Works with #4165 and generates pretty progress bar
  • Apache2/mod_fastcgi: Works well. Works with #4165 and generates pretty progress bar
  • Apache2/mod_scgi: Untested.
  • LighTTPD/mod_fastcgi: Does NOT work. However, version 1.5 will have mod_uploadprogress which will provide a JSON interface just like the one in #4165. Thus, using mod_upgradeprogress to generate a progress bar should be a breeze. I don't have lighttpd 1.5 to test, though.
  • manage.py development server: World well. Progress bar does not work AFAIK, the development server is singlethreaded and it will stay that way. The progress bar is an inherently multithreaded process (one to serve progress information, one to receive the uploaded information).

Let me know if you have anymore problems.

Changed 8 years ago by Michael Axiak <axiak@…>

Added 'state':'starting' to be more like mod_uploadprogress.

comment:128 follow-up: Changed 8 years ago by anonymous

Hi,

I have tested under window xp, apache 2.2.4, mod_python 3.3. When I upload a large file via admin interface.

A temp file with named by session generated by javascript client is created. However, a file with random name and '.upload' extension is also created. File stream is append to '.upload' version instead of session id one.

Any idea??

comment:129 in reply to: ↑ 128 Changed 8 years ago by Michael Axiak <axiak@…>

Replying to anonymous:

Hi,

I have tested under window xp, apache 2.2.4, mod_python 3.3. When I upload a large file via admin interface.

A temp file with named by session generated by javascript client is created. However, a file with random name and '.upload' extension is also created. File stream is append to '.upload' version instead of session id one.

Any idea??

Yes, this is how it is supposed to work, which is why there are hooks for middleware (see illustration middleware in #4165).

You see, we don't need to keep track of the actual file that's being uploaded, because it's always being uploaded (by the same thread, same loop, etc.). We do need to be able to look for the 'progress' of that file upload, which has to be stored in some persistent manner, keyed by the session id. Since the file upload code has no model, can't rely on cache, and does have a safe place to put temp files, it by default puts that state information in that location.

If you install the patch 5126_file_upload_contrib_app.diff and add 'django.contrib.uploadprogress.middleware.FileProgressCached' to your MIDDLEWARE_CLASSES setting, the cache framework will be used, and no session_id file will be created.

Alternatively, you can use that patch and add 'django.contrib.uploadprogress.middleware.FileProgressDB' to your MIDDLEWARE_CLASSES and add 'django.contrib.uploadprogress' to your INSTALLED_APPS (and run a syncdb...).

Or, you are welcome to look at the code in django.contrib.uploadprogress.middleware.FileProgressCached and write your own method of storing this file progress information (it's in a dictionary format).

Sounds like it's working though, thanks for the feedback.

comment:130 Changed 8 years ago by Michael Axiak <axiak@…>

I should also explain the rationale for the naming of the files:

We use the X-Progress-ID value for the name of the file that stores the progress because in subsequent requests we're going to need to get the file by that same value.

We use a (python) temporary file name for the actual file that's being uploaded because it guarantees uniqueness. Since X-Progress-ID is provided by the client, we can't guarantee someone won't start auto-generating requests and cause a collision on the files themselves. I have thought of putting the X-Progress-ID in the session. This can be done but it relies on the session app -- which is not a prereq for Django to work -- being activated and file uploading is a core feature.

It doesn't seem like too much of an issue if the progress tracking has a small chance of getting munged. The actual files, OTOH, would be a bit worrisome.

That being said, if you really don't like the naming, and have a better suggestion, by all means suggest it.

comment:131 in reply to: ↑ 127 ; follow-up: Changed 8 years ago by anonymous

Replying to Michael Axiak <axiak@mit.edu>:

Alright, I tested/tweaked the patch on a few different platforms. Here are my notes:
...

My new report:

  • Apache2/mod_python: Works well. Works with #4165 and generates pretty progress bar
  • Apache2/mod_fastcgi: Works well. Works with #4165 and generates pretty progress bar
  • Apache2/mod_scgi: Untested.
  • LighTTPD/mod_fastcgi: I stand corrected (Thanks SmileyChris). It does help with LighTTPD inasmuch as it will make the interaction between lighttpd and python not eat all the memory and CPU. However, it will not show a progress bar (nor are other features in #4165 supported), as the streamed 'upload' only exists between lighttpd and python, not between the web client and the web server.
  • manage.py development server: World well. Progress bar does not work AFAIK, the development server is singlethreaded and it will stay that way. The progress bar is an inherently multithreaded process (one to serve progress information, one to receive the uploaded information).

Also, when uploading a 4.5 GB file, the content length was reported incorrectly. This happened with both FastCGI and mod_python on apache2.

comment:132 in reply to: ↑ 131 Changed 8 years ago by anonymous

Replying to anonymous:

...

Also, when uploading a 4.5 GB file, the content length was reported incorrectly. This happened with both FastCGI and mod_python on apache2.

It turns out this happens when you don't have Large File support (on either the client or the server?).

Changed 8 years ago by SmileyChris

There was an error uploading large files with streaming turned off

comment:133 Changed 8 years ago by SmileyChris

I also tidied up the code a bit (just white space tweaking mostly).

I noticed that currently, you need to set an upload directory even if you are using a middleware which overrides the FileProgressDescriptor of request because of calls like this in http.__init__:

            if self._progress_filename:
                self._request.file_progress = {'state': 'done'}

I'm not sure if relying on _progress_filename is the best way to do this.

comment:134 Changed 8 years ago by mtredinnick

Can we have a moratorium on uploading yet more changes to this ticket for a few days, please?

I've started reading through a few of the patches carefully with an eye to working out what parts look like they're ready to be committed. However, this isn't going to work if every day there is yet another slight tweak to something. It's going to take a few days to review all this carefully and think about all the issues raised in the comments, so everybody just take a deep breath and back away from the keyboards for a bit.

comment:135 follow-up: Changed 8 years ago by lukeman@…

I've yet to test large files with the patch applied, but I've noticed my files are being created with 600 permissions instead of the 644 they received before patching. Not sure if it's just my setup, but I can't recall any other changes over the past day or so that would have affected it.

comment:136 in reply to: ↑ 135 Changed 8 years ago by Michael Axiak <axiak@…>

Replying to lukeman@gmail.com:

I've yet to test large files with the patch applied, but I've noticed my files are being created with 600 permissions instead of the 644 they received before patching. Not sure if it's just my setup, but I can't recall any other changes over the past day or so that would have affected it.

It should inherit the permissions from the temporary directory. I can't think of a better setup than this, but if people think this is unexpected/problematic maybe we can change it.

If people understand streaming files go to the temp folder then to the other folder, then I think inheriting the permissions from the temp folder would be expected behavior.

-Mike

comment:137 follow-up: Changed 8 years ago by anonymous

I have tested multiple upload at the same time (difference browser instance). If there are more than two upload request, when a first request
is sent, it work properly. If I make a new call, an old stop immediately and a new one not create a session and temp file anymore.

Django is great in advance, please keep it simple and work in common task.

comment:138 in reply to: ↑ 137 Changed 8 years ago by Michael Axiak <axiak@…>

Replying to anonymous:

I have tested multiple upload at the same time (difference browser instance). If there are more than two upload request, when a first request
is sent, it work properly. If I make a new call, an old stop immediately and a new one not create a session and temp file anymore.

Django is great in advance, please keep it simple and work in common task.

Hi,

I have tested it with simultaneous file uploads and it worked fine. Could you please include more details? For instance, I don't think there would be any chance it would work in the development server. It may break in some FastCGI configurations, too. (The reason being that accepting multiple file uploads requires some multithreadedness.)

-Mike

comment:139 Changed 8 years ago by anonymous

  • Cc rudolph.froger@… added

comment:140 follow-up: Changed 8 years ago by anonymous

Hi

If I stop upload process by clicking a "Stop" button. It look like sever see a same upload request and never generate process id.

How to solve this problem?

Thank

comment:141 Changed 8 years ago by anonymous

  • Cc jm.bugtracking@… added

comment:142 in reply to: ↑ 140 Changed 8 years ago by Michael Axiak <axiak@…>

Replying to anonymous:

Hi

If I stop upload process by clicking a "Stop" button. It look like sever see a same upload request and never generate process id.

How to solve this problem?

Thank

I'm not sure why you'd be seeing this. If you're using #4165 then the javascript should create a new progress id on *every* upload. If you're not using #4165, then progress id is irrelevant. While testing I was able to stop and re-upload in FF 2, 1.5, IE 7, and IE 6 without a problem. Could you please describe your environment with more detail?

-Mike

Changed 8 years ago by Michael Axiak <axiak@…>

Removed some unneeded things. No file progress tracking by default.

comment:143 follow-up: Changed 8 years ago by Michael Axiak <axiak@…>

I've cleaned the FileProgressDescriptor architecture a little bit. Three changes for this patch:

  • using request.__class__ = descriptor rather than using a weird descriptor wrapper.
  • defaulting to a NullFileProgressDescriptor (which does nothing) versus one that stored the information in temp upload directory.
  • making safe_file_move do nothing if the source and destination are the same string.

If you want to track file upload progress you need to install the stuff from #4165. This ticket just contains the hooks to get #4165 there with just a minimal of contrib. This patch makes that even clearer by removing the last remnants of functional file progress tracking.

I *think* this is probably where we want to leave this ticket (in terms of what should be cut out of it, what should be put in). Although, I can see several other separation (take care of form stuff in another ticket, for instance).

It is worth noting (because someone had issue with this in IRC), that if you try to use this save_FIELD_file(name, content) should now be given a request.FILE item, rather than the actual content of the file.

For example:

    object.save_video_file(request.FILE['video']['filename'], request.FILE['video']['content'])

is bad, since the mere invocation request.FILE['video']['content'] will load all the content into memory, exactly what we're trying to avoid.

The new version looks like:

    object.save_video_file(request.FILE['video']['filename'], request.FILE['video'])

In reply to SmileyChris, we have to have an upload directory anyway, since the FileProgressDescriptor only changes how file *progress* is tracked, not the actual uploading of files (that is, it has to go into a directory, or else this entire thing is disabled). We could rename and clean up if people prefer it changed.

comment:144 in reply to: ↑ 143 ; follow-up: Changed 8 years ago by anonymous

Replying to Michael Axiak <axiak@mit.edu>:

The new version looks like:

    object.save_video_file(request.FILE['video']['filename'], request.FILE['video'])

Why not skip request.FILEvideo?filename? at all? This is provided by request.FILEvideo? anyway.
IMHO having filename as an separate parameter makes only sense if this would be used somehow directly, but currently the filename/path where the file gets saved is changed by the field (field.get_directory_name() and field.get_filename(filename)).

comment:145 in reply to: ↑ 144 Changed 8 years ago by Michael Axiak <axiak@…>

Replying to anonymous:

Why not skip request.FILEvideo?filename? at all? This is provided by request.FILEvideo? anyway.
IMHO having filename as an separate parameter makes only sense if this would be used somehow directly, but currently the filename/path where the file gets saved is changed by the field (field.get_directory_name() and field.get_filename(filename)).

Well, save_video_file() is meant to allow someone to save a file. Unless that file is specifically provided in a post (it's not necessarily, and in many of my applications it's not...generated images come to mind).

We *can* have it accept both sets of arguments, but then it's too nonorthogonal. Perhaps we should just create a new one? move_video_file()??

Changed 8 years ago by Michael Axiak <axiak@…>

Newer, cleaner version of file upload script

Changed 8 years ago by Michael Axiak <axiak@…>

Newer, cleaner version of file upload script (doh! no random commented assert)

Changed 8 years ago by Michael Axiak <axiak@…>

Sorry about that -- this one is the better one.

comment:146 Changed 8 years ago by Michael Axiak <axiak@…>

A list of changes in this new version:

  • I removed the file_progress descriptor handling and middleware -- if you need to track file progress, you can use process_request to achieve what you need.
  • I cleaned up a lot of things, removed many catch-all excepts and replaced with except OSError etc.
  • Included cross-platform locking for saving/moving files manually (django/utils/file_locks.py)
  • Made move_file_safe(src,dest) to care if a file already exists and throw an error if one does by default.
  • Moved stuff out of http/__init__.py into various locations.
  • In reply to SmileyChris' comment earlier, no longer uses self._progress_file to determine if a file is being tracked. Now using self._track_progress (a boolean)
  • save_field_file can now optionally not take a filename (if filename = None). To avoid confusion, there is an additional function, instance.move_field_file which takes one parameter: the request.FILES value. Code examples should use move_field_file to distinguish this new functionality.
  • Stay tuned for updates to #4165 to reflect these and other changes!

Example of new move_video_file Model instance function:

#from this
foo.save_video_file(request.FILES['video']['filename'], request.FILES['video'])

#to this
foo.move_video_file(request.FILES['video'])

Changed 8 years ago by Michael Axiak <axiak@…>

It's amazing what Trac helps you see.

Changed 8 years ago by Michael Axiak <axiak@…>

Made a subtle fix after testing with #4165.

comment:147 follow-up: Changed 8 years ago by anonymous

def init(self, headers, input, request, file_upload_dir=None, streaming_min_post_size=None, chunk_size=1024*64):

parser = MultiPartParser(headers, input, request, file_upload_dir, streaming_min_post_size)

Is chunk_size is configurable? Do you hard code, aren't you.

Thank

comment:148 in reply to: ↑ 147 Changed 8 years ago by Michael Axiak <axiak@…>

Replying to anonymous:

def init(self, headers, input, request, file_upload_dir=None, streaming_min_post_size=None, chunk_size=1024*64):

parser = MultiPartParser(headers, input, request, file_upload_dir, streaming_min_post_size)

Is chunk_size is configurable? Do you hard code, aren't you.

Thank

Actually, that's a great point. I am going to change a few things (shift around save_FIELD_file and move_FIELD_file to make things more explicit) and on my list for the next upload would be to allow chunk sizes to be a settings variable. (I was actually discussing this with Øyvind in IRC, http://simon.bofh.ms/logger/django/2007/05/26/15/25 .)

Thanks for the comment, though. Feel free to post any other suggestions!

comment:149 Changed 8 years ago by bricetebbs@…

Works for me. Windows/Apache2.2.4/ModPython.
My page actually uploads 4 large files at once so I added 'filename' : self._filename to the file_progress dict created in multparser.py then I can display the current file in my javascript.

comment:150 Changed 8 years ago by anonymous

  • Cc oliver@… added

comment:151 Changed 8 years ago by djangoproject.com@…

5343_cleaned_streaming_file_upload_tweaked.diff works perfect for me (Linux/Apache 2.0.59/modPython). Uploading large zip files (10 to 100 MB each) to my photo gallery with no problems ... memory usage is dead straight at 65MB ... where as it explored to 600 MB per apache process before. Good work Michael!

comment:152 Changed 8 years ago by anonymous

111 def parse(self):
112 try:
113 self._parse()
114 finally:
115 if self._track_progress:
116 self._request.file_progress = {'state': 'done'}
117 return self._post, self._files

When user click 'stop' while uploading file, server return {'state':'done'} even file is not completely uploaded.
This cause unexpected behaviors under different browser. I think we should return actual file size and received size to make
client to be able to check if file is completely uploaded.

Thank

comment:153 Changed 8 years ago by anonymous

def parse(self):

try:

self._parse()

finally:

if self._progress_filename:

if self._received == self._size:

self._request.file_progress = {'received':self._received,

'size':self._size,
'state': 'done'}

else:

self._request.file_progress = {'received':self._received,

'size':self._size,
'state': 'abort'}

How about this?

comment:154 Changed 8 years ago by anonymous

  • Cc moritz.angermann@… added

Changed 8 years ago by klaus.blindert@…

Fixed microscopic bug in an error path

comment:155 Changed 8 years ago by simonbun <simonbun@…>

  • Cc django@… added

comment:156 Changed 8 years ago by anonymous

  • Cc paltman@… added

Changed 8 years ago by simonbun <simonbun@…>

Updated patch against r5722

Changed 8 years ago by simonbun <simonbun@…>

Sorry, that last one patch also included #4165. This is the correct one.

Changed 8 years ago by Brian Rosner <brosner@…>

updated to r5724 (previous patch was missing files too)

comment:157 Changed 8 years ago by Brian Rosner <brosner@…>

  • Cc brosner@… added

comment:158 Changed 8 years ago by Simon G. <dev@…>

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

Reading over this, I think the latest version here looks good (5724_streaming_file_upload.diff). Is everyone happy with it?

comment:159 Changed 8 years ago by simonbun <simonbun@…>

Yes, I've been using it intensively for the last couple of days and it works beautifully.

comment:160 Changed 8 years ago by Brian Rosner <brosner@…>

I have been using this patch for a little while now and it does work very well. Haven't seen anything wrong with it.

comment:161 Changed 8 years ago by eibaan@…

I manually applied patch 5724... to the latest trunk version and noticed that there's a syntax error in line 254 of multipartparser.py. Instead of

except OSError, IOError:

it must be

except (OSError, IOError), e:

Besides that, the patch seems to work quite nicely and I hope it will be added to trunk ASAP.

comment:162 follow-up: Changed 8 years ago by simonbun <simonbun@…>

Ran into a very small issue:
Files uploads that are streamed end up having 0600 permissions, while files that are uploaded normally have 0644 permissions. I can't put my finger on where this happens at a glance, will look into it tomorrow.

(In case anyone's wondering, I need read access to be able to scan newly uploaded files with clamav.)

comment:163 in reply to: ↑ 162 ; follow-up: Changed 8 years ago by anonymous

Please read 136. This is standard Unix permissions. For example, if someone (heaven forbid) used PHP to upload, the files would inherit the permissions from the temporary upload directory.

-Mike

Replying to simonbun <simonbun@versea.be>:

Ran into a very small issue:
Files uploads that are streamed end up having 0600 permissions, while files that are uploaded normally have 0644 permissions. I can't put my finger on where this happens at a glance, will look into it tomorrow.

(In case anyone's wondering, I need read access to be able to scan newly uploaded files with clamav.)

comment:164 Changed 8 years ago by simonbun <simonbun@…>

Well the strange thing is my temporary dir and my uploads dir both have 777 permissions. I just checked again and the temporary .upload file has 600 permissions.

I'm using apache2 + mod_python + python2.5 if that would make a difference

comment:165 Changed 8 years ago by anonymous

  • Cc matthias@… added

comment:166 in reply to: ↑ 163 Changed 8 years ago by Arthur Debert

Replying to anonymous:

I'll second that. This one has bitten me too.
While setting permissions to 600 is sound, maybe a two liner on the documentation would help make that more obvious. It took me a while to understand what as going on.

Asides from that, thanks for the patch. I've been using it for a while (mod_python at webfaction's setup, post unicode merge) and it works well.

Cheers
Arthur

comment:167 Changed 8 years ago by anonymous

  • Cc root.lastnode@… added

comment:168 Changed 8 years ago by akaihola <antti.kaihola@…>

  • Cc akaihola+djtick@… added

Changed 8 years ago by Brian Rosner <brosner@…>

updated to r5820

comment:169 Changed 8 years ago by anonymous

  • Cc sam@… added

comment:170 Changed 8 years ago by Trey <trey@…>

  • Cc trey@… added

comment:171 Changed 8 years ago by anonymous

  • Cc klaus.blindert@… added

comment:172 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Patch for r5820 does not work with forms.FileField and forms.ImageField, raw_field is now a StrAndUnicode object not a dict containing the raw data.

comment:173 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

comment:174 Changed 8 years ago by axiak

  • Keywords sprint added
  • Owner changed from nobody to axiak
  • Status changed from assigned to new
  • Triage Stage changed from Ready for checkin to Accepted

I will take this on to as full extent as I can over the upcoming sprint.
If anyone has any more issues, please post them on here before Friday.

comment:175 Changed 8 years ago by anonymous

I would like to see the filename be part of the progress returned for when a form has multiple files:

self._request.file_progress = {'received': self._received,
                                'size':     self._size,
                                'filename' : self._filename,
                                'state':    'uploading'}

comment:176 Changed 8 years ago by Matthias Urlichs <smurf@…>

The checks after an image upload (as seen in http://dpaste.com/hold/18468/) can be improved somewhat; at first glance it is entirely unclear what's supposed to happen, and the error message is repeated.

See http://dpaste.com/hold/19212/ (untested but looks OK).

comment:177 Changed 8 years ago by klaus.blindert@…

Uploading empty files is treated as an error, why?
I would also like to see a file propgress tracker example implementation with unit tests and a stress test for uploading
large files.

On request I can provide an example of such a stress test programm.

comment:178 follow-up: Changed 8 years ago by Matthias Urlichs <smurf@…>

I frankly fail to see the use case for uploading an empty file (as opposed to simply not uploading it in the first place).

If you have one, feel free to (a) tell us why you think it makes sense, (b) provide a patch which optionally allows that.

comment:179 in reply to: ↑ 178 Changed 8 years ago by anonymous

Replying to Matthias Urlichs <smurf@smurf.noris.de>:

I frankly fail to see the use case for uploading an empty file (as opposed to simply not uploading it in the first place).

There are several situations where one might want to upload empty files.

  1. It could be that one does not know the file is empty
  2. Maybe one actually needs to have an empty file, for example a ini.py in python indicates that a directory is a package

There is nothing special about an empty file, it is a file like any other file, it only contains a single character, the End Of File character. Why would you want to treat this file in any different way? If you do so it only means that your code is not generic enough.

One more note:

The reason that this ticket is languishing like this is that it is filled with tacit assumptions, unusual behavior (like the one I'm commenting on) and extra functionality that is not relevant to the actual problem it needs to solve first - streaming the uploads.

comment:180 follow-up: Changed 8 years ago by Matthias Urlichs <smurf@…>

I'd argue that case 1 is a prime candidate for keeping the test, while case 2 can be handled by uploading a file with a comment in it.

In any case: if you think that's the reason, then please post either a list of (other) problems you have with this patch, or an improved patch.

Anythig else is not helpful. IMHO.

comment:181 Changed 8 years ago by PhiR

  • Keywords sprintsept14 feature added; sprint removed

comment:182 Changed 8 years ago by Uninen

  • Cc ville@… added

comment:183 in reply to: ↑ 180 Changed 8 years ago by anonymous

Replying to Matthias Urlichs <smurf@smurf.noris.de>:

while case 2 can be handled by uploading a file with a comment in it.

I would be willing to elaborate but this commment of yours indicates to me you have your mind set and don't actually want to understand the point that I was making: That is sometimes a file have to be empty.

You can't manually start adding bytes into files just because someone decided that uploading an empty file is an error. All you need is think about it for a second. As I said before an empty file is still a valid file, it has a name, a modification date ... even a size!. Maybe it is used to indicate a state, maybe it is log file that has yet to be written to etc. there are countless usecases that require the presence of a file that may not contain anything yet.

In your design if one were to simultaneously upload 10 files one of which is empty, the upload should fail once processing reaches the empty file ...

comment:184 Changed 8 years ago by Matthias Urlichs

I do admit that you have a point; I admit that my response was a bit knee-jerk.

My core point, now that I think about it, is not that I'm dead set against uploading empty files. That test is easily removed, or made optional.

The real point is: What's holding up the change? Did axiak just not have enough time?

This is not the only patch in this bug tracker which fixes a real-world problem. We're carrying ~30 of these in our own repository and, frankly, I'd like that number to go down, one of these days. :-/

comment:185 Changed 8 years ago by oliver@…

I know this is maybe a little late to chime in, but as opinion on this seems to be heavily divided as to whether empty files should be allowed, would it not just be best all-round to add a way to define whether empty files should be allowed or not?

Surely this wouldn't be too hard (I'd be willing to work on this if I get a few moments here-and-there).

Just a thought'''

comment:186 Changed 8 years ago by Trey <trey@…>

I am going to have to go with +1 for not distinguishing between an empty file or a non-empty file.

Technically there is no difference between and empty and non-empty file so I don't see why django should distinguish between the two. It's definitely a user defined preference at that point, which should be possible after the file has been uploaded. Not wanting and empty file is an assumption, a popular one yes, but still an assumption about the upload.

comment:187 Changed 8 years ago by Michael Axiak <axiak@…>

Hey,

As well as Trey, I am also +1 for not making this distinction. If people need the validation, it belongs in another layer than the core http processing. (Perhaps the newforms filefield)

-Mike

comment:188 Changed 8 years ago by serialx

I too +1 for not making that distinction.

-SJ

comment:189 Changed 8 years ago by Trey <trey@…>

What do we need to do to get some movement on this patch?

comment:190 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Make it work with current trunk first

comment:191 Changed 8 years ago by remohammadi

  • Cc reza@… added

comment:192 Changed 8 years ago by Faheem Mitha <faheem@…>

I'm getting

Ran 155 tests in 55.010s

FAILED (failures=1, errors=1)

when I run the tests on 5820 with 5820_streaming_file_upload.diff applied.

They look like

1) Traceback (most recent call last):

File "/usr/local/src/django/django-trunk.hg.5820+streaming/tests/modeltests/test_client/models.py", line 91, in test_post_file_view

response = self.client.post('/test_client/post_file_view/', post_data)

File "/usr/local/lib/python2.4/site-packages/django/test/client.py", line 222, in post

return self.request(r)

File "/usr/local/lib/python2.4/site-packages/django/core/handlers/base.py", line 77, in get_response

response = callback(request, *callback_args, callback_kwargs)

File "/usr/local/src/django/django-trunk.hg.5820+streaming/tests/modeltests/test_client/views.py", line 52, in post_file_view

c = Context({'file': request.FILESfile_file?})

File "/usr/local/lib/python2.4/site-packages/django/utils/datastructures.py", line 143, in getitem

raise MultiValueDictKeyError, "Key %r not found in %r" % (key, self)

MultiValueDictKeyError: 'Key \'file_file\' not found in <MultiValueDict: {\'file\': [{\'content\': \'# coding: utf-8
n"""
n38. Testing usi
[...]

2) FAIL: test_simple_upload (regressiontests.test_client_regress.models.AssertFileUploadTests)

Traceback (most recent call last):

File "/usr/local/src/django/django-trunk.hg.5820+streaming/tests/regressiontests/test_client_regress/models.py", line 215, in test_simple_upload

self.assertEqual(response.status_code, 200)

AssertionError: 500 != 200

Am I doing something wrong here?

Changed 8 years ago by Faheem Mitha <faheem@…>

comment:193 Changed 8 years ago by Faheem Mitha <faheem@…>

Uploaded new diff 6469_streaming_file_upload.diff, applying against trunk.

Currently one unit test is failing, traceback follows. There were two tests failing in the earlier version (5820), one of which was this one, but [560] helped me fix one of them on IRC.

[560] thinks this is an unicode error. If this version works for people, I'd like to hear about it. It would really nice to get this merged into trunk.

Thanks, Faheem.

*
FAIL: test_simple_upload (regressiontests.test_client_regress.models.FileUploadTests)


Traceback (most recent call last):

File "/usr/local/src/django/django-trunk.hg+mystreaming/tests/regressiontests/test_client_regress/models.py", line 244, in test_simple_upload

self.assertEqual(response.status_code, 200)

AssertionError: 500 != 200


Ran 205 tests in 85.939s

FAILED (failures=1)

comment:194 follow-up: Changed 8 years ago by anonymous

Okay, so if I want to test this with a clean trunk, which of the above do I use? Trying to figure out if I need the middleware (if so which one), or just 6469 or what? I am needed to upload a 500M+ file on a weekly basis and would love to know HOW to use this.

comment:195 in reply to: ↑ 194 ; follow-up: Changed 8 years ago by Faheem Mitha <faheem@…>

Replying to anonymous:

Okay, so if I want to test this with a clean trunk, which of the above do I use? Trying to > figure out if I need the middleware (if so which one), or just 6469 or what? I am needed to > upload a 500M+ file on a weekly basis and would love to know HOW to use this.

6469_streaming_file_upload.diff should apply cleanly against trunk (hasn't been that long). Just tested it (patched against 6469 trunk) and it appears to work fine. You don't need anything else if you just want the streaming upload functionality. Note that apparently there are some issues with FileField/ImageField in forms/newforms, but I have no direct experience of this.

Would be interested to hear how it works for you.

Faheem.

comment:196 in reply to: ↑ 195 Changed 8 years ago by anonymous

Isn't that just the way. I wrote this post, re-re-rechecked out the trunk and applied the patch. Works marvelously. Combined with a crontab for file conversion and everything is working wonderfully.

Thanks!

Replying to Faheem Mitha <faheem@email.unc.edu>:

Replying to anonymous:

Okay, so if I want to test this with a clean trunk, which of the above do I use? Trying to > figure out if I need the middleware (if so which one), or just 6469 or what? I am needed to > upload a 500M+ file on a weekly basis and would love to know HOW to use this.

6469_streaming_file_upload.diff should apply cleanly against trunk (hasn't been that long). Just tested it (patched against 6469 trunk) and it appears to work fine. You don't need anything else if you just want the streaming upload functionality. Note that apparently there are some issues with FileField/ImageField in forms/newforms, but I have no direct experience of this.

Would be interested to hear how it works for you.

Faheem.

comment:197 Changed 8 years ago by eibaan

I noticed a problem with an older patch that seems to be still in 6469. Uploading a FORM with both a file and text INPUT and the input text containing umlauts or other non-ascii characters, you get an exception. I fixed it by replacing line 272 of multipartparser.py with self._post.appendlist(self._fieldname, unicode(self._field.getvalue(), "utf-8")).

Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Fixed unicode in POST issues, added django.http.utils, moved str_to_unicode there

comment:198 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Added a new patch, fixed unicode in POST, moved str_to_unicode

Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Fixed a problem with UploadedFile wrapper and making sure content is not read in Fie/ImageField

comment:199 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

  • Needs tests set

Everything seems to work now, just needs some form tests.

comment:200 Changed 8 years ago by jmv

I've installed the latest trunk, applied this patch (6603_all_tests_pass_uploadedfile_wrapper_fixed.diff) and discovered two problems. Under mod_python 3.3.1 uploading 12Mb image results in apache consuming all available memory and dying. This is probably a mod_python error as it doesn't occur when using development server or mod_wsgi.

When I'm trying to upload 600Mb file under mod_wsgi, though, I get another problem:

Traceback (most recent call last):
File "/var/lib/python-support/python2.5/django/core/handlers/base.py" in _real_get_response
  81. response = callback(request, *callback_args, **callback_kwargs)
File "/var/lib/python-support/python2.5/django/contrib/admin/views/decorators.py" in _checklogin
  55. return view_func(request, *args, **kwargs)
File "/var/lib/python-support/python2.5/django/views/decorators/cache.py" in _wrapped_view_func
  39. response = view_func(request, *args, **kwargs)
File "/var/lib/python-support/python2.5/django/contrib/admin/views/main.py" in change_stage
  332. errors = manipulator.get_validation_errors(new_data)
File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in get_validation_errors
  61. errors.update(field.get_validation_errors(new_data))
File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in get_validation_errors
  378. self.run_validator(new_data, validator)
File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in run_validator
  368. validator(new_data.get(self.field_name, ''), new_data)
File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in isValidImage
  713. validators.isValidImage(field_data, all_data)
File "/var/lib/python-support/python2.5/django/core/validators.py" in isValidImage
  180. content = field_data['content']

  KeyError at /admin/account/pokerroom/5/
  'content'

Seems that patch is already outdated. Where can I get a newer version?

Eugene

comment:201 Changed 8 years ago by jmv

  • Cc registrations@… added

comment:202 Changed 8 years ago by jmv

BTW, it seems that problem was caused by insufficient space on /tmp partition I'm using for uploads. Here's additional info:

POST
Variable 	Value
_file_upload_error 	u'Failed to write to temporary file.'

It's a bug anyways.

Changed 8 years ago by Øyvind Saltvik <oyvind@…>

made the isValidImage validator try the tempfile before reading content

comment:203 Changed 8 years ago by Faheem Mitha <faheem@…>

Hi Oyvind,

Just wondering what problem your most recent patch fixes. I tried using newforms and the patch breaks with it. (See traceback below), Does your patch fix this problem? I've heard of some problems related to newforms. Is this what it is?

Thanks, Faheem.

*
from models import FileUpload, FolderUpload
from django.newforms import form_for_model

FileUploadForm = form_for_model(FileUpload)
FolderUploadForm = form_for_model(FolderUpload)
*

*

Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/core/handlers/base.py" in _real_get_response

  1. response = callback(request, *callback_args, callback_kwargs)

File "/usr/local/lib/python2.4/site-packages/django/contrib/auth/decorators.py" in _checklogin

  1. return view_func(request, *args, kwargs)

File "/usr/local/lib/python2.4/site-packages/bixfile_dev/views.py" in file_upload

  1. success = form.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py" in save

  1. return save_instance(self, model(), fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py" in save_instance

  1. f.save_form_data(instance, cleaned_data[f.name])

File "/usr/local/lib/python2.4/site-packages/django/db/models/fields/init.py" in save_form_data

  1. getattr(instance, "save_%s_file" % self.name)(data.filename, data.content, save=False)

File "/usr/local/lib/python2.4/site-packages/django/db/models/fields/init.py" in

  1. setattr(cls, 'save_%s_file' % self.name, lambda instance, filename, raw_field, save=True: instance._save_FIELD_file(self,

filename, raw_field, save))
File "/usr/local/lib/python2.4/site-packages/django/db/models/base.py" in _save_FIELD_file

  1. if raw_field.has_key('tmpfilename'):

AttributeError at /proto_dev/bixfile_dev/
'str' object has no attribute 'has_key'

comment:204 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Just changed the oldforms image validator to use the tempfile. The problem you have was fixed in the 6603 patch.

comment:205 Changed 8 years ago by Faheem Mitha <faheem@…>

Hi oyvind,

I'm getting some test breakage with your patch.

Ran 211 tests in 90.369s

FAILED (failures=3)

The most common error is:

AttributeError: 'UploadedFile' object has no attribute 'content'.

Changed 8 years ago by Øyvind Saltvik <oyvind@…>

diff did not apply cleanly, fixed

comment:206 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

All tests run fine now

comment:207 Changed 8 years ago by Øyvind Saltvik <oyvind@…>

Django patched with the latest streaming patch can now be checked out using mercurial.

hg clone http://streaming.cylon.no/streaming django_src

comment:208 Changed 8 years ago by Bastian Kleineidam <calvin@…>

  • Cc calvin@… added

comment:209 Changed 8 years ago by anonymous

will this get into trunk any time soon?

comment:210 Changed 8 years ago by anonymous

6654_fixed_tests_and_file_clean.diff fails against 6671.

comment:211 Changed 8 years ago by anonymous

  • Cc rudolph.froger@… removed

comment:212 Changed 8 years ago by anonymous

  • Cc works@… added

comment:213 Changed 8 years ago by Andrew Badr <andrewbadr.etc@…>

  • Cc andrewbadr.etc@… added

comment:214 follow-up: Changed 8 years ago by Joshua Works

Could someone post updated directions to get the latest incarnation of this patch working? I'm using Oyvind's mercurial source, but I'm not sure if I still need to modify the middleware, change my models at all, add settings to settings.py, etc.

Also, is the progress meter still included in this patch, and what do I need to see it in the Admin?

There hasn't been much activity here lately -- is this thing close to making it into the trunk and getting some proper documentation?

comment:215 in reply to: ↑ 214 Changed 8 years ago by anonymous

Replying to Joshua Works:

Could someone post updated directions to get the latest incarnation of this patch working? I'm using Oyvind's mercurial source, but I'm not sure if I still need to modify the middleware, change my models at all, add settings to settings.py, etc.

It works out of the box for me, at least it does for the last version I used it for, which corresponds to Oyvinds last patch. However, apparently this patch does not apply cleanly to current trunk. Oyvind asked me to do a merge, but I've been busy with other things. I'll do my best to get to this in the next day or two.

Also, is the progress meter still included in this patch, and what do I need to see it in the Admin?

No, that is in another patch http://code.djangoproject.com/ticket/4165, and is buggy.
See my comments there. Apart from a backend problem with the Python code, the JS code has
problems too. It should be easy to rewrite for someone who knows JS, though. I may end up doing it myself if nobody else does, but I'll probably use a library like PrototypeJS or Jquery.

There hasn't been much activity here lately -- is this thing close to making it into the trunk and getting some proper documentation?

I think it will need a bunch of people to give it a push. I've been intending to try and coordinate something, but I've been busy with other things.

If you don't hear anything, feel free to ping me via private email. I'm quite keen that the patch continue working till it is merged into trunk. I need it for some stuff myself.

Faheem.

comment:216 Changed 8 years ago by mtredinnick

It absolutely doesn't need "a bunch of people to give it a push". We know this ticket is here and at some point one of the maintainers will sit down and review it again to see if all the problems from the last review have been fixed and no new ones introduced.

Changed 8 years ago by Faheem Mitha <faheem@…>

Updated streaming patch to trunk rev 6710.

comment:217 Changed 8 years ago by Faheem Mitha <faheem@…>

Added streaming.6710.patch against trunk rev 6710. This passes all Django tests. I haven't tested it extensively, but it appears to work. Some quick notes on how I generated this patch in case anyone is interested. I used mercurial and kdiff3 for merging (OS Debian etch).

Note that the only file that required manual intervention was django/http/utils.py. This file could probably use a little cleanup.
Everything else was handled by kdiff3.

See Section 12.8 of the hgbook for a recipe for doing this http://hgbook.red-bean.com/hgbookch12.html#x16-28700012.8 and also this wiki page,
http://www.selenic.com/mercurial/wiki/index.cgi/MqMerge

Note that the method described in the wiki is currently less likely to give problems than the method in the hgbook (otherwise similar).

See this issue for further discussion, http://www.selenic.com/mercurial/bts/issue844

comment:218 Changed 8 years ago by anonymous

Successfully patched trunk rev 7018 with streaming.6710.patch. Works well.

What are the main issues blocking trunk integration? This functionality is of high importance to many, as one could argue that without this patch the current FileField and ImageField implementation is both buggy and exposes a security risk (denial of service by uploading large files as Apache is eating up memory and peaking CPU usage + large file upload plainly fails in the end without the patch).

Thus strong +1 for integration ASAP.

comment:219 Changed 8 years ago by ipse <ipse.reg@…>

  • Cc ipse.reg@… added

comment:220 Changed 8 years ago by Soeren Sonnenburg <bugreports@…>

I also urgently need that feature ... any chance to see this merged soon?

comment:221 Changed 8 years ago by ubernostrum

Please do not ask "when will this be merged" questions in the ticket tracker; all it does is generate noise. If you're genuinely interested in helping to get a specific patch merged, consider asking on the developers list about why it hasn't yet been committed or, better yet, searching the archive of that list to see if there's a reason given.

comment:222 Changed 8 years ago by anonymous

  • Cc uptimebox@… added

Changed 8 years ago by egon

Made it apply cleanly to rev. 7092

comment:223 Changed 8 years ago by faheem

streaming.7092.patch gives some errors. As usual, I don't understand
what is going on well enough to fix them. See below.

*
======================================================================
FAIL: Doctest: modeltests.model_forms.models.test.API_TESTS


Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest

raise self.failureException(self.format_failure(new.getvalue()))

AssertionError: Failed doctest test for modeltests.model_forms.models.test.API_TESTS

File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line unknown line number, in API_TESTS


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

type(f.cleaned_datafile?)

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[198]>", line 1, in ?

type(f.cleaned_datafile?)

AttributeError: 'TextFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[199]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.file

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[200]>", line 1, in ?

instance.file

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f = TextFileForm(data={'description': u'Assistance'}, instance=instance)

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[201]>", line 1, in ?

f = TextFileForm(data={'description': u'Assistance'}, instance=instance)

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.cleaned_datafile?

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[203]>", line 1, in ?

f.cleaned_datafile?

AttributeError: 'TextFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[204]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.file

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[205]>", line 1, in ?

instance.file

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

os.unlink(instance.get_file_filename())

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[206]>", line 1, in ?

os.unlink(instance.get_file_filename())

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f = TextFileForm(data={'description': u'Assistance'}, files={'file': {'filename': 'test2.txt', 'content': 'hello world'}}, instance=instance)

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[207]>", line 1, in ?

f = TextFileForm(data={'description': u'Assistance'}, files={'file': {'filename': 'test2.txt', 'content': 'hello world'}}, instance=instance)

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[209]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.file

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[210]>", line 1, in ?

instance.file

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.delete()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[211]>", line 1, in ?

instance.delete()

NameError: name 'instance' is not defined


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[219]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be changed because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.file

Expected:

u'.../test3.txt'

Got:



File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

type(f.cleaned_dataimage?)

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[226]>", line 1, in ?

type(f.cleaned_dataimage?)

AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[227]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The ImageFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[228]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.cleaned_dataimage?

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[231]>", line 1, in ?

f.cleaned_dataimage?

AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[232]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[233]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

os.unlink(instance.get_image_filename())

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[234]>", line 1, in ?

os.unlink(instance.get_image_filename())

AttributeError: 'TextFile' object has no attribute 'get_image_filename'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[237]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[238]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.delete()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[239]>", line 1, in ?

instance.delete()

File "/usr/local/lib/python2.4/site-packages/django/db/models/base.py", line 324, in delete

assert self._get_pk_val() is not None, "%s object can't be deleted because its %s attribute is set to None." % (self._meta.object_name, self._meta.pk.attname)

AssertionError: TextFile object can't be deleted because its id attribute is set to None.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[247]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The ImageFile could not be changed because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Expected:

u'.../test3.png'

Got:


======================================================================
FAIL: Doctest: regressiontests.forms.tests.test.fields_tests


Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest

raise self.failureException(self.format_failure(new.getvalue()))

AssertionError: Failed doctest test for regressiontests.forms.tests.test.fields_tests

File "/usr/local/src/django/django-trunk.patched/tests/regressiontests/forms/tests.py", line unknown line number, in fields_tests


File "/usr/local/src/django/django-trunk.patched/tests/regressiontests/forms/tests.py", line ?, in regressiontests.forms.tests.test.fields_tests
Failed example:

type(f.clean({'filename': 'name', 'content': 'Some File Content'}, 'files/test4.pdf'))

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest regressiontests.forms.tests.test.fields_tests[278]>", line 1, in ?

type(f.clean({'filename': 'name', 'content': 'Some File Content'}, 'files/test4.pdf'))

File "/usr/local/lib/python2.4/site-packages/django/newforms/fields.py", line 453, in clean

raise ValidationError(self.error_messagesempty?)

ValidationError: [u'The submitted file is empty.']


Ran 234 tests in 63.274s

FAILED (failures=2)

Changed 8 years ago by faheem

Slightly modified version of streaming.7092.patch with more tests passing.

comment:225 Changed 8 years ago by faheem

With the patch I just uploaded (basically the changes are liberal use of 'content-length' as suggested by Oyvind, the errors are now reduced
to
======================================================================
FAIL: Doctest: modeltests.model_forms.models.test.API_TESTS


Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest

raise self.failureException(self.format_failure(new.getvalue()))

AssertionError: Failed doctest test for modeltests.model_forms.models.test.API_TESTS

File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line unknown line number, in API_TESTS


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

type(f.cleaned_dataimage?)

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[226]>", line 1, in ?

type(f.cleaned_dataimage?)

AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[227]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The ImageFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[228]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.cleaned_dataimage?

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[231]>", line 1, in ?

f.cleaned_dataimage?

AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[232]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[233]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

os.unlink(instance.get_image_filename())

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[234]>", line 1, in ?

os.unlink(instance.get_image_filename())

AttributeError: 'TextFile' object has no attribute 'get_image_filename'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

f.is_valid()

Expected:

True

Got:

False


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance = f.save()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[237]>", line 1, in ?

instance = f.save()

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save

return save_instance(self, self.instance, self._meta.fields, fail_message, commit)

File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance

raise ValueError("The %s could not be %s because the data didn't"

ValueError: The TextFile could not be created because the data didn't validate.


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.image

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[238]>", line 1, in ?

instance.image

AttributeError: 'TextFile' object has no attribute 'image'


File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:

instance.delete()

Exception raised:

Traceback (most recent call last):

File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run

compileflags, 1) in test.globs

File "<doctest modeltests.model_forms.models.test.API_TESTS[239]>", line 1, in ?

instance.delete()

File "/usr/local/lib/python2.4/site-packages/django/db/models/base.py", line 324, in delete

assert self._get_pk_val() is not None, "%s object can't be deleted because its %s attribute is set to None." % (self._meta.object_name, self._meta.pk.attname)

AssertionError: TextFile object can't be deleted because its id attribute is set to None.


Ran 234 tests in 63.014s

FAILED (failures=1)

comment:226 Changed 8 years ago by anonymous

content-length is in the wrong dict in ImageField tests and probably some others too.

>>> f = ImageFileForm(data={'description': u'An image'}, files={'image': {'filename': 'test.png', 'content': image_data}, 'content-length':len(image_data)})

should be

>>> f = ImageFileForm(data={'description': u'An image'}, files={'image': {'filename': 'test.png', 'content': image_data, 'content-length':len(image_data)}})

Changed 8 years ago by egon

Passes al tests.

comment:227 Changed 8 years ago by anonymous

  • Cc drfarina@… added

comment:228 Changed 8 years ago by beau

  • Cc beau@… added

comment:229 Changed 8 years ago by anonymous

  • Cc giuliani.v@… added

comment:230 Changed 8 years ago by fdr

I've been rewriting the multipart parsing to be lazier and found something that seems to me like a DDOS exploit in this code.

Consider line 231-233 in streaming.7106.patch. What happens here is:

  1. A multipart boundary has been seen recently, so we are in the HEADER state
  2. We are looking to match the end of of the header
  3. If there's not enough data we simply break and add more bytes to our unmatched bytes from the network and try matching again later

Is there any reason why a "bad" client could not send a multipart request that *never* properly sends the header termination bytes and thus forces the multipart parser to load as much of the stream as possible (up to Content-Length bytes) into memory? As the amount of bytes increases towards the maximum Content-Length the call to .find() itself will become incredibly expensive. A few malformed requests might also cause the server to swap into oblivion or run out of address space on 32 bit systems.

comment:231 Changed 8 years ago by fdr

Bad wording in the prior comment.

First sentence should be read as follows:

This patch appears to introduce a DOS attack vector into Django.

comment:232 Changed 8 years ago by egon

I've tested this with:

  • Lighttpd 1.4 with fastcgi/scgi: doesn't work. As somebody else said above it looks like it isn't streaming the upload at all, or maybe lighttpd waits for the POST to be over before streaming it to the fcgi app.
  • Apache 2.2 with scgi: works fine so far.

comment:233 Changed 8 years ago by avain <avainitri@…>

  • Component changed from Core framework to HTTP handling
  • Has patch unset
  • Needs tests unset
  • Patch needs improvement set

I've tested streaming.7106.patch. It works for the upload file with 860MB. However I tested it with 1.5G. It fails. The message is below.

Request Method:  	POST
Request URL: 	http://192.168.1.4/imagecity/upload/
Exception Type: 	MemoryError
Exception Value: 	out of memory
Exception Location: 	/usr/lib/python2.5/site-packages/django/http/multipartparser.py in _read, line 285
/usr/lib/python2.5/site-packages/django/http/multipartparser.py in _read
 285. self._file.write(data[start:end]) ...

comment:234 Changed 8 years ago by Roman Tolkachyov <rt@…>

save_FOO_file() works wrong with streaming.7106.patch
If raises

Traceback (most recent call last):
  ...
  File "/usr/lib/python2.5/site-packages/django/db/models/fields/__init__.py", line 783, in <lambda>
    setattr(cls, 'save_%s_file' % self.name, lambda instance, filename, raw_field, save=True: instance._save_FIELD_file(self, filename, raw_field, save))
  File "/usr/lib/python2.5/site-packages/django/db/models/base.py", line 414, in _save_FIELD_file
    if raw_field.has_key('tmpfilename'):
AttributeError: 'str' object has no attribute 'has_key'

comment:235 Changed 8 years ago by fdr

I worked on a lazier (i.e. you can read from the stream without loading the entire thing onto disk) and hopefully better patch for this, in case anyone is getting antsy. I haven't been able to return to it for a week or two, but all the annoying parts are pretty much done.

It does, however, gut the current implementation. At that time I also tested mod_wsgi2's buffering behavior and found that it does not do aggressive buffering (a good thing), so sites running behind mod_wsgi2 can do flow control on clients. This was done by augmenting the current patch with .sleep() calls between blocks and then posting a large file.

This is particularly important if you have to send the material to another server and would prefer (or require) that it not be held on disk in full, if the stream is of indeterminate length, or if you wanted to hold the request in full to begin with (for example: filtering a dataset as it goes by and populating the database with a structured representation)

comment:236 Changed 7 years ago by axiak

  • Status changed from new to assigned

Alright I'm going to be working on this ticket this week. Before I do any work, I'm uploading the current diff updated to work against HEAD largely untouched. After reviewing the comments, before we can consider this ready for checking or close to that, I think that the following is needed:

  1. Updating this to work with the new-ish FileField backends.
  2. Pull the MultiPartParser logic apart from the actual dealing with the chunked data. (I.e. a user should be able to do whatever they want with the data.)
  3. Make the MultiPartParser lazier (i.e. not look for boundaries, but let them come when available).
  4. Go through the bugs that people have reported (thanks for doing so!) and ensuring that they no longer exist.

-Mike

Changed 7 years ago by axiak

Same patch applied to @7277

comment:237 Changed 7 years ago by mtredinnick

I think the general design also needs a review. Things like a get parameter that is acted upon before it gets to the view isn't really acceptable, since it means you can never use that parameter name in normal code. So can you please concentrate on a patch for solving the initial problem (large file uploads) and worry about stuff like upload progress meters after some design discussion on django-dev as a later thing. The last few patches seem to have been heading into feature-growth territory and not solving the legitimate problem that was the original bug report.

comment:238 Changed 7 years ago by axiak

  • Has patch set
  • Needs documentation set
  • Needs tests set

Alright, this has been a long time waiting, but the ticket has been refactored to what I think has been a fairly reasonable state. It might undergo a few movements of code, but I think the overall design is here to stay. To get a grasp of the API this ticket enables, please visit the thread on this matter. [1] A great thanks goes to fdr--- he and I have been putting the *sprint* in this past Django sprint during the late hours (our hemisphere).

Please feel free to test this by running it on a platform and giving us the results!

An overview of the status of this ticket:

  • It has basic functionality. It fails fast...we'll probably silence a couple of the errors before we go to trunk, but it's good to have noise while people are debugging.
  • There is one test. The code for this test can be used to extend to more harsher tests (wrong content lengths, etc) to see how the parser handles.
  • There are no doc changes yet. I'll have basic ones by the end of the week (how to change the default temporary file location, etc), and the docs for describing the API...I'm not sure :-)
  • A few design things that could be resolved:
    • Supporting multiple upload handlers (right now it supports 1 in a list that could be extended, but we haven't how it deals with ordering).
    • Adding signals [doesn't have to be in this ticket, I think]
    • We send the charset to the handler, but we could take care of encoding for it if the mime starts with 'text/'...people might complain about getting unicode objects (like a CSV for example).
  • I separated the diffs to the one that deals with the uploading, and the one that deals with the saving of the uploaded data.
  • The newforms interface is completely untested as of now.

Changed 7 years ago by axiak

NEW Upload handling for revision 7339

Changed 7 years ago by axiak

NEW Form handling for uploaded files for revision 7339

comment:239 Changed 7 years ago by fdr

I'll probably be furnishing the enhanced docs before too long. One important debugging feature that isn't in yet is a way to detect if the parser starts to undergo an infinite loop, which can happen when LazyStream.read() and LazyStream.unget() are called in a way where the same bytes constantly get read out and pushed back in succession. It'd be much better to crash quickly and log well in this case so bugs can get reported.

Changed 7 years ago by axiak

Latest patch for 2070...includes everything (see comment)

comment:240 Changed 7 years ago by axiak

I just added a new diff, 2070_revision7351_latest.diff. Against my earlier ways, I've decided to just merge the two patch changes, since you really can't use one and not both. The new patch features the ability to work with multiple upload handlers, updating forms to work better, and a few bugfixes. The patch should now pass all tests.

What YOU can do to help:

  1. Download the patch.
  2. Run patch -p0 < 2070_revision7351_latest.diff on your Django installation.
  3. Run tests (django/tests/runtests.py --settings=...)
  4. Test with a project that involves uploading. (Big and small!)
    • Your memory should stay below 100 MB regardless of how big your upload size is (TODO: measure how big the typical process size is and see if we have to call gc.collect() at some point.)
    • Your temporary directory should contain partial files during the upload is progressing (ls -ltr /tmp on *nix systems).
    • The uploading should work with all of your apps and forms like they have.
  5. Report ANY errors to this ticket.

Things left from this patch:

  1. Documentation of things other than settings.FILE_UPLOAD_TEMP_DIR. (I'm not sure it's a blocker for this ticket...we'd probably have to have another doc. section for this API.)
  2. Some more mean tests could be written (probably have to circumvent the test client for a few).
  3. While the parser shouldn't enter an infinite loop, it can currently happen. There will be some accounting to make sure that this failure mode doesn't exist in a new patch.
  4. Review from the devs.

Changed 7 years ago by axiak

The latest #2070 patch. Small changes...see comment.

comment:241 Changed 7 years ago by axiak

I just uploaded a new patch 2070_latest7354.diff. This includes minor fixes including:

  • Works/tested on python 2.3.
  • Works/tested on windows.
  • Changes how multiple handlers behave.

Later tonight I'm going to shoot an email to django-developers with a revised API, since the implementation has of course given me more insight into how things should work.

comment:242 follow-up: Changed 7 years ago by Maniac <Maniac@…>

Mike, thanks for the new patch! Leaving aside design issues I have some comments on the implementation part. Feel free to disagree with any!

  • You have many multi-line comments decorated with in boxes of #s. But the rest of the code doesn't do this, all comments are usually have just one # on the left of each line. And I personally prefer it in Django way, it's easier to read :-)
  • In MultipartParser.init you seem to half got rid of guessing content_length if they're not in HTTP headers. What was the intent? Isn't headers the only place where it should be and thus you should raise and exception right away if it's not available?
  • In MultipartParser.parse you have a common pattern of invoking a method on all handlers until it returns some non-false value. May be abstract it in a function like call_handlers(method_name, *args, kwargs) that will do the iteration?
  • line 200 of multipartparser.py: if content-transfer-encoding is available but not 'base64' you're feeding non-decoded data to handlers. I believe it's only going to work if encoding is '8bit' and in other cases it's better to raise an exception that encoding is unknown.
  • I couldn't figure out why FakePayload is needed. Can you clarify?
  • In HttpRequest you're taking special care of not allowing setting _upload_handlers in _set_upload_handlers but they can still be changed since you're giving out the whole list and it's mutable. I.e. request.upload_handlers.append(handler) will work regardless. I realize now that you set it to a tuple but it can be overwritten. May be it's better to just have request.add_upload_handler()? What's the point of reading them?
  • In HttpRequest.parse_file_upload you have a comment "Order here is *very* important." but never say how exactly it works. May be a short note like: "Handlers that go sooner may cancel remaining handlers by signaling that handling is complete".
  • line 420 of db.models.base: I think a comment "exclusive lock" before "filelocks.lock(fp, filelocks.LOCK_EX)" just asks to have this call to be wrapped up in a function "exclusive_lock(fp)" so you no longer have to explain it. Looks like it's the only lock you're using anyway.

I didn't look through uploadedfile.py and fileuploadhandler.py, will leave it till later.

comment:243 in reply to: ↑ 242 ; follow-up: Changed 7 years ago by axiak

Replying to Maniac <Maniac@SoftwareManiacs.Org>:

Mike, thanks for the new patch! Leaving aside design issues I have some comments on the implementation part. Feel free to disagree with any!

Thank you for your comments! Before I begin, what issues do you have with the design?

Here's a response to each of your comments:

  • You have many multi-line comments decorated with in boxes of #s. But the rest of the code doesn't do this, all comments are usually have just one # on the left of each line. And I personally prefer it in Django way, it's easier to read :-)

This is style. I will update this in a future patch.

  • In MultipartParser.init you seem to half got rid of guessing content_length if they're not in HTTP headers. What was the intent? Isn't headers the only place where it should be and thus you should raise and exception right away if it's not available?

I'm not entirely sure what you mean by this. The current patch will not guess the content-length. Unfortunately, there's no way that I can see around this, since the content length is really necessary. Unfortunately raising an exception will result in a "Connection Reset" error, but I guess it's better than any alternative I can think of.

  • In MultipartParser.parse you have a common pattern of invoking a method on all handlers until it returns some non-false value. May be abstract it in a function like call_handlers(method_name, *args, kwargs) that will do the iteration?

If you pay close attention, you will see that it's not really common. (Especially in the latest patch.) I think the complexity involved to encapsulate each of these loops would just do more to obfuscate the code. You're welcome to prove me wrong, though. :)

  • line 200 of multipartparser.py: if content-transfer-encoding is available but not 'base64' you're feeding non-decoded data to handlers. I believe it's only going to work if encoding is '8bit' and in other cases it's better to raise an exception that encoding is unknown.

No. My interpretation of the RFCs is that we can accept any one of '8bit', '7bit', or 'binary' without doing anything. These 3 encodings are really telling us how we can interpret them (7bit being just ASCII, 8bit and binary being 8bit character sets and binary files) and don't actually need any decoding before being written to a file. My perusal of the email.message python module seems to agree with this assessment, and it also shows how to support a few other encodings I might wish to add support for.

  • I couldn't figure out why FakePayload is needed. Can you clarify?

I thought the docstring said it. It restricts what can be acted on in a test client request. The point is that (a) a StringIO object supports too many features and (b) we should complain when we try to read more than the available content, since this will result in poor performance when the file is a wrapper around a blocking network socket. Again, these are only for tests, as I feel just using StringIO doesn't nearly exhibit the same behavior as the network socket.

  • In HttpRequest you're taking special care of not allowing setting _upload_handlers in _set_upload_handlers but they can still be changed since you're giving out the whole list and it's mutable. I.e. request.upload_handlers.append(handler) will work regardless. I realize now that you set it to a tuple but it can be overwritten. May be it's better to just have request.add_upload_handler()? What's the point of reading them?

It can only override it by using the "non-public" API of _upload_handlers. By using the public API, you're ensuring yourself the safety of knowing that the upload handlers are being modified with expected behavior. I thought about providing a .add_upload_handler() but then I wanted a .prepend_upload_handler() and I wanted a .insert_upload_handler() and realized that we really have a list, and the ways to modify a list is well documented. There's no reason why we can't expect someone to want to read the upload handlers to decide where he wants to place something. (E.g.: My fictional AJAXUploadHandler Middleware will always place the AJAXUploadHandler after the InMemoryUploadHandler.)

  • In HttpRequest.parse_file_upload you have a comment "Order here is *very* important." but never say how exactly it works. May be a short note like: "Handlers that go sooner may cancel remaining handlers by signaling that handling is complete".

Good point. I'll have this in a future patch.

  • line 420 of db.models.base: I think a comment "exclusive lock" before "filelocks.lock(fp, filelocks.LOCK_EX)" just asks to have this call to be wrapped up in a function "exclusive_lock(fp)" so you no longer have to explain it. Looks like it's the only lock you're using anyway.

This is really a stylistic thing. The thing is there are a lot of ways to lock a file, so I just want to use a clean syntax that supports all those ways. I don't like the idea of adding a function wrapper for each action when a comment can make it just as clear. (And really, filelocks.lock(fp, filelocks.LOCK_EX) is probably clear enough for anyone who knows what an exclusive lock is to not even need that comment.)

I didn't look through uploadedfile.py and fileuploadhandler.py, will leave it till later.

Thanks again for your comments. Look forward to hearing more from you!

comment:244 in reply to: ↑ 243 ; follow-up: Changed 7 years ago by Ivan Sagalaev <Maniac@…>

Thank you for your comments! Before I begin, what issues do you have with the design?

None, as of now :-)

  • In MultipartParser.init you seem to half got rid of guessing content_length if they're not in HTTP headers. What was the intent? Isn't headers the only place where it should be and thus you should raise and exception right away if it's not available?

I'm not entirely sure what you mean by this. The current patch will not guess the content-length. Unfortunately, there's no way that I can see around this, since the content length is really necessary. Unfortunately raising an exception will result in a "Connection Reset" error, but I guess it's better than any alternative I can think of.

By "half got rid" I mean this code in init:

try:
    content_length = int(META.get('HTTP_CONTENT_LENGTH',
                                  META.get('CONTENT_LENGTH',0)))
except (ValueError, TypeError):
    # For now set it to 0...we'll try again later on down.
    content_length = 0

# If we have better knowledge of how much
# data is remaining in the request stream,
# we should use that. (modpython for instance)
#try:
#    remaining = input_data.remaining
#    if remaining is not None and \
#            (content_length is None or remaining < content_length):
#        content_length = remaining
#except AttributeError:
#    pass

if not content_length:
    # This means we shouldn't continue...raise an error.
    raise MultiPartParserError("Invalid content length: %r" % content_length)

I think everything after first "except" can be safely deleted and in first "except" should be "raise MultiPartParserError(...)" instead of setting content_length to 0.

If you pay close attention, you will see that it's not really common. (Especially in the latest patch.) I think the complexity involved to encapsulate each of these loops would just do more to obfuscate the code. You're welcome to prove me wrong, though. :)

Oh... Now I see that read_data_chunk makes things differently requiring that non-None value to continue. It's not worth abstracting then.

A small nit... What actually made me think about this is that you have "handlers = self._upload_handlers" at the top of "parse" and I tried to figure out if it's really needed to shave 14 letters in a couple of cases at the expense of requiring a reader to make one more translations "handlers are those same handlers that lay in an object".

No. My interpretation of the RFCs is that we can accept any one of '8bit', '7bit', or 'binary' without doing anything. These 3 encodings are really telling us how we can interpret them (7bit being just ASCII, 8bit and binary being 8bit character sets and binary files) and don't actually need any decoding before being written to a file. My perusal of the email.message python module seems to agree with this assessment, and it also shows how to support a few other encodings I might wish to add support for.

What I'm worried about is 'quoted-printable'. Now it will go to the handler unchanged which is wrong. Updating my comment to your reply I think the logic should be like this:

if transfer_encoding == 'base64':
    # decode
elif transfer_encoding in ['8bit', '7bit', 'binary']:
    # just feed
else:
    raise ...

BTW, browsers don't use quoted-printable here but some hand-written user-agent could...

  • I couldn't figure out why FakePayload is needed. Can you clarify?

I thought the docstring said it. It restricts what can be acted on in a test client request. The point is that (a) a StringIO object supports too many features and (b) we should complain when we try to read more than the available content, since this will result in poor performance when the file is a wrapper around a blocking network socket. Again, these are only for tests, as I feel just using StringIO doesn't nearly exhibit the same behavior as the network socket.

I did understand what it does but didn't understand why it's needed. Using it for tests makes sense now, thanks!

It can only override it by using the "non-public" API of _upload_handlers. By using the public API, you're ensuring yourself the safety of knowing that the upload handlers are being modified with expected behavior. I thought about providing a .add_upload_handler() but then I wanted a .prepend_upload_handler() and I wanted a .insert_upload_handler() and realized that we really have a list, and the ways to modify a list is well documented. There's no reason why we can't expect someone to want to read the upload handlers to decide where he wants to place something. (E.g.: My fictional AJAXUploadHandler Middleware will always place the AJAXUploadHandler after the InMemoryUploadHandler.)

But each custom handler will know where to put itself only relative to standard handlers that it knows about... Middleware machinery just leaves this to a user to figure out. In this case I believe everyone can also place middleware, decorators etc in the right order without parsing this list in code.

(And really, filelocks.lock(fp, filelocks.LOCK_EX) is probably clear enough for anyone who knows what an exclusive lock is to not even need that comment.)

Indeed :-). That was my second option too :-)

comment:245 in reply to: ↑ 244 Changed 7 years ago by axiak

Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:
Alright, we're very close here!

  • By "half got rid" I mean this code in init ...

Yeah, actually the new patch that I have queued up changes this behavior a little bit. It's more in line with this thinking.

  • A small nit... What actually made me think about this is that you have "handlers = self._upload_handlers" at the top of "parse" and I tried to figure out if it's really needed to shave 14 letters in a couple of cases at the expense of requiring a reader to make one more translations "handlers are those same handlers that lay in an object".

I actually didn't do it to save 14 characters. I did it because this loop can potentially run millions of iterations and I didn't want to do a needless self.var lookup when var is slightly faster. By this point the loop is so complicated that I might as well change it, but still...CPU is CPU.

  • What I'm worried about is 'quoted-printable'. Now it will go to the handler unchanged which is wrong. Updating my comment to your reply I think the logic should be like this:

My hope was that someone could write a handler to deal with quoted-printable :).

  • But each custom handler will know where to put itself only relative to standard handlers that it knows about... Middleware machinery just leaves this to a user to figure out. In this case I believe everyone can also place middleware, decorators etc in the right order without parsing this list in code.

Yeah I understand the issue. I'm not sure of the best interface. My thinking was that a list is so well-known that people won't be surprised to learn to do: request.upload_handlers.append(upload_handler). I'm open to suggestions that will support the different operations.

Thanks again for your review. I'm writing a long-ish email to Django developers outlining a few of the issues and a revised API description.

comment:246 Changed 7 years ago by axiak

  • Needs documentation unset

Okay, I'm posting a new diff. Not many changes. A few stylistic changes, a better handling of multiple handlers, and the new documentation for the upload handlers.

Changed 7 years ago by axiak

New diff...some style changes and new documentation.

comment:247 follow-up: Changed 7 years ago by Ivan Sagalaev <Maniac@…>

Mike, here are comments & question about fileuploadhandler.py and uploadedfile.py. Apparently, only one question :-)

MemoryFileUploadHandler implements handle_raw_input and, thus, has to duplicate the logic of the multipart parser. Why not implement this handler just like TemporaryFileUploadHandler but using StringIO?

comment:248 in reply to: ↑ 247 Changed 7 years ago by axiak

Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:

Mike, here are comments & question about fileuploadhandler.py and uploadedfile.py. Apparently, only one question :-)

MemoryFileUploadHandler implements handle_raw_input and, thus, has to duplicate the logic of the multipart parser. Why not implement this handler just like TemporaryFileUploadHandler but using StringIO?

Yes, I originally used StringIO and did it the same way, but I realized that since it's all in memory, I can use the proven email parser. I thought it would be really slick if it's so flexible we can even revert back to our old ways of parsing for when we know we can put it all in memory.

That being said, we can change it back if you think it's unnecessary.

comment:249 follow-up: Changed 7 years ago by Ivan Sagalaev <Maniac@…>

Yes, I think it is unnecessary. I believe we're going to debug the streaming parser to reliable state anyway, aren't we? :-) And after this we'll just have two different parsers for no purpose...

comment:250 in reply to: ↑ 249 ; follow-up: Changed 7 years ago by axiak

Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:

Yes, I think it is unnecessary. I believe we're going to debug the streaming parser to reliable state anyway, aren't we? :-) And after this we'll just have two different parsers for no purpose...

Okay. That being said, let me outline what my new patch will have, before I finish it:

  1. A settings variable for when to use memory and when to use streaming (settings.FILE_UPLOAD_MAX_MEMORY_SIZE).
  2. A settings variable for the default upload handlers. (settings.FILE_UPLOAD_HANDLERS)
  3. One single parser.
  4. Forms support for dictionary (but emitting deprecation warnings).
  5. Updating forms docs to use the new File notation instead of a dictionary.

Let me know if there's anything to add to this list!

comment:251 in reply to: ↑ 250 ; follow-up: Changed 7 years ago by Maniac <Maniac@…>

  1. A settings variable for when to use memory and when to use streaming (settings.FILE_UPLOAD_MAX_MEMORY_SIZE).

BTW when I was thinking about such thing in the past I had an idea that we can use the size of the buffer that we use to read data from input_stream to be this maximum value. I.e. if we can afford some amount of memory per file then there's no point of making buffer size smaller.

  1. A settings variable for the default upload handlers. (settings.FILE_UPLOAD_HANDLERS)
  2. One single parser.
  3. Forms support for dictionary (but emitting deprecation warnings).
  4. Updating forms docs to use the new File notation instead of a dictionary.

This is great! Looking forward to it.

comment:252 in reply to: ↑ 251 ; follow-up: Changed 7 years ago by axiak

BTW when I was thinking about such thing in the past I had an idea that we can use the size of the buffer that we use to read data from input_stream to be this maximum value. I.e. if we can afford some amount of memory per file then there's no point of making buffer size smaller.

I had a similar idea, however this is entirely dependent on the garbage collection, no? That is, if we called gc.collect() on every iteration of the chunks, I think we'd be set.

-Mike

comment:253 in reply to: ↑ 252 ; follow-up: Changed 7 years ago by Maniac <Maniac@…>

I had a similar idea, however this is entirely dependent on the garbage collection, no? That is, if we called gc.collect() on every iteration of the chunks, I think we'd be set.

Uhm... Looks like I wasn't clear enough, sorry... I just mean that if we have a setting FILE_UPLOAD_MAX_MEMORY_SIZE then we can use this value for chunk_size in FileUploadHandler. It has nothing to do with GC.

comment:254 in reply to: ↑ 253 Changed 7 years ago by axiak

Replying to Maniac <Maniac@SoftwareManiacs.Org>:

Uhm... Looks like I wasn't clear enough, sorry... I just mean that if we have a setting FILE_UPLOAD_MAX_MEMORY_SIZE then we can use this value for chunk_size in FileUploadHandler. It has nothing to do with GC.

I don't think so. These are two different parameters. As an example, suppose I set FILE_UPLOAD_MAX_MEMORY_SIZE to 2.5 MB. If this were the chunk size, then that would mean each iteration would create a value which held 2.5MB worth of data. If the GC always collected OR if the GC collected based on *size*, then it would mean that the upload shouldn't take appreciably more than 2.5MB total. However, if the GC collects based on number of allocations (which I believe is the default in standard CPython), then having a chunk size of 2.5MB would take appreciably more memory than having a lower chunk size; and certainly much more memory than an entire upload going into memory at once.

At least, this is my understanding of the garbage collection in python. If you look at your memory usage during an upload, you will see interesting memory usage.

comment:255 Changed 7 years ago by Maniac <Maniac@…>

Ah... Now I understand the relation with GC. But you seem to have explored this question much more than I even tried so I refrain with my idea.

comment:256 Changed 7 years ago by axiak

Okay! This patch includes all of the aforementioned improvements. It still passes all the tests. I've tested uploads on the following platforms:

  • Python 2.5/2.3
  • Apache mod_python, mod_fastcgi, mod_wsgi
  • Lighttpd mod_fastcgi
  • Development server

They all stream out of the box.

comment:257 Changed 7 years ago by axiak

  • Needs tests unset
  • Patch needs improvement unset

I like the patch as is. I think now is a good time to start testing and seeing if this is what people want.

Changed 7 years ago by axiak

Altered documentation to be more approachable.

Changed 7 years ago by axiak

Slightly newer...handles exhaustion a little bit differently.

comment:258 follow-up: Changed 7 years ago by Eric B <ebartels@…>

There's a problem with the image validation in django.newforms.fields.ImageField (line 507)

If file is a StringIO object, the second call to Image.open will fail with an IOError unless there's a file.reset() call first, something like:

try:
    # load() is the only method that can spot a truncated JPEG,
    #  but it cannot be called sanely after verify()
    trial_image = Image.open(file)
    trial_image.load()
    # verify() is the only method that can spot a corrupt PNG,
    #  but it must be called immediately after the constructor
    if hasattr(file, 'reset'):
        file.reset()
    trial_image = Image.open(file)
    trial_image.verify()
except Exception: # Python Imaging Library doesn't recognize it as an image
    raise ValidationError(self.error_messages['invalid_image'])

comment:259 in reply to: ↑ 258 ; follow-ups: Changed 7 years ago by axiak

Replying to Eric B <ebartels@gmail.com>:

There's a problem with the image validation in django.newforms.fields.ImageField (line 507)
[...]

Hi Eric, thanks so much for the information. I just have a comment and a question.

Question: If this is a problem, how did this patch change anything? The current implementation just does StringIO(f.content). I was intending to change as little functionality as possible, and it passes the tests (which I believe include image validity checking in newforms)...

Comment: There's no reason to have the hasattr()}} check there, because there's an if statement 5 lines above that will know when its a {{{StringIO object and when it isn't.

Again, thanks for your feedback!

comment:260 in reply to: ↑ 259 Changed 7 years ago by axiak

Replying to axiak:

Replying to Eric B <ebartels@gmail.com>:

There's a problem with the image validation in django.newforms.fields.ImageField (line 507)
[...]

Hi Eric, thanks so much for the information. I just have a comment and a question.

Question: If this is a problem, how did this patch change anything? The current implementation just does StringIO(f.content). I was intending to change as little functionality as possible, and it passes the tests (which I believe include image validity checking in newforms)...

Doh. I see it now. Sorry about that. :-)

I have a few other changes queued up, so I'll include that in the next patch.

comment:261 in reply to: ↑ 259 Changed 7 years ago by axiak

Replying to axiak:

and it passes the tests (which I believe include image validity checking in newforms)...

As it turns out, the current tests have a model called ImageFile which actually doesn't test the image field. Is there any reason we cannot do this for our tests?

class ImageFile(models.Model):
    description = models.CharField(max_length=20)
    try:
        # If PIL is available, try testing PIL.
        # Otherwise, it's equivalent to TextFile above.
        import Image
        image = models.ImageField(upload_to=tempfile.gettempdir())
    except ImportError:
        image = models.FileField(upload_to=tempfile.gettempdir())

    def __unicode__(self):
        return self.description

After doing that, the test fails properly without Eric's suggested modification. (When PIL is installed.)

Cheers,
Mike

comment:262 Changed 7 years ago by andrewsk

  • Cc prigun@… added

Changed 7 years ago by axiak

New 2070 patch...includes fixes to the parser in addition to the newforms fix caught by EricB as well as better docs.

comment:263 Changed 7 years ago by axiak

The new patch includes a couple improvements over the last. No change in design, just small buxfixes:

  1. The low-level API parser now correctly keeps track of where it is (LazyStream). (The default django api never used this feature.)
  2. The newforms problem with Image validation is fixed, along with modifying the tests to test that.
  3. The docs were reworded a little bit.

If it makes reviewing any easier, I'm thinking of coming up with a patch series that will represent a good flow of changes to follow...that's next on my list. If you are thinking about reviewing, here's the path I'd take:

  1. Read the high-level docs. http://orodruin.axiak.net/upload_handling.html Make sure you're fine with the design.
  2. Read the diff of stuff in /tests. Make sure you're fine with the test modifications.
  3. At this point it's probably best to start with the HTTP request and work your way in.

Thanks to everyone who's trying this patch out!

Cheers,
Mike

comment:264 follow-up: Changed 7 years ago by bbrriiccee@…

This patch correct the bug for the ImageField. The problem come from StringIO(data.read()) that's become empty after the 1st read. By using a var to store str from data.read(), you can call 2 different StringIO on the constructor.

--- fields.py   2008-03-31 15:25:36.000000000 -0700
+++ fields.py   2008-03-31 17:04:23.000000000 -0700
@@ -494,17 +494,21 @@
             except ImportError:
                 from StringIO import StringIO
             if hasattr(data, 'read'):
-                file = StringIO(data.read())
+                data_buffer = data.read()
+                file_str_io_1 = StringIO(data_buffer)
+                file_str_io_2 = StringIO(data_buffer)
             else:
-                file = StringIO(data['content'])
+                data_buffer = data['content']
+                file_str_io_1 = StringIO(data_buffer)
+                file_str_io_2 = StringIO(data_buffer)
         try:
             # load() is the only method that can spot a truncated JPEG,
             #  but it cannot be called sanely after verify()
-            trial_image = Image.open(file)
+            trial_image = Image.open(file_str_io_1)
             trial_image.load()
             # verify() is the only method that can spot a corrupt PNG,
             #  but it must be called immediately after the constructor
-            trial_image = Image.open(file)
+            trial_image = Image.open(file_str_io_2)
             trial_image.verify()
         except Exception: # Python Imaging Library doesn't recognize it as an image
             raise ValidationError(self.error_messages['invalid_image'])

comment:265 in reply to: ↑ 264 ; follow-up: Changed 7 years ago by axiak

Replying to bbrriiccee@gmail.com:

This patch correct the bug for the ImageField. The problem come from StringIO(data.read()) that's become empty after the 1st read. By using a var to store str from data.read(), you can call 2 different StringIO on the constructor.
...

Huh. I thought I had included the fix in the latest patch...I'll review and commit the patch with the latest code.

By the way, your patch has two inefficiencies:

  1. You shouldn't have to call .read() on the file object if it has a temporary file name. If it's a large file, we should let the OS do the seeking which is what happens when you give the filename to PIL.
  2. There's no reason to have two distinct StringIO objects when you can just call .reset().

I'll have the fixed patch up in a minute or two.

Changed 7 years ago by axiak

Better patch...not sure how the newforms code in the previous patch was the way it was.

comment:266 in reply to: ↑ 265 Changed 7 years ago by axiak

Replying to axiak:

[...]
I'll have the fixed patch up in a minute or two.

Patch uploaded. Again, I'm not certain how the newforms changes didn't get in. But I do know that I was really tired when I uploaded the last version. I gave this new patch a once-over at least...

comment:267 Changed 7 years ago by Eric B <ebartels@…>

Mike, I beleive the last patch introduced an error in fileuploadhandler.load_handler. We're now passing request to load_handler as a separate parameter and wrapping it with RestrictedRequest. On line 223 of fileuploadhandler, request needs to be passed separately to the FileUploadHandler constructor also. Otherwise, the handler won't have access to it.

return cls(*args, **kwargs)

needs to be:

return cls(request, *args, **kwargs)

With that change, everything seems to be working perfectly. Thanks.

comment:268 follow-up: Changed 7 years ago by axiak

Hey Eric,
Thanks for keeping me honest :-). I will upload the patch as soon as I finish this comment.

Another question: Are you doing anything interesting with this patch? Just curious, since you wouldn't have hit this bug with either of the built-in upload handlers.

Cheers,
Mike

Changed 7 years ago by axiak

new patch that fixes error with instantiating the upload handlers, thanks Eric!

comment:269 in reply to: ↑ 268 ; follow-up: Changed 7 years ago by Eric B <ebartels@…>

Replying to axiak:

Another question: Are you doing anything interesting with this patch? Just curious, since you wouldn't have hit this bug with either of the built-in upload handlers.

Since this patch makes it so easy, I decided to throw together an upload handler to display progress feedback. I know you had a patch in #4165, so you've probably got something similar in the works. I posted what I came up with to djangosnippets: http://www.djangosnippets.org/snippets/678/

comment:270 in reply to: ↑ 269 Changed 7 years ago by axiak

Replying to Eric B <ebartels@gmail.com>:

Since this patch makes it so easy, I decided to throw together an upload handler to display progress feedback. I know you had a patch in #4165, so you've probably got something similar in the works. I posted what I came up with to djangosnippets: http://www.djangosnippets.org/snippets/678/

Awesome job. Do you want to contribute to django-uploadutils? I've been to busy with other things at the moment so haven't had time to write stuff like what you just wrote. It's very easy for me to add you as a contributor...

comment:271 Changed 7 years ago by jacob

My feedback on the patch (2070_revision7388.diff):

  1. Looks like some unrelated changes might have creapt in; this shouldn't be in the patch:
  • django/conf/global_settings.py

    a b  
    114114SEND_BROKEN_LINK_EMAILS = False
    115115
    116116# Database connection info.
    117 DATABASE_ENGINE = ''           # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
     117DATABASE_ENGINE = ''           # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'ado_mssql'.
    118118DATABASE_NAME = ''             # Or path to database file if using sqlite3.
    119119DATABASE_USER = ''             # Not used with sqlite3.
    120120DATABASE_PASSWORD = ''         # Not used with sqlite3.
  1. I have reservations about FILE_UPLOAD_HANDLERS; let's discuss this in the thread on django-developers.
  1. Please provide a link to the Python Cookbook recipe from which you took the file lock implementation. You also need to note that the file lock code is licensed under the Python Software License (as is all the cookbook code).
  1. Please double-check your code for PEP8/Django coding style. I see a number of places where you differ in style -- espcially in docstring usage, a few classes named things like FILE, FIELD, etc., and exception raising (use raise E(msg) istead of raise E, msg).
  1. Naming nitpick: django/core/files/fileuploadhandler.py -- "file" is redundant in there; you can name it (and other bits) things like django/core/files/uploadhandler.py.
  1. In general you've gone with the Java-ish style of one class per file. That's OK, and in this case I think it works relatively well. However, it can make flow hard to follow -- if a class/function is only used in a single place, you really should define it in the same place it's used. No "action item" here, just a note.
  1. Can you explain why you think you need RestrictedRequest? Can't we just insist that people write upload handlers correctly?
  1. Can you explain the save_FIELD_file/move_FIELD_file distinction to me? I don't see why you need both methods. While you're at it, can you explain the decision to change the method interface? That'll be a backwards-incompatible change, so let's not change it unless we have to.
  1. Nice job on the RFC2388 parser -- very clearly written and easy to read.
  1. Another style point: warnings code should look like this:
if something_is_wrong:
    import warnings
    warnings.warn(...)

(That is, instead of importing warnings at the module level). IIRC there's a minor overhead associated with initializing the warnings system the first time warnings is imported; our "house style" is thus to put the import in the guarded block. It also makes the future replacement of the warning with an error an easier change that doesn't leave behind an undeeded import.

  1. Similarly, in deprecating dict usage for uploaded files, keep all the backwards-compat code in the if clause. You've got this
  • django/newforms/fields.py

    a b  
    444445            return None
    445446        elif not data and initial:
    446447            return initial
     448
     449        if isinstance(data, dict):
     450            # We warn once, then support both ways below.
     451            warnings.warn("The dictionary usage for files is deprecated. Use the new object interface instead.", DeprecationWarning)
     452
    447453        try:
    448             f = UploadedFile(data['filename'], data['content'])
    449         except TypeError:
     454            file_name = data.file_name
     455            file_size = data.file_size
     456        except AttributeError:
     457            try:
     458                file_name = data.get('filename')
     459                file_size = bool(data['content'])
     460            except (AttributeError, KeyError):
     461                raise ValidationError(self.error_messages['invalid'])
     462
     463        if not file_name:

You shouldn't have both the if and the try/except; the if clause should convert the data into an uploaded file. Again, this makes removing the legacy support easier in the future.

  1. Another patch mixup: you've added an unescape_entities to django.utils.text. That's not part of this patch, is it?
  1. Again:
  • docs/db-api.txt

    a b  
    13061306Using raw strings (e.g., ``r'foo'`` instead of ``'foo'``) for passing in the
    13071307regular expression syntax is recommended.
    13081308
     1309Regular expression matching is not supported on the ``ado_mssql`` backend.
     1310It will raise a ``NotImplementedError`` at runtime.
     1311

Not part of file uploads.

  1. Don't comment out code; just remove it:
  • tests/regressiontests/bug639/tests.py

    a b  
    99from regressiontests.bug639.models import Photo
    1010from django.http import QueryDict
    1111from django.utils.datastructures import MultiValueDict
     12from django.core.files.uploadedfile import SimpleUploadedFile
    1213
    1314class Bug639Test(unittest.TestCase):
    1415       
     
    2122       
    2223        # Fake a request query dict with the file
    2324        qd = QueryDict("title=Testing&image=", mutable=True)
    24         qd["image_file"] = {
    25             "filename" : "test.jpg",
    26             "content-type" : "image/jpeg",
    27             "content" : img
    28         }
    29        
     25        #qd["image_file"] = {
     26        #    "filename" : "test.jpg",
     27        #    "content-type" : "image/jpeg",
     28        #    "content" : img
     29        #    }
     30        qd["image_file"] = SimpleUploadedFile('test.jpg', img, 'image/jpeg')
     31
    3032        manip = Photo.AddManipulator()
    3133        manip.do_html2python(qd)
    3234        p = manip.save(qd)

Overall, very well done!

comment:272 Changed 7 years ago by fdr

w.r.t. 4:

FILE, FIELD, et al are intended to just act as symbols, and thus I named them similarly to constants. Would you prefer a more regular-class names like File, Field, etc?

My rationale for the naming was:

  1. Their function is similar to that of a constant
  2. It looks more informative in a stack trace than "0" or "1" as you would see with something like "FILE = 0" declared in the module
  3. It's less ambiguous than a string

All in all: I can't believe you actually went through that monster patch in such fine detail. I thought that serious reviews would happen until axiak or I got around to breaking it up.

comment:273 Changed 7 years ago by Camille Harang

Hello,

after parsing the head of a file that is bieng uploaded (server-side parsing during the first receive_data_chunk), i would like to abort the transfert if the file is not acceptable. To do so i tried to raise a SkipFile or a StopUpload but i has no effect until the file is fully uploaded. Did i misunderstood the goal of these exceptions, or do i use it the wrong way ? Is it possible to something similar and retrieve a message from my custom FileUploadHandler in the view ?

Anyway, this patch is very nice and flexible, thank you for this :-)

Camille.

comment:274 follow-up: Changed 7 years ago by fdr

SkipFile will continue to eat the remainder of stream, although it won't deliver it to you for handling. StopFile should stop the transfer cold and cause the client to see a connection reset error. Actually, I'm surprised that StopFile isn't causing some other issues...

--- a/django/http/multipartparser.py
+++ b/django/http/multipartparser.py
@@ -230,6 +230,8 @@ class MultiPartParser(object):
                         exhaust(stream)
                     elif isinstance(e, StopUpload):
                         # Abort the parsing and break
+                        # This doesn't look right, where do we define abort?
+                        # shouldn't this crash?
                         parser.abort()
                         break
                 else:

I'll have a more thorough look before too long, but I have some stuff going on. This chunk of the code is in Mike's expertise, so he will probably chime in first. It looks like some broken refactoring or something here....my quick grep of revision logs doesn't seem to suggest there ever was anything like a "def abort"....actually, mentions of "abort" are scant in general, with the exception of this line. Seems like something got forgotten...maybe it's time for a regression test.

Thanks for the report...I think we can have this fixed quickly, looks like an oversight that will require no particular design change.

comment:275 in reply to: ↑ 274 Changed 7 years ago by axiak

Replying to fdr:

SkipFile should eat the remainder of the stream until the next file, but StopUpload (as intended) should eat the stream until the end. The reasoning for this is that a connection reset error wouldn't allow the custom UploadHandler to specify a custom error message, and the end user would be without a doubt completely confused. I think perhaps we should give a keyword argument so a person can do either:

    raise StopUpload()

or

    raise StopUpload(connection_reset=True)

So that the user understands the consequences of aborting an upload.

parser.abort()

Yeah... abort was in some initial API that never really existed nor was documented. I was going to change it but must have forgotten about it. I would've caught it had I actually done work on django-uploadutils. Time for me to get over there...

comment:276 follow-up: Changed 7 years ago by Camille Harang

Hi all,

my temporary files are correctly garbage collected in wsgi development mode, but not when running mod_python with apache2. After sutting down apache the files remain on the harddrive. Did anybody experienced that?

Camille.

comment:277 Changed 7 years ago by Horst Gutmann <zerok@…>

  • Cc zerok@… added

comment:278 in reply to: ↑ 276 ; follow-up: Changed 7 years ago by axiak

Replying to Camille Harang:

my temporary files are correctly garbage collected in wsgi development mode, but not when running mod_python with apache2. After sutting down apache the files remain on the harddrive. Did anybody experienced that?

Hey Camille. Thanks for the report! I'll check on this after I update the patch per Jacob's comments. The only way I can see this happening is if apache is somehow preventing it from garbage collecting. However, this is why we store temporary files in /tmp -- the OS can (and should) wipe out that directory periodically. :-) There's no guarantee that if a process gets a KILL signal or crashes hard that the temporary data will not be left on the disk -- I don't think we can make this guarantee. (Again though, I will look into this.)

Cheers,
Mike

comment:279 in reply to: ↑ 278 Changed 7 years ago by Maniac <Maniac@…>

There's no guarantee that if a process gets a KILL signal or crashes hard that the temporary data will not be left on the disk -- I don't think we can make this guarantee.

From my experience it really cannot be done since when Apache restarts it kills it's processes hard and they don't have a chance to cleanup. So leaving sometime tmp files is a necessary evil.

comment:280 Changed 7 years ago by jbeigel

  • Cc johannes.beigel@… added

comment:281 Changed 7 years ago by Camille Harang

Hi all,

yes i dreaded that there must be temporary file problems with apache, unfortunately i use a separate tmp directory because i some cases i need to keep the files so i want to avoid cross-filesystem copying when renaming it. I use a cron job to delete unused files (checking if it is accessed with fuser), if somebody is intersted in just ask me. BTW i added a 'keep' parameter to my TemporaryFile subclass, setting it to True at any time prevents the file to be deleted after closing, then just use os.rename() to store it smoothely elsewhere. I don't know if it is worthed to add this feature in the main code, anyway here it is bellow. I also experienced problems when checking files content in file_complete() because the file remains unclosed, i added self.file.flush() before self.file.seek(0) in my TemporaryFileUploadHandler subclass, once again i don't know if it needs to be included in the main code, it's a suggestion.

Camille.

class MaybeTemporaryFile(TemporaryFile):

    keep = False

    def __del__(self):

        if not self.keep:

            try: os.unlink(self.name)
            except OSError: pass

comment:282 Changed 7 years ago by David Clymer <david@…>

  • Cc david@… added

comment:283 Changed 7 years ago by Ivan Sagalaev <Maniac@…>

Mike, can you clarify a bit more... I now have a code that does save_FIELD_file, giving it file contents as a string. It fails with your patch since this method expects an UploadedFile instance instead. Are you going to leave it as a documented incompatibility or it can be dealt with?

comment:284 Changed 7 years ago by madssj

A little bug in the current version of the patch. In the method chunk of the class UploadedFile there is a call to the method file_size which is not a method, as it is defined as a property, and hence overwritten by this property. The property should be used insted of the method and the method file_size should be removed, as it serves no purpose. The call to file_size generates an exception TypeError: 'int' object is not callable, which again makes chunk unusable on TemporaryUploadedFile.

Changed 7 years ago by axiak

Updated diff per Jacob's requests and useful feedback.

comment:285 follow-up: Changed 7 years ago by axiak

Hi everyone,

Thank you so much for the tremendous feedback!

A few things to note about the new patch:

  1. save_FIELD_file was supposed to be backwards compatible. It is now, only emitting DeprecationWarning warnings when you do so.
  2. The file_size attribute is no longer a function nor referenced as a function.
  3. The StopUpload interface now works, with the optional connection_reset=True parameter to allow the connection to reset.
  4. I think I've gone over all of Jacob's suggestions and fixed each of the problems.

I have a few things I want to do still that don't have to do with programming:

  1. Outline where I have tested this (which platforms, what testing cases).
  2. Enumerate every single backwards incompatibility (even DeprecationWarning emissions), and explain why they are there.
  3. Respond to each of Jacob's comments.

Thanks again for your feedback!

comment:286 in reply to: ↑ 285 ; follow-up: Changed 7 years ago by john

Replying to axiak:

  1. The file_size attribute is no longer a function nor referenced as a function.


I'm new to this patch, having just needed it today, but I think this may be causing problems with InMemoryUploadedFile. With no file_size, FileField.clean() fell through to the dictionary usage, which failed at file_name = data.get('filename'). Changing that to file_name = data['filename'] avoided the failed validation, and adding a file_size attribute to {{InMemoryUploadedFile}}} (with a quick len/seek at the end of the constructor) seems to have fixed the attribute validation.

Changed 7 years ago by axiak

Fixed file_size bug with memory handling.

comment:287 in reply to: ↑ 286 ; follow-up: Changed 7 years ago by axiak

Replying to john:

(with a quick len/seek at the end of the constructor) seems to have fixed the attribute validation.

Doh. The regression is fixed in the latest patch (of the same name). Although the solution is not to do another len/seek but to update the handler to use the file_size counter.

Anyway, this teaches me that I have to think about clever ways of leveraging the TestClient to catch these types of errors.

Cheers,
Mike

comment:288 Changed 7 years ago by Ivan Sagalaev <Maniac@…>

Just checked -- save_FIELD_file now works.

comment:289 in reply to: ↑ 287 Changed 7 years ago by john

Replying to axiak:

Doh. The regression is fixed in the latest patch (of the same name). Although the solution is not to do another len/seek but to update the handler to use the file_size counter.

Yep, I just tested a clean checkout and it's working fine now without any of my cheesy hacks. ;) Thanks.

comment:290 Changed 7 years ago by Collin Grady <cgrady@…>

  • Cc dcramer@… removed

comment:291 Changed 7 years ago by anonymous

  • Cc django@… added

comment:292 Changed 7 years ago by MockSoul <mocksoul@…>

  • Cc mocksoul@… added

comment:293 Changed 7 years ago by anonymous

  • Cc schlaber@… added

comment:294 Changed 7 years ago by madssj

I would find it very useful to have the UploadedFile cary a tell() method. It basiclly just needs to propagate the call forward to file and would help to complete the file-likeness of the objects.

comment:295 Changed 7 years ago by djoume

  • Cc dsalvetti@… added

This patch is awesome, many thanks to all the developers involved in building it.
I just wanted to mention that I had to replace :

if content_length <= 0:

by

if content_length < 0:

in django/http/multipartparser.py line 94 as a workaround to a Safari bug.

For some reason, Safari 3.x on Mac (not on PC) set the Content-Type to multipart/form-data on a redirect (even if it's a GET request...) but doesn't set the Content-Length.

comment:296 Changed 7 years ago by bruno@…

Great patch, but why limit it to uploads which have multipart/form-data as their content-type?
Why not extend the mechanism to any kind of binary data upload (e.g. handling file upload using the Atom Publishing Protocol) ?

Also, http://orodruin.axiak.net/upload_handling.html doesn't seem to be up anymore. Is there another location for the API docs? Thanks, Bruno.

Changed 7 years ago by leahculver

Updated patch to r7599

comment:297 Changed 7 years ago by anonymous

  • Cc danpoe@… added

Changed 7 years ago by jacob

Streaming uploads patch updated to [7695] and reviewed.

comment:298 follow-up: Changed 7 years ago by jacob

I've completed a comprehensive review of all the code in the latest patch and updated it to trunk; the new patch is attached. I still need to edit and review the docs, but that's on me. Other than that, the TODO list is getting quite a bit shorter. Here's what's left before we can call for testing and merge this sucker:

The big stuff

  • We need unit tests for all the various data structures introduced in django/core/files (UploadedFile, the handlers, etc.). Most of this is sorta-tested by the holistic upload tests, but unit tests are particularly needed here. In most cases the needed bits are easy to mock test.


  • The file upload regression tests (currently in regressiontests/test_client_regress) ought to be moved to a seperate test module (file_uploads, probably) so it can be run seperately. More tests are also needed here; coverage seems pretty low.


  • InMemoryUploadedFile.chunk() uses return but its parent method UploadedFile.chunk() is a generator. This seems very dangerous -- the API should be a function or a generator, but not either -- and should be fixed if there's not a good reason for it.


  • FileUPloadHandler.new_file() something (i.e. "Stop") to stop handling. That's nasty; it should use some sort of exception to do that.


  • django/http/multipartparser.py:374 this should absolutely not be here. How, when and why would this happen? This absolutely needs to Just Work.



Small stuff and questions

  • django/db/models/base.py:519: is hardcoding 64k OK here, or should this be some sort of setting?


  • Look at the use of filterwarnings('ignore') in tests/modeltests/model_forms/models.py} is that intentional or just papering over internal uses of the "old" APIs? Ditto tests/regressiontests/forms/tests.py.


  • Dunno how I feel about the FieldType object -- it doesn't feel very Pythonic. Why not just assign string literals to those constants? That's how its done elsewhere in Django.


  • Idiom: ParseBoundaryStream should be parse_boundary_stream since it's a function. Unless there's a reason you called it this?

comment:299 in reply to: ↑ 298 ; follow-up: Changed 7 years ago by fdr

Replying to jacob:

  • django/http/multipartparser.py:374 this should absolutely not be here. How, when and why would this happen? This absolutely needs to Just Work.

I authored this check. It's actually not to target any regression in particular, but I decided it would be a better option than -- in event of a parser bug -- a busy loop. The rationale is that the parser is conceptually a recursive descent parser with backtracking and that if written naively that would mean that parser bugs would manifest as a stack overflow error. The old parser had a bug like this where it would just be content to buffer as much as the stream as it wanted in memory given malformed requests (look for some of my previous posts above).

Since we want to take advantage of incremental computation and Python's iterators we lose this, as we can happily move back and forth in parser states forever in the event a bug is encountered. This 500 is a inexact proxy for the maximum stack depth before we decide we are never going to be able to make a reduction.

So far it has never been triggered (it was put in after we debugged some bugs that WOULD have caused it to trigger). You can remove it if you decide maybe-busy-looping is better than an exception.

Also, it's possible to defeat this heuristic by having a ring-oscillation of sorts where different values are popped off and put back onto the LazyStream. This is a much more complex class of bug and also less likely given the architecture of the parser. A more sound way to accomplish a recursion-depth-limit would be to count the depth of our parser state, and perhaps we should do that instead, even if it is somewhat more complicated.

So in conclusion: this check is not there because "there are some known issues we haven't ironed out and we need a catch-all," but more like "we don't have any known issues, but would rather not occupy 100% cpu if there are any regressions following a similar pattern of ones we've seen previously"

comment:300 in reply to: ↑ 299 ; follow-up: Changed 7 years ago by jacob

Replying to fdr:

I authored this check. It's actually not to target any regression in particular, but I decided it would be a better option than -- in event of a parser bug -- a busy loop. The rationale is that the parser is conceptually a recursive descent parser with backtracking and that if written naively that would mean that parser bugs would manifest as a stack overflow error. The old parser had a bug like this where it would just be content to buffer as much as the stream as it wanted in memory given malformed requests (look for some of my previous posts above).

Ah, OK -- makes sense. So the only way this would get triggered would be a maliciously malformed request package, right?

I may just change the error message, then -- "malformed multipart request"?

comment:301 Changed 7 years ago by Hangya

  • Cc django@… added

comment:302 in reply to: ↑ 300 Changed 7 years ago by fdr

Replying to jacob:

Ah, OK -- makes sense. So the only way this would get triggered would be a maliciously malformed request package, right?

That's right. The malformed request has to be strange in a way not seen or thought of. It could also potentially come up as a very obscure correct payload.

I may just change the error message, then -- "malformed multipart request"?

Sure, that's would seem perfectly fine. But pay in mind the following: because it is triggered in a very exceptional case (we do advertise handling bad inputs in a graceful way) I'd suggest something more strongly worded to report a bug is called for so it can be squashed.

This check is not a fallback. It a last resort, and in an ideal world would never be triggered. It's just a hopefully better alternative that doing something very wrong forever.

comment:303 Changed 7 years ago by jacob

Since uploading patches is getting unwheidly, I've started working on this in a git branch: http://code.djangoproject.com/git/?p=django;a=shortlog;h=refs/heads/2070-streaming-uploads (check out from git://djangoproject.com/django and use the "2070-streaming-uploads" branch).

comment:304 Changed 7 years ago by ipse

dsalvetti@…, matthias@…, paltman@…, oliver@…, Maniac@…, nesh@…, aenor.realm@…, gary.wilson@…, serialx.net@…, wonlay@…, lurker86@…, shaun@…, antonio.mele@…, herbert.poul@…, gabor@…, koebbe@…, axiak@…, jm.bugtracking@…, moritz.angermann@…, django@…, brosner@…, root.lastnode@…, akaihola+djtick@…, sam@…, trey@…, klaus.blindert@…, ville@…, reza@…, registrations@…, calvin@…, works@…, andrewbadr.etc@…, uptimebox@…, drfarina@…, beau@…, giuliani.v@…, prigun@…, zerok@…, johannes.beigel@…, david@…, django@…, mocksoul@…, schlaber@…, danpoe@…, django@…

comment:305 Changed 7 years ago by anonymous

  • Cc django-2070@… added
  • Keywords sprstreaming intsept14 added; streaming sprintsept14 removed

comment:306 Changed 7 years ago by anonymous

  • Keywords streaming sprintsept14 added; sprstreaming intsept14 removed

comment:307 Changed 7 years ago by ipse

  • Cc ipse.reg@… removed

Changed 7 years ago by jacob

"release candidate" patch

comment:308 Changed 7 years ago by jacob

  • Owner changed from axiak to jacob
  • Status changed from assigned to new

I've uploaded what I think is a finished patch. It's also available from git: http://code.djangoproject.com/git/?p=django;a=shortlog;h=refs/heads/2070-streaming-uploads (clone from git://djangoproject.com/django and checkout the 2070-streaming-uploads branch).

Public testing is needed, but once its been kicked around some this'll get merged to trunk.

comment:309 Changed 7 years ago by clint

  • Cc clintecker@… added

comment:310 Changed 7 years ago by paltman

  • Cc paltman@… removed

comment:311 Changed 7 years ago by jacob

  • Resolution set to fixed
  • Status changed from new to closed

(In [7814]) Fixed #2070: refactored Django's file upload capabilities.

A description of the new features can be found in the new upload handling documentation; the executive summary is that Django will now happily handle uploads of large files without issues.

This changes the representation of uploaded files from dictionaries to bona fide objects; see BackwardsIncompatibleChanges for details.

comment:312 Changed 4 years ago by jacob

  • milestone 1.0 beta deleted

Milestone 1.0 beta deleted

comment:313 Changed 3 years ago by anonymous

  • Easy pickings unset
  • UI/UX unset
Note: See TracTickets for help on using tickets.
Back to Top