#2070 closed enhancement (fixed)
Large streaming uploads
Reported by: | Owned by: | Jacob | |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | normal | Keywords: | streaming, upload, large, sprintsept14, feature |
Cc: | dsalvetti@…, matthias@…, oliver@…, Maniac@…, nesh@…, aenor.realm@…, gary.wilson@…, serialx.net@…, wonlay@…, lurker86@…, shaun@…, antonio.mele@…, herbert.poul@…, gabor@…, koebbe@…, axiak@…, jm.bugtracking@…, moritz.angermann@…, django@…, brosner@…, root.lastnode@…, akaihola+djtick@…, sam@…, trey@…, klaus.blindert@…, ville@…, reza@…, registrations@…, calvin@…, works@…, andrewbadr.etc@…, uptimebox@…, drfarina@…, beau@…, giuliani.v@…, prigun@…, zerok@…, johannes.beigel@…, david@…, django@…, mocksoul@…, schlaber@…, danpoe@…, django@…, django-2070@…, clintecker@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Based on code from ticket 1448.
Test for large file upload
from django.db import models # Create your models here. class FileList(models.Model): name = models.CharField(maxlength=255) email = models.EmailField() class Admin: pass class AFile(models.Model): descr = models.CharField(maxlength=255,core=True) file = models.FileField(upload_to='files') inlist = models.ForeignKey(FileList,edit_inline=models.STACKED)
And two 500 megabyte files for upload.
Works with the patch with +20 mb above average memory and minimal cpu load usage by the httpd thread, file upload successful.
Without the patch httpd rages to +140 mb above average memory usage and over 60% cpu usage and fails miserably in the end.
Attachments (64)
Change History (376)
comment:1 by , 19 years ago
Severity: | normal → major |
---|
comment:2 by , 19 years ago
priority: | high → normal |
---|---|
Resolution: | → duplicate |
Severity: | major → normal |
Status: | new → closed |
This is a duplicate of #1484.
comment:3 by , 19 years ago
Resolution: | duplicate |
---|---|
Status: | closed → reopened |
Or is it? Leaving it open until someone who knows more about it can sort this out.
comment:4 by , 19 years ago
Detailed description of patch.
Based of the #1448 patch that i experienced trouble with using edit inline and large files.
Changes from trunk:
- Added STREAMING_UPLOADS=False to global_settings
- Added parse_streaming_file_upload it http/__init__.py
- Made changes to modpython and wsgi in handlers to use parse_streaming_file_upload
- Subclassed cgi.FieldStorage
- overriding of make_file to use NamedTemporaryFile
- overriding of bouandry parsers to make readline read in 64k chunks to avoid reading extremely long lines into memory
- Subclassed rfc822.Message
- overriding of bouandry parsers to make readline read in 64k chunks to avoid reading extremely long lines into memory
- Made FileField validate upload using tempfile size
- Made save_file in db/models/fields/__init__.py pass temporary file name to _save_FIELD_file instead of file contents
- Made _save_FIELD_file copy temporary file instead of write file from memory
comment:5 by , 19 years ago
comment:6 by , 19 years ago
This patch uses the rfc822 module. The Python docs says the following about rfc822:
Deprecated since release 2.3. The email package should be used in preference to the rfc822 module. This module is present only to maintain backward compatibility.
comment:7 by , 19 years ago
Type: | defect |
---|
comment:8 by , 19 years ago
Type: | → defect |
---|
The problem with the email package is that it reads the entire file into memory. Django supports python 2.3 anyway.
comment:9 by , 19 years ago
to use monitor from a view
def progress_view(request): import pickle sessionid = request.COOKIES.get('sessionid', None) progress = pickle.load(open('/tmp/'+sessionid, 'r'))
Accessible variables
progress.fullength - full request byte lenght
progress.at_status - current bytes read
progress.at_file - current file read
progress.file_status - current file bytes read
comment:10 by , 19 years ago
Had a problem on FreeBSD with my /tmp being on a seperate filesystem: os.rename threw an "Invalid cross-device link" error.
The fix was to edit django/db/models/base.py, "import shutils" at the top and change the one instance of "os.rename" to "shutils.move". Works fine now... shutils came default with python2.4 I don't know about other releases, or if it works on other OSs
follow-up: 107 comment:11 by , 19 years ago
In reference to bnewbold's comment above, it's the shutil module, and it looks like it's been around since at least Python 2.3. (Django supports Python 2.3 and up.)
comment:12 by , 19 years ago
to make it work on windows make the following changes in UploadStateKeeper class:
class UploadStateKeeper: def __init__(self, identifier, fullength=None): self.identifier = identifier if fullength: self.state = UploadState(self.identifier, fullength) else: import pickle self.state = pickle.load(self.get_status_path(), 'r') def get_status_path(self): from tempfile import gettempdir return os.path.join(gettempdir(), self.identifier) def set_status(self, filename, addlen): self.state.set_status(filename, addlen) import pickle pickle.dump(self.state,open(self.get_status_path(), 'w')) def get_state(self): return self.state
comment:13 by , 18 years ago
Can anyone tell me the differences between the latest patch on this ticket and the latest one on #1484? Are these two tickets/patches duplicates or complements?
comment:15 by , 18 years ago
Sorry if I'm being dense, but does that mean that #1484 should be closed in favor of this one? I'm trying to figure out which patch(es) need to be applied here.
comment:16 by , 18 years ago
Keywords: | timezone time zone UTC added |
---|
In my opinion, yes #1484 should be closed. The only patch you have to apply is the latest one in this ticket.
It may help to svn up -r 3358 before you apply the patch, or you can try at the current HEAD and see what goes.
comment:18 by , 18 years ago
Owner: | changed from | to
---|---|
Status: | reopened → new |
Right, I've finally had a chance to play with this patch. It's a big one, so I want to make sure I understand what's going on before I check it in. So apologies if I'm being dense, but I gotta understand this stuff:
- I'm confused about why you needed to override
Message.readline
} just to change the argument toself.fp.readline()
(django/http/init.py, line 105). Is the Python implementation broken in some way? What happens if you take that bit out?
- Pretty much the same question for the FieldStorage subclass. This is a *lot* of duplicated code from the builtin cgi module, and it gives me pause. Can you explain why you need to do this (and perhaps think about if there's another way?)
- The upload status info stored in /tmp is pretty awesome. However, it doesn't seem to work all the time for me; while uploading a big file if I keep cat'ing that /tmp file it's often just empty. Seems something keeps emptying it out? Also, using the session ID seems like it would cause issues if I tried to upload multiple files at the same time (a very real possibility if I'm uploading really big-ass things).
- Related to that, setting
os.envrion['SESSIONID']
(done in both the WSGI and the mod_python handler) seems very leaky. We've had bugs before with this type of env monkeybusiness... Could you use a threadlocal instead?
I should end, however, by saying that the behavior of this patch rocks. Being able to upload a 2 GB file without locking my server is super-nice, and the status stuff absolutely kicks ass (I think I already said that). I'm pretty sure I can clean up all the stuff I just bitched about, but I've got a pretty long todo.txt and if you can take care of these problems and/or answer my questions, I'll check this in PDQ.
comment:19 by , 18 years ago
Status: | new → assigned |
---|
comment:20 by , 18 years ago
- If that bit is taken out memory usage soars because most browsers do not use newlines when sending files so it actually reads to EOF. Is it possible to override self.fp.readline() to call self.fp.readline(64000)? Then most of this code would be removed.
- The upload status is a bit tricy, I need to override FieldStorage to store the status. But since FieldStorage is not a extended class of object there is no super(FieldStorage, self).read_lines_to_outerboundary() causing repetition of code.
- Don't know how multiple uploads can be solved since it would require multiple id's, maybe use sessions instead if possible. Never experienced the emptiying if the temp file. Does this happen with both mod_python and wsgi?
- About setting os.environ, the mod_python-handler does not set it, but wsgi does, need to fix that one instead of messing with threadlocal , this may be why the file is emptied if it only happens with wsgi.
comment:21 by , 18 years ago
Cc: | added |
---|
comment:22 by , 18 years ago
Jacob, after reading your questions I'd like to share here my knowledge as an author of original patch in #1484.
- About readline(N). Actually browsers do send newlines when sending binary files because newlines (byte 0x0A) are present in binary files also. In theory 0x0A in a binary body can be rare enough for readline to read very large chunks in memory. However in practice 0x0A are present often enough (I'm talking from the experience of two services working mainly with big jpegs, mp3 and zips). This is, btw, why my original patch in #1484 didn't include the boundary in readline. The downside of replacing readlines in FieldStorage's code with readline(N) is that the base class doesn't provide this for overriding so you have to actually rewrite one-to-one pretty large chunks of code.
- To my liking storing upload info in files is not very nice. I mentioned my idea in django-developers about this that it's better be a hook for the user code (in middleware style) that is called from the handler accepting uploaded data. This is more flexible because a user can store this progress not in /tmp but wherever he likes and do more interesting stuff then just measuring progress. For example closing the upload if the file is too big for a server.
- One more thing that we've already discussed with Oyvind in an old ticket that his patch alters both HTTP handlers and admin app. The old patch was only for HTTP and my point is that these things should be separate because it's much simpler to review and apply.
Hope my comment was useful!
comment:23 by , 18 years ago
One more nit... Oyvin what was the reason to use rfc822? It's a new thing in your variant of the old patch and I just don't get why it's needed :-)
comment:24 by , 18 years ago
- When uploading two 500mb iso's, it broke down constantly with a memoryerror when not setting a buffer size, everything worked fine when a bufsize was added.
- About upload storage I do agree about using a hook instead.
- Rfc822 is still used by FieldStorage, now hidden from view. I just don not think of setting bufsize in make_file instead of passing it to readline (if that works). Have not had time to test yet.
- Agree that reviewing is easier. But it is nice for easy testing via the admin to see why django is the right tool for the massive job. :)
comment:25 by , 18 years ago
Ignore my patch 3358-streaming-file_with_settings_and_progressmonitor.3.diff, adding bufsize does not affect readline. Also it seems chunked reads are required for storing the state. Don't see any easy way to make this less code yet. I'll add a new patch with just the WSGI fix.
comment:26 by , 18 years ago
I was planning to make a middleware for a progressmonitor hook, but not quite sure when middleware is applied and how to pass this hook from the middleware to FieldStorage. Any help to clarify/fix this would be great.
Also if somenoe could test this successfully using WSGI it would be great.
follow-up: 125 comment:27 by , 18 years ago
My idea of a hook is not fit into current middleware, I just wanted to retain the style. So to clarify I see it like this.
This can be a separate set of hooks similar to MIDDLEWARE_CLASSES or it can be done in middleware classes themselves. To do the latter we will specifiy that in addition to process_request, _response, _view and _exception methods a middleware can define a new kind of method: process_upload(request, data).
parse_file_upload then would search for these middleware methods just as http requests do in modpython.py and wsgi.py. The list of methods is passed to a FieldStorage decendant that will call them each time it's about to write a chunk of data to the target stream.
This is it... Sorry I can't come with the actual code: pretty busy rolling out my current project.
comment:28 by , 18 years ago
I figured the last attachment was supposed to be a diff and tried to upload it with the right extension. Unfortunately, '.py' got added and now I don't know how to get rid of the attachment.
Sorry for the clutter.
comment:29 by , 18 years ago
Seems wsgi can't do streaming uploads this way, but ligthttpd 1.5 will chunk the uploads and has a mod_uploadprogress that serializes to JSON. Will try adapt this patch to serialize the same way so it wont matter if you use this streaming uploads patch or mod_uploadprogress.
comment:30 by , 18 years ago
Why do you say wsgi can't do this? It gives environwsgi.input to the handler that is a file-like object with uploading data. Handler then can read it in chunks (it actually does this in parse_file_upload).
comment:31 by , 18 years ago
WSGI may, but it seems like lighthttpd does not pass anything to wsgi before the request is done. I may be wrong, but I have no way of debugging properly.
comment:32 by , 18 years ago
Bug in last patch, did not manage to override readline to use a buffer size. If this cannot be done, code must be repeated.
comment:33 by , 18 years ago
Wsgi users using lighttpd should use mod_uploadprogress instead. There is some kind of delay before the upload state is updated using wsgi, the same does not happen on mod_python. Is this a issue with my code, the fastcgi/scgi spec or lighttpd's implementation? Can someone test this with fastcgi/scgi on apache?
comment:34 by , 18 years ago
GET identifer for upload state.
http://foo.com?bar saves json data in /tmp/bar
For javascript, making a /progress/ url and view see http://blog.lighttpd.net/articles/2006/08/01/mod_uploadprogress-is-back
Makes it easy to add a admin js option to a model to enable upload state in admin.
comment:35 by , 18 years ago
I just provided a patch which steals liberally from Oyvind, but decouples the progress monitor from the rest of the file upload behavior.
In a nutshell, if you don't want to read uploads into memory, you set request.save_file_class to the class of the kind of file-like object you'd like uploads routed to. A sample middleware class is provided which uses a temp file.
I haven't looked at Oyvind's latest code, but my guess is that we could easily move the upload monitor from the guts of Django by providing a file-like object class that keeps track of how much stuff it has received from the POST data stream.
Comments welcome. (But be nice; it's my first patch.)
comment:36 by , 18 years ago
To toddobryan, I made some fatal mistakes before 3358-streaming-file_with_settings_and_progressmonitor_fixed.diff + the wsgi fix, see this patch, you may have fixed it but i'm not sure. the latest patch I would say is really important for multiple uploads. There are some more upload progress issues, the json encoder gives two value sets but i can't find out why. Posting my latest diff now.
Btw I like one more developer helping me.
comment:37 by , 18 years ago
OK, you don't have to be very nice. It turns out that I missed perhaps the most important thing, which is wrapping the stream from the POST in some kind of object which prevents lines from being too long and filling up memory.
Oyvind's PostStream seems like the kind of thing to do, but I'd suggest wrapping the stream before we pass it into the FieldStorage object, not inside. I was misled by the fact that the read_binary() method inside there has a fixed 8k buffer.
comment:39 by , 18 years ago
If you could concentrate on getting your progress monitor to work from a file-like object instead of FieldStorage, I'll integrate the PostStream idea into my code so that the readline()s in the FieldStorage class don't read too much.
comment:40 by , 18 years ago
Thinking of making the upload monitor as a hook in streaming_upload_middleware.
Then make a upload monitor middleware that hooks onto streaming_upload_middleware by loading before streaming_upload_middleware.
Upload monitor will act exacly like mod_uploadprogress so wsgi users just load streaming_upload_middleware.
comment:41 by , 18 years ago
By the way, when you guys think you've addressed all the problems and simplified the code as much as you can, etc, just add a comment pointing to the right patch that needs another review (and mention that you think it's "done").
comment:42 by , 18 years ago
To test 'svn up -r 3518', apply patch and add:
django.middleware.upload.UploadStateMiddleware and django.middleware.upload.StreamingUploadMiddleware to MIDDLEWARE_CLASSES.
Uploadstate test currently builtin to admin change form. 'tail /tmp/test123' for upload state as json.
comment:43 by , 18 years ago
Regular uploads seem to fail if the middlewares are disabled. Backtrace:
Traceback (most recent call last): File "/opt/local/lib/python2.4/site-packages/django/core/servers/basehttp.py", line 272, in run self.result = application(self.environ, self.start_response) File "/opt/local/lib/python2.4/site-packages/django/core/servers/basehttp.py", line 611, in __call__ return self.application(environ, start_response) File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 158, in __call__ response = self.get_response(request.path, request) File "/opt/local/lib/python2.4/site-packages/django/core/handlers/base.py", line 102, in get_response return self.get_technical_error_response(request) File "/opt/local/lib/python2.4/site-packages/django/core/handlers/base.py", line 134, in get_technical_error_response return debug.technical_500_response(request, *sys.exc_info()) File "/opt/local/lib/python2.4/site-packages/django/views/debug.py", line 131, in technical_500_response return HttpResponseServerError(t.render(c), mimetype='text/html') File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 155, in render return self.nodelist.render(context) File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 688, in render bits.append(self.render_node(node, context)) File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 716, in render_node raise wrapped TemplateSyntaxError: Caught an exception while rendering: 'NoneType' object has no attribute 'items' Original Traceback (most recent call last): File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 706, in render_node result = node.render(context) File "/opt/local/lib/python2.4/site-packages/django/template/defaulttags.py", line 189, in render value = bool_expr.resolve(context, True) File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 548, in resolve obj = resolve_variable(self.var, context) File "/opt/local/lib/python2.4/site-packages/django/template/__init__.py", line 657, in resolve_variable raise VariableDoesNotExist, "Failed lookup for key [%s] in %r" % (bits[0], current) # missing attribute File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 61, in __repr__ return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \ File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 97, in _get_post self._load_post_and_files() File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 75, in _load_post_and_files self._post, self._files = http.parse_file_upload(self) File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 83, in parse_file_upload return default_parse_file_upload(req) File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 50, in default_parse_file_upload raw_message = '\r\n'.join(['%s:%s' % pair for pair in req.header_dict.items()]) AttributeError: 'NoneType' object has no attribute 'items'
comment:44 by , 18 years ago
Cc: | added |
---|
comment:45 by , 18 years ago
Cc: | added |
---|
comment:46 by , 18 years ago
This is anecdotal evidence at best, but rapid testing of this patch seems to indicate it does not work with the development server.
I basically applied it, updated to head, added the 2 middlewares, tried one of my upload forms, and it just sat there forever, never reaching the Django view code.
comment:47 by , 18 years ago
I would not expect this patch to work with the dev server, since it is a single thread you cannot do multiple requests at the same time.
comment:48 by , 18 years ago
I may be mistaken about my last comment, but the dev server uses the wsgi handler. So it may be the delay problem, or that stdin is not passed before the request is done. So since wsgi using lighttpd works, the dev server probably handle things differently. Don't really know alot about the dev server.
comment:49 by , 18 years ago
Cc: | added |
---|
comment:50 by , 18 years ago
This doesn't work for me at all. With Opera 9 and Firefox 1.5, I get KeyError on X-Progress-Id, which is supposed to be HTTP_X_PROGRESS_ID? (Webserver is Lighttpd, distro Debian unstable)
Also, isn't this line a bit insecure, given the get_temp_file does an "os.path.join":
content = open(get_temp_file(request.header_dict['X-Progress-Id']), 'r').read()
I've replaced my code to use cache framework instead. But mind you need server-wide cache, per process will probably not be enough.
And then, you probably don't want to open a new TCP connection every time you make a xml request, since this will effectively DOS your server.
I was also having problems with JS - the browser did not want to do a POST at all. It started polling server for progress, but it was not posting at all.
comment:51 by , 18 years ago
Lighttpd has a delay before it saves data and progress info.
Supposed to be HTTP_X_PROGRESS_ID?
I did it this way to be compatible with mod_uploadprogress that you should use instead.
Is there another way than opening another tcp connection using xmlhttp?
comment:52 by , 18 years ago
How much should that delay usually be? Since there is no POST for Django, progress bar does not work. This probably means I need to use mod_uploadprogress to get it working, but the new mod_uploadprogress has not been released yet officially.
Yes, HTTP_X_PROGRESS_ID. When I printed contents of request.header_dict I could see a key named HTTP_X_PROGRESS_ID which corresponds to X-Progress-Id.
Sure, one could open a TCP connection and server periodically sends info and client reads from it. But I see that you aim for mod_uploadprogress compatibility.
by , 18 years ago
Attachment: | 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id.diff added |
---|
Now using X_PROGRESS_ID instead of X-Progress-Id, accepts any lower/uppercase/ - / _ /prefix variant
comment:54 by , 18 years ago
Keywords: | streaming upload large added; timezone time zone UTC removed |
---|
The patch does not work with Apache 2 + mod_fcgid.
I'm running on Mac OS X 10.4.7 with a pure MacPorts installation (save Django and flup -- both are a svn HEAD).
Apache 2 configuration:
{{{<IfModule mod_fcgid.c>
AddHandler fcgid-script .fcgi
IdleTimeout 600
ProcessLifeTime 3600
MaxProcessCount 8
DefaultMinClassProcessCount 3
DefaultMaxClassProcessCount 3
IPCConnectTimeout 8
IPCCommTimeout 48
</IfModule>
ScriptAlias /myapp "cgi-bin/myapp.fcgi/"}}}
myapp.fcgi:
{{{#!/opt/local/bin/python
# -*- coding: utf-8 -*-
import sys, os
# Add a custom Python path.
sys.path.insert(0, "/opt/local/www")
# Switch to the directory of your project. (Optional.)
os.chdir("/opt/local/www/myapp")
# Set the DJANGO_SETTINGS_MODULE environment variable.
os.environDJANGO_SETTINGS_MODULE = "myapp.settings"
from django.core.servers.fastcgi import runfastcgi
runfastcgi(["method=threaded", "daemonize=false"])}}}
comment:55 by , 18 years ago
Repost with proper formatting :/
Apache 2 configuration:
<IfModule mod_fcgid.c> AddHandler fcgid-script .fcgi IdleTimeout 600 ProcessLifeTime 3600 MaxProcessCount 8 DefaultMinClassProcessCount 3 DefaultMaxClassProcessCount 3 IPCConnectTimeout 8 IPCCommTimeout 48 </IfModule> ScriptAlias /myapp "cgi-bin/myapp.fcgi/"
myapp.fcgi:
#!/opt/local/bin/python # -*- coding: utf-8 -*- import sys, os # Add a custom Python path. sys.path.insert(0, "/opt/local/www") # Switch to the directory of your project. (Optional.) os.chdir("/opt/local/www/myapp") # Set the DJANGO_SETTINGS_MODULE environment variable. os.environ['DJANGO_SETTINGS_MODULE'] = "myapp.settings" from django.core.servers.fastcgi import runfastcgi runfastcgi(["method=threaded", "daemonize=false"])
comment:56 by , 18 years ago
Cc: | added |
---|
comment:57 by , 18 years ago
I get the following error, again with my mod_fcgid setup, when using the admin interface with the progress status monitor js.
Unhandled exception in thread started by <bound method ThreadPool._worker of <flup.server.threadpool.ThreadPool object at 0x715270>>Traceback (most recent call last): File "/opt/local/www/data/flup/server/threadpool.py", line 103, in _worker job.run() File "/opt/local/www/data/flup/server/fcgi_base.py", line 642, in run self.process_input() File "/opt/local/www/data/flup/server/fcgi_base.py", line 678, in process_input self._do_params(rec) File "/opt/local/www/data/flup/server/fcgi_base.py", line 777, in _do_params self._start_request(req) File "/opt/local/www/data/flup/server/fcgi_base.py", line 761, in _start_request req.run() File "/opt/local/www/data/flup/server/fcgi_base.py", line 563, in run self.server.error(self) File "/opt/local/www/data/flup/server/fcgi_base.py", line 1157, in error req.stdout.write('Content-Type: text/html\r\n\r\n' + File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/cgitb.py", line 158, in html dump.append('%s = %s' % (name, pydoc.html.repr(value))) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 360, in repr return Repr.repr(self, object) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/repr.py", line 24, in repr return self.repr1(x, self.maxlevel) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 367, in repr1 return self.escape(cram(stripid(repr(x)), self.maxother)) File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 65, in __repr__ return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \ File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 101, in _get_post self._load_post_and_files() File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 79, in _load_post_and_files self._post, self._files = http.parse_file_upload(self) File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 82, in parse_file_upload return req.parse_file_upload(req) File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 83, in parse_streaming_file_upload upload_state = req.upload_state(req) File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 126, in __init__ self.state = {'size': int(req.header_dict.get('content-length')),TypeError: int() argument must be a string or a number
comment:58 by , 18 years ago
Something ate line breaks, apparently.
Unhandled exception in thread started by <bound method ThreadPool._worker of <flup.server.threadpool.ThreadPool object at 0x715270>> Traceback (most recent call last): File "/opt/local/www/data/flup/server/threadpool.py", line 103, in _worker job.run() File "/opt/local/www/data/flup/server/fcgi_base.py", line 642, in run self.process_input() File "/opt/local/www/data/flup/server/fcgi_base.py", line 678, in process_input self._do_params(rec) File "/opt/local/www/data/flup/server/fcgi_base.py", line 777, in _do_params self._start_request(req) File "/opt/local/www/data/flup/server/fcgi_base.py", line 761, in _start_request req.run() File "/opt/local/www/data/flup/server/fcgi_base.py", line 563, in run self.server.error(self) File "/opt/local/www/data/flup/server/fcgi_base.py", line 1157, in error req.stdout.write('Content-Type: text/html\r\n\r\n' + File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/cgitb.py", line 158, in html dump.append('%s = %s' % (name, pydoc.html.repr(value))) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 360, in repr return Repr.repr(self, object) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/repr.py", line 24, in repr return self.repr1(x, self.maxlevel) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pydoc.py", line 367, in repr1 return self.escape(cram(stripid(repr(x)), self.maxother)) File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 65, in __repr__ return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \ File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 101, in _get_post self._load_post_and_files() File "/opt/local/lib/python2.4/site-packages/django/core/handlers/wsgi.py", line 79, in _load_post_and_files self._post, self._files = http.parse_file_upload(self) File "/opt/local/lib/python2.4/site-packages/django/http/__init__.py", line 82, in parse_file_upload return req.parse_file_upload(req) File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 83, in parse_streaming_file_upload upload_state = req.upload_state(req) File "/opt/local/lib/python2.4/site-packages/django/middleware/upload.py", line 126, in __init__ self.state = {'size': int(req.header_dict.get('content-length')), TypeError: int() argument must be a string or a number
comment:59 by , 18 years ago
I have just tested the patch with Apache 2 and mod_python. It works correctly, however I had to make 2 small modifications because my Django application isn't running at the root context of my server.
Basically, the admin JavaScript code hardcodes a request to "/progress/". This needs to be changed appropriately for each install, e.g. in my case "/django/progress/".
Similarly, the UploadStateMiddleware checks that the request path is exactly "/progress/", which needs to be changed to match.
comment:60 by , 18 years ago
I svn up'ed, applied 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id.diff, deleted the Django directory under site-packages, reinstalled Django with 'python setup.py install' and restarted apache/modpython but Django appears to be behaving as before. My memory usage still climbs beyond what I'd expect and I don't see any files in /tmp like I'd expect from tempfile.NamedTemporaryFile().
I added these lines to my settings.py file:
STREAMING_UPLOADS=True
UPLOAD_BUFFER_SIZE=2048
Am I missing something?
Thanks!
comment:61 by , 18 years ago
It is 2 middleware now.
django.middleware.upload.UploadStateMiddleware and django.middleware.upload.StreamingUploadMiddleware
Order is important
comment:62 by , 18 years ago
Thanks for the info!
I changed middleware_classes to look like this:
MIDDLEWARE_CLASSES = ( 'django.middleware.common.CommonMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.middleware.doc.XViewMiddleware', 'django.middleware.upload.UploadStateMiddleware', 'django.middleware.upload.StreamingUploadMiddleware', )
I was able to upload a couple large test files and tempfiles were showing up as expected in /tmp, but I'm intermittently receiving this exception when uploading:
Traceback (most recent call last): File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/base.py" in get_response 74. response = callback(request, *callback_args, **callback_kwargs) File "/home/misjxm/dp1/file/app/views.py" in createmessage 275. if request.POST: File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _get_post 51. self._load_post_and_files() File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _load_post_and_files 32. self._post, self._files = http.parse_file_upload(self) File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/http/__init__.py" in parse_file_upload 81. return req.parse_file_upload(req) File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/middleware/upload.py" in parse_streaming_file_upload 104. FILES.appendlist(key, FileDict({ File "/usr/lib/python2.4/cgi.py" in __getattr__ 540. raise AttributeError, name AttributeError at /file/compose/ tmp_name
Have you seen this before?
Thanks!
comment:63 by , 18 years ago
And I only seem to get that exception when uploading text files such as M3U playlists, bashrc files, etc.
comment:64 by , 18 years ago
Text files in general only cause the problem intermittently. I have one M3U file that consistantly causes the exception to be thrown. This is what local vars looks like when I upload that file:
name 'tmp_name' self FieldStorage('file1', 'klipschtest.m3u', "MP3\\2pac\\Greatest Hits (Disc 1)\\1-05 Hail Mary.mp3\r\nFLAC\\James Blunt\\Back to Bedlam\\03-Wisemen.flac\r\nMP3\\Jimi_Hendrix\\UNKNOWN\\Little wing (extended instrumental).mp3\r\nFLAC\\Led Zeppelin\\Remasters (disc II)\\04-No Quarter.flac\r\nFLAC\\Pink Floyd\\The Wall - Disc 1\\03-Another Brick in the Wall. Part 1.flac\r\nFLAC\\Pink Floyd\\The Wall - Disc 1\\04-The Happiest Days of Our Lives.flac\r\nFLAC\\Natalie Merchant\\Live In Concert\\05-Carnival.flac\r\nMP3\\Metallica\\...And Justice For All\\08-To Live Is To Die.mp3\r\nMP3\\Red Hot Chili Peppers\\Blood Sugar Sex Magik\\10-Blood Sugar Sex Magik.mp3\r\nFLAC\\Sade\\Lovers Rock\\03-King Of Sorrow.flac\r\nFLAC\\Stone, Joss\\The Soul Sessions\\05-Dirty Man.flac\r\nFLAC\\U.G.K\\Ridin' Dirty\\03-Murder.flac\r\nMP3\\Weezer\\Weezer (Green Album)\\04-Weezer _ Island In The Sun.mp3\r\nMP3\\Weezer\\Weezer (Green Album)\\03-Weezer _ Hash Pipe.mp3\r\n")
comment:65 by , 18 years ago
I'm getting an error about 'read only accepts 1 arguement (2 given)", from the following line in cgi.py:
self.fp.read(self.length)
comment:66 by , 18 years ago
This is the error I'm getting:
C:\web\fxedit>python manage.py runserver
Validating models...
0 errors found.
Django version 0.95, using settings 'fxedit.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CTRL-BREAK.
[16/Sep/2006 02:55:31] "GET /file/ HTTP/1.1" 200 1188
[16/Sep/2006 02:55:31] "GET /media/stylesheets/style.css HTTP/1.1" 304 0
[16/Sep/2006 02:55:31] "GET /media/stylesheets/print.css HTTP/1.1" 304 0
Traceback (most recent call last):
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\servers\
basehttp.py", line 272, in run
self.result = application(self.environ, self.start_response)
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\servers\
basehttp.py", line 615, in call
return self.application(environ, start_response)
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers
\wsgi.py", line 156, in call
response = self.get_response(request.path, request)
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers
\base.py", line 102, in get_response
return self.get_technical_error_response(request)
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers
\base.py", line 134, in get_technical_error_response
return debug.technical_500_response(request, *sys.exc_info())
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\views\debug.p
y", line 131, in technical_500_response
return HttpResponseServerError(t.render(c), mimetype='text/html')
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in
it.py", line 155, in render
return self.nodelist.render(context)
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in
it.py", line 688, in render
bits.append(self.render_node(node, context))
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in
it.py", line 716, in render_node
raise wrapped
TemplateSyntaxError: Caught an exception while rendering: read() takes exactly 1
argument (2 given)
Original Traceback (most recent call last):
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in
it.py", line 706, in render_node
result = node.render(context)
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\defa
ulttags.py", line 189, in render
value = bool_expr.resolve(context, True)
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in
it.py", line 548, in resolve
obj = resolve_variable(self.var, context)
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\template\in
it.py", line 657, in resolve_variable
raise VariableDoesNotExist, "Failed lookup for key [%s] in %r" % (bits[0], c
urrent) # missing attribute
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers
\wsgi.py", line 61, in repr
return '<WSGIRequest\nGET:%s,\nPOST:%s,\nCOOKIES:%s,\nMETA:%s>' % \
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers
\wsgi.py", line 97, in _get_post
self._load_post_and_files()
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\core\handlers
\wsgi.py", line 75, in _load_post_and_files
self._post, self._files = http.parse_file_upload(self)
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\http\init
.py", line 80, in parse_file_upload
return req.parse_file_upload(req)
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\middleware\up
load.py", line 87, in parse_streaming_file_upload
fs = FieldStorage(req.raw_request, environ=req.META, headers=req.header_dict
, upload_state=upload_state )
File "C:\Python24\lib\site-packages\django-0.95-py2.4.egg\django\middleware\up
load.py", line 71, in init
keep_blank_values=keep_blank_values, strict_parsing=strict_parsing)
File "C:\Python24\lib\cgi.py", line 524, in init
self.read_urlencoded()
File "C:\Python24\lib\cgi.py", line 629, in read_urlencoded
qs = self.fp.read(self.length)
TypeError: read() takes exactly 1 argument (2 given)
[16/Sep/2006 02:55:53] "POST /file/stats/ HTTP/1.1" 500 3721
by , 18 years ago
Attachment: | 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id_windowsfix.diff added |
---|
windows temporary file locking fix
comment:67 by , 18 years ago
I'm getting the same error as anonymous above (and am also using Windows).
comment:68 by , 18 years ago
Idea for fixing the m3u upload problem. May need a copy of this util.py to make it work with wsgi.
But I have almost given up on wsgi for this patch, uless anyone are able to shed som light on why wsgi does not like chunked reading from wsgi.input.
comment:69 by , 18 years ago
There is a problem using this patch with Nginx (and maybe some other webservers). Nginx doesn't pass content-type and content-length headers and the result of this is that FieldStorage can't parse request correctly and nothing is uploaded. A workaround I've found is modification of cgi.FieldStorage to use HTTP_CONTENT_TYPE and HTTP_CONTENT_LENGTH. But I don't think this is elegant solution of this problem. Does anyone have other suggestions on how to solve it ?
comment:70 by , 18 years ago
Worked for me(Apache mod_python). Working like charm. :) But heres some instructions..
- Do the patch (cd django; patch -Np0 < file.diff)
- Add two middlewares, modify the model like below:
class FileList(models.Model): name = models.CharField(maxlength=255) email = models.EmailField() class Admin: js = ['js/UploadProgress.js'] class AFile(models.Model): descr = models.CharField(maxlength=255,core=True) file = models.FileField(upload_to='files') inlist = models.ForeignKey(FileList,edit_inline=models.STACKED)
- You have progressbar in admin. :)
Submitting patch to the latest trunk.
comment:71 by , 18 years ago
Cc: | added |
---|
comment:72 by , 18 years ago
There are some issues with this patch. I have it working on my local Linux and Windows boxes. But i can't get to work it completely correct on my server with the same setup as my local linux box (python 2.5, latest Django).
The issue is that when uploading a file WSGIRequest . _load_post_and_files is called only after all data is uploaded. So no streaming occurs and no progress is displayed and I didn't find a solution for this yet..
comment:73 by , 18 years ago
[Cha0S] using lighttpd or apache, same lighttpd/apache version on both servers?
I found out lighttpd pre 1.4.0 supported streaming, so that has to be fixed before this patch will work on lighttpd.
comment:74 by , 18 years ago
This worked great on my system [Apache/2.0.58 (Win32) mod_python/3.2.8 Python/2.4.3]
Following up on Sung-Jin Hong's good instructions with a few tips for newbies like me:
- When you apply the patch, make sure you get django/middleware/upload.py. I found that when I did an SVN update to Rev 4065 and applied 4065-streaming_uploads_and_uploadprogress_middleware_x_progress_id_windowsfix.diff I did not get the file. I had to apply 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id_windowsfix.diff to get it.
- Make sure js/UploadProgress.js in accessible. The Admin page's path to it will be in the media directory, e.g. http://yoursite.com/media/js/UploadProgress.js. I had to copy UploadProgress.js from django/contrib/admin/media/js to my local media/js directory.
- Make sure the Apache httpd.conf file section for your site allows paths starting with "/progress", i.e. has a line like '<Location ~ "/(admin|shop|progress)">' in it, because the JS code is doing a "req.open("GET", "/progress/", 1)" repeatedly to track progress.
I find that the Upload progress box displays fine in Firefox and Opera, but it is at the very bottom left of the absolute page in IE6 (i.e. you need to scroll to the very bottom to see it). I'm still working on this problem.
Thank you to all who contributed this!
comment:75 by , 18 years ago
[Cha0S] using lighttpd or apache, same lighttpd/apache version on both servers? I found out lighttpd pre 1.4.0 supported streaming, so that has to be fixed before this patch will work on lighttpd.
I've tested with lighttpd and nginx web-servers. same versions on both servers. with apache it works fine on windows.
comment:76 by , 18 years ago
Component: | Core framework → Contrib apps |
---|---|
Type: | enhancement → defect |
Has anyone come up with a fix for the AttributeError exceptions when uploading text files? I'm still getting exceptions like this:
AttributeError at /file/compose/
tmp_name
Request Method: POST
Request URL: https://myserver/file/compose/
Exception Type: AttributeError
Exception Value: tmp_name
Exception Location: /usr/lib/python2.4/cgi.py in getattr, line 540
Traceback (most recent call last):
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/base.py" in get_response
- response = callback(request, *callback_args, callback_kwargs)
File "/files/lib/file/app/views.py" in createmessage
- if request.POST:
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _get_post
- self._load_post_and_files()
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/core/handlers/modpython.py" in _load_post_and_files
- self._post, self._files = http.parse_file_upload(self)
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/http/init.py" in parse_file_upload
- return req.parse_file_upload(req)
File "/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/middleware/upload.py" in parse_streaming_file_upload
- FILES.appendlist(key, FileDict({
File "/usr/lib/python2.4/cgi.py" in getattr
- raise AttributeError, name
AttributeError at /file/compose/
tmp_name
comment:77 by , 18 years ago
Type: | defect → enhancement |
---|
oops, didn't mean to change the type on this ticket... changing back to enhancement.
comment:78 by , 18 years ago
Component: | Contrib apps → Core framework |
---|
comment:79 by , 18 years ago
Cc: | added |
---|
Hi, jacob
The 'AttributeError' occurred, because FieldStorge does not make_temp_file for small files, and the tmp_name is not defined.
The progress middleware has problem too. You will notice that your upload speed will down to a significant lower rate, if you use this middleware. The performance bottleneck is that, every single 'read' from request reads only a few bytes, even if you set a large buffer_size, and the serialization and save to disc of 'UploadState' is significant.
The attachments is my working copy of code.
comment:80 by , 18 years ago
Cc: | added |
---|
comment:81 by , 18 years ago
Needs documentation: | set |
---|---|
Patch needs improvement: | set |
Triage Stage: | Unreviewed → Accepted |
Can someone who has used this give us a progress report on this - which of the 3000 patches above are needed? can these be merged into one patch that works?
comment:82 by , 18 years ago
The one that i can vouch for is 3581-streaming_uploads_and_uploadprogress_middleware_x_progress_id.diff
But it has problems with m3u uploads's for some reason i can't figure out.
Was thinking about the possibility of using email.FeedParser instead of rfc822 but that is not available in python 2.3
This patch does not work with lighttpd, since lighttpd can't stream requests.
comment:83 by , 18 years ago
This patch does not work with lighttpd, since lighttpd can't stream requests.
This is not true. When Django works with Lighttpd over FastCGI lighttpd immediately offloads received data to Django process and it can stream it wherever it wants to. In fact I use one of my services that uploads zipped mp3 albums under lighttpd and streaming works (with my patch in #1484).
comment:84 by , 18 years ago
I think you are correct. I guess the problem with this patch and fcgi has to do with not doing blocking reads with a buffer size. But a buffer size is needed if the browser sends a big file without eol's, and for the upload progress.
comment:85 by , 18 years ago
I'm not sure I understand what you're saying... I was replying solely to a claim that lighttpd can't do streaming. It certainly does streaming to a FCGI app. Whatever the app does then (streaming by blocks, by eols or no streaming at all) is a whole another question.
comment:86 by , 18 years ago
Cc: | added |
---|
comment:87 by , 18 years ago
Making a attempt at using email.FeedParser
http://fivethreeo.dynalias.org/repos/streaming
login: public, password: public
comment:88 by , 18 years ago
Need to override FeedParser._parsegen to write the payload directly to a file instead of a in-memory string.
Probably involves copying FeedParser._parsegen and do some modificatioms.
comment:89 by , 18 years ago
I am pleased to announce, its alive!!
Just some minor work left ( validators, refactoring, bugfixing ) volunteers?
FeedParser should work with python2.3 if it included with django.
comment:90 by , 18 years ago
Cc: | removed |
---|
comment:91 by , 18 years ago
To get the most recent diff, now with validator
svn co http://fivethreeo.dynalias.org/repos/streaming
then
cd streaming
svn diff -r371
by , 18 years ago
Attachment: | modpyton-ok-needs-fcgi-testing.3.diff added |
---|
did not work without middleware, fixed
comment:94 by , 18 years ago
Tested on lighttpd/fastcgi without errors or memory hogging.
Should feedparser be included with django so python2.3 users can use this feature too?
The way the feedparser works there is no way to not duplicate the entire _parsegen function as I see it.
comment:95 by , 18 years ago
Cc: | added |
---|
comment:96 by , 18 years ago
Streaming file uploads, take #3
I would really like to see something like this committed to
trunk. However I think the previous patches has been too complicated
and tried to do too much. Lack of documentation and tests has probably
been a factor as well.
I have rewritten the file upload to make it simpler and added some
documentation and tests.
- Simplified parsing by using a specialized RFC 2388 parser. The
previous patches used different RFC 2822 parsers which were not suited
for the task. The new parser is very efficient both cpu and memory
wise. It uses mx.TextTools for boudary searches if available.
- Removed AJAX code and progress meter. While I think it is a good
thing I think it hinders the commitment of this patch. We should open
a new ticket for a progress meter in the Admin interface. The
X-Progress-ID header is still supported and a pickled progress file is
written if supplied.
- Added error handling.
- Added Documentation.
- Added tests.
Tested with mod_python and lighttpd.
You need to specify the FILE_UPLOAD_DIR setting to enable streaming (read forms doc).
Patch attached (4459-streaming-file-upload.diff)
follow-up: 99 comment:98 by , 18 years ago
A couple of comments from quick reading of 4459-streaming-file-upload.diff
- It streams uploads to files only if FILE_UPLOAD_DIR is set. But it looks that it stores files in temp directory anyway. What's the point of FILE_UPLOAD_DIR then?
- Instead of FILE_UPLOAD_DIR it would be nice to have a setting like MAX_FILE_UPLOAD_BUFFER with the maximum number of bytes stored in memory during an upload. And only if the file is bigger it is streamed to disc. This will have a nice effect of not touching disc when dealing with small files which is faster
- It looks like any Exception during the parsing is swallowed and returned as a dict of errors that is then stored in a POST data. This actually breaks the whole exceptions paradigm :-).
- All the save_FIELD_file now accept raw_field. This is backward-incompatible but worse it limits this method to work only with uploaded files while now save_FIELD_file can accept any content not matter how it was created in a program. I understand that just renaming streamed upload is a fast feature but I believe it deserves a new method. Like 'save_FIELD_from_file' or something...
follow-up: 101 comment:99 by , 18 years ago
Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:
A couple of comments from quick reading of 4459-streaming-file-upload.diff
- It streams uploads to files only if FILE_UPLOAD_DIR is set. But it looks that it stores files in temp directory anyway. What's the point of FILE_UPLOAD_DIR then?
A StringIO() buffer is used in that case so no files are used.
- Instead of FILE_UPLOAD_DIR it would be nice to have a setting like MAX_FILE_UPLOAD_BUFFER with the maximum number of bytes stored in memory during an upload. And only if the file is bigger it is streamed to disc. This will have a nice effect of not touching disc when dealing with small files which is faster
The problem is that you do not know in advance how large a file is. However you know how large the POST is in total so you could enable file streaming based on that. I did not want to create too many settings variables but I agree that it might be a good idea. Another option would be to have a setting where you name the form fields what you want to have streamed.
- It looks like any Exception during the parsing is swallowed and returned as a dict of errors that is then stored in a POST data. This actually breaks the whole exceptions paradigm :-).
Yes, I did that to avoid too many changes in wsgi.py/modpython.py since you have to take care not to try to read the input stream again in case of an exception. Meaning that I always wanted to return something. Bad choice maybe but I tried to make as few changes as possible so that the powers that be could start looking at the patch without too much confusion ;)
- All the save_FIELD_file now accept raw_field. This is backward-incompatible but worse it limits this method to work only with uploaded files while now save_FIELD_file can accept any content not matter how it was created in a program. I understand that just renaming streamed upload is a fast feature but I believe it deserves a new method. Like 'save_FIELD_from_file' or something...
Aren't those methods invoked automatically for you when you do .save()? I didnt think they were public but I dont understand exacly how they work so I take your word for it...
by , 18 years ago
Attachment: | 4459-streaming-file-upload.2.diff added |
---|
Added FILE_UPLOAD_MIN_SIZE (default 100kb) to define minimum request size for streaming to disk. Propage exeptions. I'm not too happy with the names of the settings anymore :/
comment:100 by , 18 years ago
(I've removed many attachments, as per a request by oyvind.saltvik@… on django-developers.)
comment:101 by , 18 years ago
The problem is that you do not know in advance how large a file is. However you know how large the POST is in total so you could enable file streaming based on that.
Yeah that what I meant... The whole point is to not make Django eat everything in memory so a settings topping this amount seems most logical to me.
I saw you made this setting. Here are couple of nits:
file_upload_min_size = getattr(settings, 'FILE_UPLOAD_MIN_SIZE', 100000)
- There is a convention in Django to put a setting with its default value into django/conf/global_settings.py. This keeps all defaults in one place and ensures that settings are always present so you don't have to fail-safe them with getattr, just use settings.SETTING_NAME
- FILE_UPLOAD_MIN_SIZE sounds confusing, like we don't allow users to upload small files :-). I still like an idea of calling it a buffer size somehow (and actually use it as a buffer size when reading chunks from input). What do you (and everyone) think?
- 100 KB is more like 100 * 1024. But I'd make it about 512 * 1024 or 1024 * 1024. This is small enough to not even show on memory stats but big enough to handle most profile photos uploaded in practice without touching disk most of the times.
Yes, I did that to avoid too many changes in wsgi.py/modpython.py since you have to take care not to try to read the input stream again in case of an exception.
Well... An exception ensures just this: the program won't run normally in case of errors, return values on the other hand are easy to forget to check. I see that you have removed swallowing the Exception. Why not MultiPartParserError then? As I understand this will happen only when the input is really malformed (like not created by a browser) so it's very little a user application can do about it. And I think it shouldn't. I think we should catch this exception in handlers/base.py and unconditionally return '400 Bad Request' on it.
Aren't those methods invoked automatically for you when you do .save()? I didnt think they were public but I dont understand exacly how they work so I take your word for it...
They are called from save(), yes. But they are also intended as a public API and documented (it's in model's API doc, too lazy to get a link :-) ).
follow-up: 105 comment:102 by , 18 years ago
Cc: | added |
---|
comment:103 by , 18 years ago
Cc: | added |
---|
comment:104 by , 18 years ago
Cc: | added; removed |
---|
comment:105 by , 18 years ago
Replying to Gábor Farkas <gabor@nekomancer.net>:
Sorry to clutter the thread.
Can this patch be used with the newforms framework, or does it only work with oldforms?
comment:106 by , 18 years ago
Can be used in newforms if you handle the incoming data, form_for_model has no automatic filefields yet.
comment:107 by , 18 years ago
Replying to adrian:
In reference to bnewbold's comment above, it's the shutil module, and it looks like it's been around since at least Python 2.3. (Django supports Python 2.3 and up.)
os.rename('old', 'new') does result in an 'invalid cross-link device' exception when 'old' and 'new' are on different filesystems. It doesn't make sense to do this, as once a large file has been streamed to disk, copying to another filesystem is a pointless additional overhead. However, the code could catch the error, and do a copy and then delete operation (and generate a warning that the system can be configured more efficiently).
Or it could use shutil as suggested by a previous poster.
comment:108 by , 18 years ago
Is it possible that this patch could be updated so that it works with the current source version? At the moment if one checks out revision 4459, patches it and then executes svn update, clashes arise at several points in the code.
comment:109 by , 18 years ago
Cc: | added |
---|
by , 18 years ago
Attachment: | 5065-streaming_file_upload_with_shutils.diff added |
---|
I've updated the patch to include the shutils
command and to work with [5065]. Please check to see if it works.
by , 18 years ago
Attachment: | 5065-streaming_file_upload_with_shutils_2.diff added |
---|
Works in [5065], renamed settings variable, uses global settings, defaults STREAMING_MIN_POST_SIZE to .5MB (please test though!)
comment:110 by , 18 years ago
Cc: | added |
---|
by , 18 years ago
Attachment: | 5070-streaming-file-upload.diff added |
---|
Updated to trunk, without changes
follow-up: 112 comment:111 by , 18 years ago
Axiak
But does this not limit streaming to gnu os'es by using shutils. Should it not try shutils first, dont import shutils at the top of the .py Then try rename, and finally just write from filecontent if that fails.
by , 18 years ago
Attachment: | 5078_streaming_file_upload_with_shutils_and_fallbacks.diff added |
---|
Usese shutils
, but falls back.
by , 18 years ago
Attachment: | 5078_streaming_file_upload_with_shutils_and_fallbacks_2.diff added |
---|
Usese shutils
, but falls back, this time it deletes the tmp file even when it falls back.
comment:112 by , 18 years ago
Replying to Øyvind Saltvik <oyvind@saltvik.no>:
Axiak
But does this not limit streaming to gnu os'es by using shutils. Should it not try shutils first, dont import shutils at the top of the .py Then try rename, and finally just write from filecontent if that fails.
I did not know it only worked in GNU os's. Also, AFAIK, there's no OS-independent way to copy files in python, and if file streaming was enabled, file_fieldcontent will not have the desired content. Thus, the only sure-fire way to make it work is to do a file.write file.read combination. Luckily it probably won't have to do this, since it will try both shutils
and
os.rename
before it gets to that point.
comment:113 by , 18 years ago
One more thing, the last fallback should read and write incrementally, to avoid reading the whole file before it is written to another.
by , 18 years ago
Attachment: | 5078-streaming_file_upload_with_shutils_and_chunked_fallbacks.diff added |
---|
This file uses python fallbacks, but using chunks to avoid loading the entire file into memory.
by , 18 years ago
Attachment: | 5079-streaming_file_upload_with_safe_file_move.diff added |
---|
Cleaned it up a bit. Moved file_move_safe into django.utils in case it should be used in future endeavors.
comment:114 by , 18 years ago
Needs documentation: | unset |
---|---|
Summary: | [patch] Large streaming uploads → Large streaming uploads |
Version: | → SVN |
This last patch seems clean. If people wanted to start testing, now would be a good time :). It works in [5079] and should work on all platforms while considering all the ideas posted here.
It also has documentation. I will not remove the `Patch needs improvement' flag to let others review this patch with a finer comb than I have.
Please post here if you have additional concerns, comments, or enhancements.
by , 18 years ago
Attachment: | 5089-streaming_file_upload_with_safe_file_move.diff added |
---|
Added missing else
by , 18 years ago
Attachment: | 5089-streaming_file_upload_with_safe_file_move.2.diff added |
---|
Forgot to add django/utils/file.py
follow-up: 124 comment:117 by , 18 years ago
Hi,
Is x-progress-bar is a header that come with apache only or it is a start header?
Is it work on fcgi, if not, can I change x-progress-bar to hidden form?
Thank.
comment:118 by , 18 years ago
It is just a header that has to be sent along with the request.
X-Progress-ID with a value
then FILE_UPLOAD_DIR + the value of the header X-Progress-ID is a pickled python object.
That can be used by any view and displayed using javascript.
comment:119 by , 18 years ago
Thank for your replying,
I have tested this code by modified old javascript (middleware enable). I never seen X-Progress-ID in header, so, a temp file with that id never created.
Can I have example usage?
Thank again
comment:120 by , 18 years ago
Actually, this patch does this wrong, the header is only used for getting progress status, the id should be the querystring.
comment:121 by , 18 years ago
As this site states:
http://search.cpan.org/~ceeshek/Apache2-UploadProgress-0.2/lib/Apache2/UploadProgress.pm#HANDLERS
We should be using multiple mechanisms to get this progress ID.
After I mull over this some more and understand it better, I'll make a new patch.
by , 18 years ago
Attachment: | 5099_patch_for_streaming_uploads.diff added |
---|
Uses multiple mechanisms for determining the progress id.
comment:122 by , 18 years ago
The patch has now been merged into [5099] and I added a new property to request
: progress_id
. This may simplify JavaScript Middleware (we should make a new ticket for that) as it only has to do request.progress_id
. If it's None
, there was no progress ID sent. Otherwise, there was.
comment:123 by , 18 years ago
This ticket is only for making sure file uploads work. I have created ticket #4165 which s specifically for the progress bar enhancements.
comment:124 by , 18 years ago
Replying to anonymous:
Hi,
Is x-progress-bar is a header that come with apache only or it is a start header?
Is it work on fcgi, if not, can I change x-progress-bar to hidden form?
Thank.
X-Progress-ID
is used to tell the server between loads "This is the same post". I'm actually not sure how the headers are supposed to be populated, only that we're supposed to "support them". After reading a perl module for quite some time, I've determined that they use javascript to append a random 32-digit hex at the end of their form action ("?progress_id=abcdef0123456789abcdef0123456789") right before the form submits.
If you do this, it will work.
NB: Using a hidden POST field will NOT work. Why? Because it's the POST upload we're trying to track!
by , 18 years ago
Attachment: | 5100_upload_request_meta.diff added |
---|
This isn't important enough to get its own property in request...request.META now contains 'UPLOAD_PROGRESS_ID'
by , 18 years ago
Attachment: | 5100_forgot_to_revert_base.diff added |
---|
I included a bit much in that last patch.
by , 18 years ago
Attachment: | 5100_file_upload_core.diff added |
---|
Meant to be cleaner, especially in light of #4165
by , 18 years ago
Attachment: | 5100_file_upload_core_with_middleware_hooks.diff added |
---|
Added middleware hooks
by , 18 years ago
Attachment: | 5100_file_upload_core_with_middleware_hooks_2.diff added |
---|
Added middleware hooks...this is better.
comment:125 by , 18 years ago
Replying to Maniac <Maniac@SoftwareManiacs.Org>:
My idea of a hook is not fit into current middleware, I just wanted to retain the style. So to clarify I see it like this.
This can be a separate set of hooks similar to MIDDLEWARE_CLASSES or it can be done in middleware classes themselves. To do the latter we will specifiy that in addition to process_request, _response, _view and _exception methods a middleware can define a new kind of method: process_upload(request, data).
parse_file_upload then would search for these middleware methods just as http requests do in modpython.py and wsgi.py. The list of methods is passed to a FieldStorage decendant that will call them each time it's about to write a chunk of data to the target stream.
This is it... Sorry I can't come with the actual code: pretty busy rolling out my current project.
After reviewing all the comments, this one struck me in particular. I have added a process_upload(UploadException)
method to MiddleWare.
I'm sure I might get heat for this recent addition, but I think it's the cleanest way to allow extensibility.
Essentially, you can create a descriptor
for the file_progress
attribute of request
. It gets treated like a dictionary so it will be passed things like {'received': 1414,'size': 100000
} and is expected to return it when its asked for. You are given an exception to raise if something should cancel the file upload, so I could imagine someone writing a descriptor that would compare received
and some target size and cancel the upload if it's more than that.
For more details, please look at django/http/file_descriptor.py
for the fallback descriptor. If you named that exact class process_upload
and stuck it in a middleware, it would be used (and would work). I'll come up with another example soon, and stick it in #4165.
by , 18 years ago
Attachment: | 5100_file_upload_core_with_middleware_hooks_3.diff added |
---|
Apparently I don't know how not to lose files.
by , 18 years ago
Attachment: | 5100_file_upload_core_with_middleware_hooks_4.diff added |
---|
Apparently I don't know how not to lose files.
by , 18 years ago
Attachment: | 5113_file_upload_core_with_middleware_hooks.diff added |
---|
Fixes WSGI bug.
by , 18 years ago
Attachment: | 5116_streaming_upload_fixed_middleware_append.diff added |
---|
Fixed bug where .append(0, func)
was called.
by , 18 years ago
Attachment: | 5116_streaming_upload_fixed_middleware_append_2.diff added |
---|
Fixed bug where .append(0, func)
was called with the files this time.
comment:126 by , 18 years ago
Break!
Traceback (most recent call last):
File "C:\Python25\lib\site-packages\django\core\servers\basehttp.py", line 272, in run
self.result = application(self.environ, self.start_response)
File "C:\Python25\lib\site-packages\django\core\servers\basehttp.py", line 614, in call
return self.application(environ, start_response)
File "C:\Python25\lib\site-packages\django\core\handlers\wsgi.py", line 207, in call
request = WSGIRequest(environ)
File "C:\Python25\lib\site-packages\django\core\handlers\wsgi.py", line 78, in init
self.METAUPLOAD_PROGRESS_ID = self._get_file_progress_id()
File "C:\Python25\lib\site-packages\django\core\handlers\wsgi.py", line 187, in _get_file_progress_id
self._req.args)
AttributeError: 'WSGIRequest' object has no attribute '_req'
by , 18 years ago
Attachment: | 5126_file_upload_wsgi_tested.diff added |
---|
Tested with WSGI and made a few changes.
follow-up: 131 comment:127 by , 18 years ago
Alright, I tested/tweaked the patch on a few different platforms. Here are my notes:
- Apache2/mod_python: Works well. Works with #4165 and generates pretty progress bar
- Apache2/mod_fastcgi: Works well. Works with #4165 and generates pretty progress bar
- Apache2/mod_scgi: Untested.
- LighTTPD/mod_fastcgi: Does NOT work. However, version 1.5 will have
mod_uploadprogress
which will provide a JSON interface just like the one in #4165. Thus, usingmod_upgradeprogress
to generate a progress bar should be a breeze. I don't have lighttpd 1.5 to test, though.
- manage.py development server: World well. Progress bar does not work AFAIK, the development server is singlethreaded and it will stay that way. The progress bar is an inherently multithreaded process (one to serve progress information, one to receive the uploaded information).
Let me know if you have anymore problems.
by , 18 years ago
Attachment: | 5126_file_uploads_latest_patch.diff added |
---|
Added 'state':'starting' to be more like mod_uploadprogress
.
follow-up: 129 comment:128 by , 18 years ago
Hi,
I have tested under window xp, apache 2.2.4, mod_python 3.3. When I upload a large file via admin interface.
A temp file with named by session generated by javascript client is created. However, a file with random name and '.upload' extension is also created. File stream is append to '.upload' version instead of session id one.
Any idea??
comment:129 by , 18 years ago
Replying to anonymous:
Hi,
I have tested under window xp, apache 2.2.4, mod_python 3.3. When I upload a large file via admin interface.
A temp file with named by session generated by javascript client is created. However, a file with random name and '.upload' extension is also created. File stream is append to '.upload' version instead of session id one.
Any idea??
Yes, this is how it is supposed to work, which is why there are hooks for middleware (see illustration middleware in #4165).
You see, we don't need to keep track of the actual file that's being uploaded, because it's always being uploaded (by the same thread, same loop, etc.). We do need to be able to look for the 'progress' of that file upload, which has to be stored in some persistent manner, keyed by the session id. Since the file upload code has no model, can't rely on cache, and does have a safe place to put temp files, it by default puts that state information in that location.
If you install the patch 5126_file_upload_contrib_app.diff
and add 'django.contrib.uploadprogress.middleware.FileProgressCached'
to your MIDDLEWARE_CLASSES
setting, the cache framework will be used, and no session_id file will be created.
Alternatively, you can use that patch and add 'django.contrib.uploadprogress.middleware.FileProgressDB'
to your MIDDLEWARE_CLASSES
and add 'django.contrib.uploadprogress'
to your INSTALLED_APPS
(and run a syncdb
...).
Or, you are welcome to look at the code in django.contrib.uploadprogress.middleware.FileProgressCached
and write your own method of storing this file progress information (it's in a dictionary format).
Sounds like it's working though, thanks for the feedback.
comment:130 by , 18 years ago
I should also explain the rationale for the naming of the files:
We use the X-Progress-ID
value for the name of the file that stores the progress because in subsequent requests we're going to need to get the file by that same value.
We use a (python) temporary file name for the actual file that's being uploaded because it guarantees uniqueness. Since X-Progress-ID
is provided by the client, we can't guarantee someone won't start auto-generating requests and cause a collision on the files themselves. I have thought of putting the X-Progress-ID
in the session. This can be done but it relies on the session app -- which is not a prereq for Django to work -- being activated and file uploading is a core feature.
It doesn't seem like too much of an issue if the progress tracking has a small chance of getting munged. The actual files, OTOH, would be a bit worrisome.
That being said, if you really don't like the naming, and have a better suggestion, by all means suggest it.
follow-up: 132 comment:131 by , 18 years ago
Replying to Michael Axiak <axiak@mit.edu>:
Alright, I tested/tweaked the patch on a few different platforms. Here are my notes:
...
My new report:
- Apache2/mod_python: Works well. Works with #4165 and generates pretty progress bar
- Apache2/mod_fastcgi: Works well. Works with #4165 and generates pretty progress bar
- Apache2/mod_scgi: Untested.
- LighTTPD/mod_fastcgi: I stand corrected (Thanks SmileyChris). It does help with LighTTPD inasmuch as it will make the interaction between lighttpd and python not eat all the memory and CPU. However, it will not show a progress bar (nor are other features in #4165 supported), as the streamed 'upload' only exists between lighttpd and python, not between the web client and the web server.
- manage.py development server: World well. Progress bar does not work AFAIK, the development server is singlethreaded and it will stay that way. The progress bar is an inherently multithreaded process (one to serve progress information, one to receive the uploaded information).
Also, when uploading a 4.5 GB file, the content length was reported incorrectly. This happened with both FastCGI and mod_python on apache2.
comment:132 by , 18 years ago
Replying to anonymous:
...
Also, when uploading a 4.5 GB file, the content length was reported incorrectly. This happened with both FastCGI and mod_python on apache2.
It turns out this happens when you don't have Large File support (on either the client or the server?).
by , 18 years ago
Attachment: | 5127_file_uploads_no_streaming_fixed.diff added |
---|
There was an error uploading large files with streaming turned off
comment:133 by , 18 years ago
I also tidied up the code a bit (just white space tweaking mostly).
I noticed that currently, you need to set an upload directory even if you are using a middleware which overrides the FileProgressDescriptor
of request
because of calls like this in http.__init__
:
if self._progress_filename: self._request.file_progress = {'state': 'done'}
I'm not sure if relying on _progress_filename
is the best way to do this.
comment:134 by , 18 years ago
Can we have a moratorium on uploading yet more changes to this ticket for a few days, please?
I've started reading through a few of the patches carefully with an eye to working out what parts look like they're ready to be committed. However, this isn't going to work if every day there is yet another slight tweak to something. It's going to take a few days to review all this carefully and think about all the issues raised in the comments, so everybody just take a deep breath and back away from the keyboards for a bit.
follow-up: 136 comment:135 by , 18 years ago
I've yet to test large files with the patch applied, but I've noticed my files are being created with 600 permissions instead of the 644 they received before patching. Not sure if it's just my setup, but I can't recall any other changes over the past day or so that would have affected it.
comment:136 by , 18 years ago
Replying to lukeman@gmail.com:
I've yet to test large files with the patch applied, but I've noticed my files are being created with 600 permissions instead of the 644 they received before patching. Not sure if it's just my setup, but I can't recall any other changes over the past day or so that would have affected it.
It should inherit the permissions from the temporary directory. I can't think of a better setup than this, but if people think this is unexpected/problematic maybe we can change it.
If people understand streaming files go to the temp folder then to the other folder, then I think inheriting the permissions from the temp folder would be expected behavior.
-Mike
follow-up: 138 comment:137 by , 18 years ago
I have tested multiple upload at the same time (difference browser instance). If there are more than two upload request, when a first request
is sent, it work properly. If I make a new call, an old stop immediately and a new one not create a session and temp file anymore.
Django is great in advance, please keep it simple and work in common task.
comment:138 by , 18 years ago
Replying to anonymous:
I have tested multiple upload at the same time (difference browser instance). If there are more than two upload request, when a first request
is sent, it work properly. If I make a new call, an old stop immediately and a new one not create a session and temp file anymore.
Django is great in advance, please keep it simple and work in common task.
Hi,
I have tested it with simultaneous file uploads and it worked fine. Could you please include more details? For instance, I don't think there would be any chance it would work in the development server. It may break in some FastCGI configurations, too. (The reason being that accepting multiple file uploads requires some multithreadedness.)
-Mike
comment:139 by , 18 years ago
Cc: | added |
---|
follow-up: 142 comment:140 by , 18 years ago
Hi
If I stop upload process by clicking a "Stop" button. It look like sever see a same upload request and never generate process id.
How to solve this problem?
Thank
comment:141 by , 18 years ago
Cc: | added |
---|
comment:142 by , 18 years ago
Replying to anonymous:
Hi
If I stop upload process by clicking a "Stop" button. It look like sever see a same upload request and never generate process id.
How to solve this problem?
Thank
I'm not sure why you'd be seeing this. If you're using #4165 then the javascript should create a new progress id on *every* upload. If you're not using #4165, then progress id is irrelevant. While testing I was able to stop and re-upload in FF 2, 1.5, IE 7, and IE 6 without a problem. Could you please describe your environment with more detail?
-Mike
by , 18 years ago
Attachment: | 5313_updated_file_progress.diff added |
---|
Removed some unneeded things. No file progress tracking by default.
follow-up: 144 comment:143 by , 18 years ago
I've cleaned the FileProgressDescriptor architecture a little bit. Three changes for this patch:
- using
request.__class__ = descriptor
rather than using a weird descriptor wrapper. - defaulting to a
NullFileProgressDescriptor
(which does nothing) versus one that stored the information in temp upload directory. - making
safe_file_move
do nothing if the source and destination are the same string.
If you want to track file upload progress you need to install the stuff from #4165. This ticket just contains the hooks to get #4165 there with just a minimal of contrib. This patch makes that even clearer by removing the last remnants of functional file progress tracking.
I *think* this is probably where we want to leave this ticket (in terms of what should be cut out of it, what should be put in). Although, I can see several other separation (take care of form stuff in another ticket, for instance).
It is worth noting (because someone had issue with this in IRC), that if you try to use this save_FIELD_file(name, content)
should now be given a request.FILE
item, rather than the actual content of the file.
For example:
object.save_video_file(request.FILE['video']['filename'], request.FILE['video']['content'])
is bad, since the mere invocation request.FILE['video']['content']
will load all the content into memory, exactly what we're trying to avoid.
The new version looks like:
object.save_video_file(request.FILE['video']['filename'], request.FILE['video'])
In reply to SmileyChris, we have to have an upload directory anyway, since the FileProgressDescriptor only changes how file *progress* is tracked, not the actual uploading of files (that is, it has to go into a directory, or else this entire thing is disabled). We could rename and clean up if people prefer it changed.
follow-up: 145 comment:144 by , 18 years ago
Replying to Michael Axiak <axiak@mit.edu>:
The new version looks like:
object.save_video_file(request.FILE['video']['filename'], request.FILE['video'])
Why not skip request.FILEvideofilename at all? This is provided by request.FILEvideo anyway.
IMHO having filename as an separate parameter makes only sense if this would be used somehow directly, but currently the filename/path where the file gets saved is changed by the field (field.get_directory_name() and field.get_filename(filename)).
comment:145 by , 18 years ago
Replying to anonymous:
Why not skip request.FILEvideofilename at all? This is provided by request.FILEvideo anyway.
IMHO having filename as an separate parameter makes only sense if this would be used somehow directly, but currently the filename/path where the file gets saved is changed by the field (field.get_directory_name() and field.get_filename(filename)).
Well, save_video_file() is meant to allow someone to save a file. Unless that file is specifically provided in a post (it's not necessarily, and in many of my applications it's not...generated images come to mind).
We *can* have it accept both sets of arguments, but then it's too nonorthogonal. Perhaps we should just create a new one? move_video_file()
??
by , 18 years ago
Attachment: | 5343_1_streaming_file_upload.diff added |
---|
Newer, cleaner version of file upload script
by , 18 years ago
Attachment: | 5343_streaming_file_upload_no_assert.diff added |
---|
Newer, cleaner version of file upload script (doh! no random commented assert)
by , 18 years ago
Attachment: | 5343_streaming_file_upload_best.diff added |
---|
Sorry about that -- this one is the better one.
comment:146 by , 18 years ago
A list of changes in this new version:
- I removed the file_progress descriptor handling and middleware -- if you need to track file progress, you can use
process_request
to achieve what you need.
- I cleaned up a lot of things, removed many catch-all
excepts
and replaced withexcept OSError
etc.
- Included cross-platform locking for saving/moving files manually (django/utils/file_locks.py)
- Made
move_file_safe(src,dest)
to care if a file already exists and throw an error if one does by default.
- Moved stuff out of
http/__init__.py
into various locations.
- In reply to SmileyChris' comment earlier, no longer uses
self._progress_file
to determine if a file is being tracked. Now usingself._track_progress
(a boolean)
save_field_file
can now optionally not take a filename (iffilename = None
). To avoid confusion, there is an additional function,instance.move_field_file
which takes one parameter: therequest.FILES
value. Code examples should usemove_field_file
to distinguish this new functionality.
- Stay tuned for updates to #4165 to reflect these and other changes!
Example of new move_video_file
Model instance function:
#from this foo.save_video_file(request.FILES['video']['filename'], request.FILES['video']) #to this foo.move_video_file(request.FILES['video'])
by , 18 years ago
Attachment: | 5343_cleaned_streaming_file_upload.diff added |
---|
It's amazing what Trac helps you see.
by , 18 years ago
Attachment: | 5343_cleaned_streaming_file_upload_tweaked.diff added |
---|
Made a subtle fix after testing with #4165.
follow-up: 148 comment:147 by , 18 years ago
def init(self, headers, input, request, file_upload_dir=None, streaming_min_post_size=None, chunk_size=1024*64):
parser = MultiPartParser(headers, input, request, file_upload_dir, streaming_min_post_size)
Is chunk_size is configurable? Do you hard code, aren't you.
Thank
comment:148 by , 18 years ago
Replying to anonymous:
def init(self, headers, input, request, file_upload_dir=None, streaming_min_post_size=None, chunk_size=1024*64):
parser = MultiPartParser(headers, input, request, file_upload_dir, streaming_min_post_size)
Is chunk_size is configurable? Do you hard code, aren't you.
Thank
Actually, that's a great point. I am going to change a few things (shift around save_FIELD_file
and move_FIELD_file
to make things more explicit) and on my list for the next upload would be to allow chunk sizes to be a settings variable. (I was actually discussing this with Øyvind in IRC, http://simon.bofh.ms/logger/django/2007/05/26/15/25 .)
Thanks for the comment, though. Feel free to post any other suggestions!
comment:149 by , 18 years ago
Works for me. Windows/Apache2.2.4/ModPython.
My page actually uploads 4 large files at once so I added 'filename' : self._filename to the file_progress dict created in multparser.py then I can display the current file in my javascript.
comment:150 by , 18 years ago
Cc: | added |
---|
comment:151 by , 18 years ago
5343_cleaned_streaming_file_upload_tweaked.diff works perfect for me (Linux/Apache 2.0.59/modPython). Uploading large zip files (10 to 100 MB each) to my photo gallery with no problems ... memory usage is dead straight at 65MB ... where as it explored to 600 MB per apache process before. Good work Michael!
comment:152 by , 18 years ago
111 def parse(self):
112 try:
113 self._parse()
114 finally:
115 if self._track_progress:
116 self._request.file_progress = {'state': 'done'}
117 return self._post, self._files
When user click 'stop' while uploading file, server return {'state':'done'} even file is not completely uploaded.
This cause unexpected behaviors under different browser. I think we should return actual file size and received size to make
client to be able to check if file is completely uploaded.
Thank
comment:153 by , 18 years ago
def parse(self):
try:
self._parse()
finally:
if self._progress_filename:
if self._received == self._size:
self._request.file_progress = {'received':self._received,
'size':self._size,
'state': 'done'}
else:
self._request.file_progress = {'received':self._received,
'size':self._size,
'state': 'abort'}
How about this?
comment:154 by , 18 years ago
Cc: | added |
---|
by , 18 years ago
Attachment: | 5343_cleaned_streaming_file_upload_tweaked2.diff added |
---|
Fixed microscopic bug in an error path
comment:155 by , 18 years ago
Cc: | added |
---|
comment:156 by , 18 years ago
Cc: | added |
---|
by , 18 years ago
Attachment: | 5722.2.diff added |
---|
Sorry, that last one patch also included #4165. This is the correct one.
by , 18 years ago
Attachment: | 5724_streaming_file_upload.diff added |
---|
updated to r5724 (previous patch was missing files too)
comment:157 by , 18 years ago
Cc: | added |
---|
comment:158 by , 18 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Reading over this, I think the latest version here looks good (5724_streaming_file_upload.diff). Is everyone happy with it?
comment:159 by , 18 years ago
Yes, I've been using it intensively for the last couple of days and it works beautifully.
comment:160 by , 18 years ago
I have been using this patch for a little while now and it does work very well. Haven't seen anything wrong with it.
comment:161 by , 18 years ago
I manually applied patch 5724... to the latest trunk version and noticed that there's a syntax error in line 254 of multipartparser.py. Instead of
except OSError, IOError:
it must be
except (OSError, IOError), e:
Besides that, the patch seems to work quite nicely and I hope it will be added to trunk ASAP.
follow-up: 163 comment:162 by , 18 years ago
Ran into a very small issue:
Files uploads that are streamed end up having 0600 permissions, while files that are uploaded normally have 0644 permissions. I can't put my finger on where this happens at a glance, will look into it tomorrow.
(In case anyone's wondering, I need read access to be able to scan newly uploaded files with clamav.)
follow-up: 166 comment:163 by , 18 years ago
Please read 136. This is standard Unix permissions. For example, if someone (heaven forbid) used PHP to upload, the files would inherit the permissions from the temporary upload directory.
-Mike
Replying to simonbun <simonbun@versea.be>:
Ran into a very small issue:
Files uploads that are streamed end up having 0600 permissions, while files that are uploaded normally have 0644 permissions. I can't put my finger on where this happens at a glance, will look into it tomorrow.
(In case anyone's wondering, I need read access to be able to scan newly uploaded files with clamav.)
comment:164 by , 18 years ago
Well the strange thing is my temporary dir and my uploads dir both have 777 permissions. I just checked again and the temporary .upload file has 600 permissions.
I'm using apache2 + mod_python + python2.5 if that would make a difference
comment:165 by , 18 years ago
Cc: | added |
---|
comment:166 by , 18 years ago
Replying to anonymous:
I'll second that. This one has bitten me too.
While setting permissions to 600 is sound, maybe a two liner on the documentation would help make that more obvious. It took me a while to understand what as going on.
Asides from that, thanks for the patch. I've been using it for a while (mod_python at webfaction's setup, post unicode merge) and it works well.
Cheers
Arthur
comment:167 by , 17 years ago
Cc: | added |
---|
comment:168 by , 17 years ago
Cc: | added |
---|
comment:169 by , 17 years ago
Cc: | added |
---|
comment:170 by , 17 years ago
Cc: | added |
---|
comment:171 by , 17 years ago
Cc: | added |
---|
comment:172 by , 17 years ago
Patch for r5820 does not work with forms.FileField and forms.ImageField, raw_field is now a StrAndUnicode object not a dict containing the raw data.
comment:174 by , 17 years ago
Keywords: | sprint added |
---|---|
Owner: | changed from | to
Status: | assigned → new |
Triage Stage: | Ready for checkin → Accepted |
I will take this on to as full extent as I can over the upcoming sprint.
If anyone has any more issues, please post them on here before Friday.
comment:175 by , 17 years ago
I would like to see the filename be part of the progress returned for when a form has multiple files:
self._request.file_progress = {'received': self._received, 'size': self._size, 'filename' : self._filename, 'state': 'uploading'}
comment:176 by , 17 years ago
The checks after an image upload (as seen in http://dpaste.com/hold/18468/) can be improved somewhat; at first glance it is entirely unclear what's supposed to happen, and the error message is repeated.
See http://dpaste.com/hold/19212/ (untested but looks OK).
comment:177 by , 17 years ago
Uploading empty files is treated as an error, why?
I would also like to see a file propgress tracker example implementation with unit tests and a stress test for uploading
large files.
On request I can provide an example of such a stress test programm.
follow-up: 179 comment:178 by , 17 years ago
I frankly fail to see the use case for uploading an empty file (as opposed to simply not uploading it in the first place).
If you have one, feel free to (a) tell us why you think it makes sense, (b) provide a patch which optionally allows that.
comment:179 by , 17 years ago
Replying to Matthias Urlichs <smurf@smurf.noris.de>:
I frankly fail to see the use case for uploading an empty file (as opposed to simply not uploading it in the first place).
There are several situations where one might want to upload empty files.
- It could be that one does not know the file is empty
- Maybe one actually needs to have an empty file, for example a ini.py in python indicates that a directory is a package
There is nothing special about an empty file, it is a file like any other file, it only contains a single character, the End Of File character. Why would you want to treat this file in any different way? If you do so it only means that your code is not generic enough.
One more note:
The reason that this ticket is languishing like this is that it is filled with tacit assumptions, unusual behavior (like the one I'm commenting on) and extra functionality that is not relevant to the actual problem it needs to solve first - streaming the uploads.
follow-up: 183 comment:180 by , 17 years ago
I'd argue that case 1 is a prime candidate for keeping the test, while case 2 can be handled by uploading a file with a comment in it.
In any case: if you think that's the reason, then please post either a list of (other) problems you have with this patch, or an improved patch.
Anythig else is not helpful. IMHO.
comment:181 by , 17 years ago
Keywords: | sprintsept14 feature added; sprint removed |
---|
comment:182 by , 17 years ago
Cc: | added |
---|
comment:183 by , 17 years ago
Replying to Matthias Urlichs <smurf@smurf.noris.de>:
while case 2 can be handled by uploading a file with a comment in it.
I would be willing to elaborate but this commment of yours indicates to me you have your mind set and don't actually want to understand the point that I was making: That is sometimes a file have to be empty.
You can't manually start adding bytes into files just because someone decided that uploading an empty file is an error. All you need is think about it for a second. As I said before an empty file is still a valid file, it has a name, a modification date ... even a size!. Maybe it is used to indicate a state, maybe it is log file that has yet to be written to etc. there are countless usecases that require the presence of a file that may not contain anything yet.
In your design if one were to simultaneously upload 10 files one of which is empty, the upload should fail once processing reaches the empty file ...
comment:184 by , 17 years ago
I do admit that you have a point; I admit that my response was a bit knee-jerk.
My core point, now that I think about it, is not that I'm dead set against uploading empty files. That test is easily removed, or made optional.
The real point is: What's holding up the change? Did axiak just not have enough time?
This is not the only patch in this bug tracker which fixes a real-world problem. We're carrying ~30 of these in our own repository and, frankly, I'd like that number to go down, one of these days. :-/
comment:185 by , 17 years ago
I know this is maybe a little late to chime in, but as opinion on this seems to be heavily divided as to whether empty files should be allowed, would it not just be best all-round to add a way to define whether empty files should be allowed or not?
Surely this wouldn't be too hard (I'd be willing to work on this if I get a few moments here-and-there).
Just a thought'''
comment:186 by , 17 years ago
I am going to have to go with +1 for not distinguishing between an empty file or a non-empty file.
Technically there is no difference between and empty and non-empty file so I don't see why django should distinguish between the two. It's definitely a user defined preference at that point, which should be possible after the file has been uploaded. Not wanting and empty file is an assumption, a popular one yes, but still an assumption about the upload.
comment:187 by , 17 years ago
Hey,
As well as Trey, I am also +1 for not making this distinction. If people need the validation, it belongs in another layer than the core http processing. (Perhaps the newforms filefield)
-Mike
comment:191 by , 17 years ago
Cc: | added |
---|
comment:192 by , 17 years ago
I'm getting
Ran 155 tests in 55.010s
FAILED (failures=1, errors=1)
when I run the tests on 5820 with 5820_streaming_file_upload.diff applied.
They look like
1) Traceback (most recent call last):
File "/usr/local/src/django/django-trunk.hg.5820+streaming/tests/modeltests/test_client/models.py", line 91, in test_post_file_view
response = self.client.post('/test_client/post_file_view/', post_data)
File "/usr/local/lib/python2.4/site-packages/django/test/client.py", line 222, in post
return self.request(r)
File "/usr/local/lib/python2.4/site-packages/django/core/handlers/base.py", line 77, in get_response
response = callback(request, *callback_args, callback_kwargs)
File "/usr/local/src/django/django-trunk.hg.5820+streaming/tests/modeltests/test_client/views.py", line 52, in post_file_view
c = Context({'file': request.FILESfile_file})
File "/usr/local/lib/python2.4/site-packages/django/utils/datastructures.py", line 143, in getitem
raise MultiValueDictKeyError, "Key %r not found in %r" % (key, self)
MultiValueDictKeyError: 'Key \'file_file\' not found in <MultiValueDict: {\'file\': [{\'content\': \'# coding: utf-8
n"""
n38. Testing usi
[...]
2) FAIL: test_simple_upload (regressiontests.test_client_regress.models.AssertFileUploadTests)
Traceback (most recent call last):
File "/usr/local/src/django/django-trunk.hg.5820+streaming/tests/regressiontests/test_client_regress/models.py", line 215, in test_simple_upload
self.assertEqual(response.status_code, 200)
AssertionError: 500 != 200
Am I doing something wrong here?
by , 17 years ago
Attachment: | 6469_streaming_file_upload.diff added |
---|
comment:193 by , 17 years ago
Uploaded new diff 6469_streaming_file_upload.diff, applying against trunk.
Currently one unit test is failing, traceback follows. There were two tests failing in the earlier version (5820), one of which was this one, but [560] helped me fix one of them on IRC.
[560] thinks this is an unicode error. If this version works for people, I'd like to hear about it. It would really nice to get this merged into trunk.
Thanks, Faheem.
*
FAIL: test_simple_upload (regressiontests.test_client_regress.models.FileUploadTests)
Traceback (most recent call last):
File "/usr/local/src/django/django-trunk.hg+mystreaming/tests/regressiontests/test_client_regress/models.py", line 244, in test_simple_upload
self.assertEqual(response.status_code, 200)
AssertionError: 500 != 200
Ran 205 tests in 85.939s
FAILED (failures=1)
follow-up: 195 comment:194 by , 17 years ago
Okay, so if I want to test this with a clean trunk, which of the above do I use? Trying to figure out if I need the middleware (if so which one), or just 6469 or what? I am needed to upload a 500M+ file on a weekly basis and would love to know HOW to use this.
follow-up: 196 comment:195 by , 17 years ago
Replying to anonymous:
Okay, so if I want to test this with a clean trunk, which of the above do I use? Trying to > figure out if I need the middleware (if so which one), or just 6469 or what? I am needed to > upload a 500M+ file on a weekly basis and would love to know HOW to use this.
6469_streaming_file_upload.diff should apply cleanly against trunk (hasn't been that long). Just tested it (patched against 6469 trunk) and it appears to work fine. You don't need anything else if you just want the streaming upload functionality. Note that apparently there are some issues with FileField/ImageField in forms/newforms, but I have no direct experience of this.
Would be interested to hear how it works for you.
Faheem.
comment:196 by , 17 years ago
Isn't that just the way. I wrote this post, re-re-rechecked out the trunk and applied the patch. Works marvelously. Combined with a crontab for file conversion and everything is working wonderfully.
Thanks!
Replying to Faheem Mitha <faheem@email.unc.edu>:
Replying to anonymous:
Okay, so if I want to test this with a clean trunk, which of the above do I use? Trying to > figure out if I need the middleware (if so which one), or just 6469 or what? I am needed to > upload a 500M+ file on a weekly basis and would love to know HOW to use this.
6469_streaming_file_upload.diff should apply cleanly against trunk (hasn't been that long). Just tested it (patched against 6469 trunk) and it appears to work fine. You don't need anything else if you just want the streaming upload functionality. Note that apparently there are some issues with FileField/ImageField in forms/newforms, but I have no direct experience of this.
Would be interested to hear how it works for you.
Faheem.
comment:197 by , 17 years ago
I noticed a problem with an older patch that seems to be still in 6469. Uploading a FORM with both a file and text INPUT and the input text containing umlauts or other non-ascii characters, you get an exception. I fixed it by replacing line 272 of multipartparser.py with self._post.appendlist(self._fieldname, unicode(self._field.getvalue(), "utf-8"))
.
by , 17 years ago
Attachment: | 6525_all_tests_pass.diff added |
---|
Fixed unicode in POST issues, added django.http.utils, moved str_to_unicode there
by , 17 years ago
Attachment: | 6603_all_tests_pass_uploadedfile_wrapper_fixed.diff added |
---|
Fixed a problem with UploadedFile wrapper and making sure content is not read in Fie/ImageField
comment:199 by , 17 years ago
Needs tests: | set |
---|
Everything seems to work now, just needs some form tests.
comment:200 by , 17 years ago
I've installed the latest trunk, applied this patch (6603_all_tests_pass_uploadedfile_wrapper_fixed.diff) and discovered two problems. Under mod_python 3.3.1 uploading 12Mb image results in apache consuming all available memory and dying. This is probably a mod_python error as it doesn't occur when using development server or mod_wsgi.
When I'm trying to upload 600Mb file under mod_wsgi, though, I get another problem:
Traceback (most recent call last): File "/var/lib/python-support/python2.5/django/core/handlers/base.py" in _real_get_response 81. response = callback(request, *callback_args, **callback_kwargs) File "/var/lib/python-support/python2.5/django/contrib/admin/views/decorators.py" in _checklogin 55. return view_func(request, *args, **kwargs) File "/var/lib/python-support/python2.5/django/views/decorators/cache.py" in _wrapped_view_func 39. response = view_func(request, *args, **kwargs) File "/var/lib/python-support/python2.5/django/contrib/admin/views/main.py" in change_stage 332. errors = manipulator.get_validation_errors(new_data) File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in get_validation_errors 61. errors.update(field.get_validation_errors(new_data)) File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in get_validation_errors 378. self.run_validator(new_data, validator) File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in run_validator 368. validator(new_data.get(self.field_name, ''), new_data) File "/var/lib/python-support/python2.5/django/oldforms/__init__.py" in isValidImage 713. validators.isValidImage(field_data, all_data) File "/var/lib/python-support/python2.5/django/core/validators.py" in isValidImage 180. content = field_data['content'] KeyError at /admin/account/pokerroom/5/ 'content'
Seems that patch is already outdated. Where can I get a newer version?
Eugene
comment:201 by , 17 years ago
Cc: | added |
---|
comment:202 by , 17 years ago
BTW, it seems that problem was caused by insufficient space on /tmp partition I'm using for uploads. Here's additional info:
POST Variable Value _file_upload_error u'Failed to write to temporary file.'
It's a bug anyways.
by , 17 years ago
Attachment: | 6652_isValidImage_using_tmpfilename.diff added |
---|
made the isValidImage validator try the tempfile before reading content
comment:203 by , 17 years ago
Hi Oyvind,
Just wondering what problem your most recent patch fixes. I tried using newforms and the patch breaks with it. (See traceback below), Does your patch fix this problem? I've heard of some problems related to newforms. Is this what it is?
Thanks, Faheem.
*
from models import FileUpload, FolderUpload
from django.newforms import form_for_model
FileUploadForm = form_for_model(FileUpload)
FolderUploadForm = form_for_model(FolderUpload)
*
*
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/core/handlers/base.py" in _real_get_response
- response = callback(request, *callback_args, callback_kwargs)
File "/usr/local/lib/python2.4/site-packages/django/contrib/auth/decorators.py" in _checklogin
- return view_func(request, *args, kwargs)
File "/usr/local/lib/python2.4/site-packages/bixfile_dev/views.py" in file_upload
- success = form.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py" in save
- return save_instance(self, model(), fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py" in save_instance
- f.save_form_data(instance, cleaned_data[f.name])
File "/usr/local/lib/python2.4/site-packages/django/db/models/fields/init.py" in save_form_data
- getattr(instance, "save_%s_file" % self.name)(data.filename, data.content, save=False)
File "/usr/local/lib/python2.4/site-packages/django/db/models/fields/init.py" in
- setattr(cls, 'save_%s_file' % self.name, lambda instance, filename, raw_field, save=True: instance._save_FIELD_file(self,
filename, raw_field, save))
File "/usr/local/lib/python2.4/site-packages/django/db/models/base.py" in _save_FIELD_file
- if raw_field.has_key('tmpfilename'):
AttributeError at /proto_dev/bixfile_dev/
'str' object has no attribute 'has_key'
comment:204 by , 17 years ago
Just changed the oldforms image validator to use the tempfile. The problem you have was fixed in the 6603 patch.
comment:205 by , 17 years ago
Hi oyvind,
I'm getting some test breakage with your patch.
Ran 211 tests in 90.369s
FAILED (failures=3)
The most common error is:
AttributeError: 'UploadedFile' object has no attribute 'content'.
by , 17 years ago
Attachment: | 6654_fixed_tests_and_file_clean.diff added |
---|
diff did not apply cleanly, fixed
comment:207 by , 17 years ago
Django patched with the latest streaming patch can now be checked out using mercurial.
hg clone http://streaming.cylon.no/streaming django_src
comment:208 by , 17 years ago
Cc: | added |
---|
comment:211 by , 17 years ago
Cc: | removed |
---|
comment:212 by , 17 years ago
Cc: | added |
---|
comment:213 by , 17 years ago
Cc: | added |
---|
follow-up: 215 comment:214 by , 17 years ago
Could someone post updated directions to get the latest incarnation of this patch working? I'm using Oyvind's mercurial source, but I'm not sure if I still need to modify the middleware, change my models at all, add settings to settings.py, etc.
Also, is the progress meter still included in this patch, and what do I need to see it in the Admin?
There hasn't been much activity here lately -- is this thing close to making it into the trunk and getting some proper documentation?
comment:215 by , 17 years ago
Replying to Joshua Works:
Could someone post updated directions to get the latest incarnation of this patch working? I'm using Oyvind's mercurial source, but I'm not sure if I still need to modify the middleware, change my models at all, add settings to settings.py, etc.
It works out of the box for me, at least it does for the last version I used it for, which corresponds to Oyvinds last patch. However, apparently this patch does not apply cleanly to current trunk. Oyvind asked me to do a merge, but I've been busy with other things. I'll do my best to get to this in the next day or two.
Also, is the progress meter still included in this patch, and what do I need to see it in the Admin?
No, that is in another patch http://code.djangoproject.com/ticket/4165, and is buggy.
See my comments there. Apart from a backend problem with the Python code, the JS code has
problems too. It should be easy to rewrite for someone who knows JS, though. I may end up doing it myself if nobody else does, but I'll probably use a library like PrototypeJS or Jquery.
There hasn't been much activity here lately -- is this thing close to making it into the trunk and getting some proper documentation?
I think it will need a bunch of people to give it a push. I've been intending to try and coordinate something, but I've been busy with other things.
If you don't hear anything, feel free to ping me via private email. I'm quite keen that the patch continue working till it is merged into trunk. I need it for some stuff myself.
Faheem.
comment:216 by , 17 years ago
It absolutely doesn't need "a bunch of people to give it a push". We know this ticket is here and at some point one of the maintainers will sit down and review it again to see if all the problems from the last review have been fixed and no new ones introduced.
by , 17 years ago
Attachment: | streaming.6710.patch added |
---|
Updated streaming patch to trunk rev 6710.
comment:217 by , 17 years ago
Added streaming.6710.patch against trunk rev 6710. This passes all Django tests. I haven't tested it extensively, but it appears to work. Some quick notes on how I generated this patch in case anyone is interested. I used mercurial and kdiff3 for merging (OS Debian etch).
Note that the only file that required manual intervention was django/http/utils.py. This file could probably use a little cleanup.
Everything else was handled by kdiff3.
See Section 12.8 of the hgbook for a recipe for doing this http://hgbook.red-bean.com/hgbookch12.html#x16-28700012.8 and also this wiki page,
http://www.selenic.com/mercurial/wiki/index.cgi/MqMerge
Note that the method described in the wiki is currently less likely to give problems than the method in the hgbook (otherwise similar).
See this issue for further discussion, http://www.selenic.com/mercurial/bts/issue844
comment:218 by , 17 years ago
Successfully patched trunk rev 7018 with streaming.6710.patch. Works well.
What are the main issues blocking trunk integration? This functionality is of high importance to many, as one could argue that without this patch the current FileField and ImageField implementation is both buggy and exposes a security risk (denial of service by uploading large files as Apache is eating up memory and peaking CPU usage + large file upload plainly fails in the end without the patch).
Thus strong +1 for integration ASAP.
comment:219 by , 17 years ago
Cc: | added |
---|
comment:220 by , 17 years ago
I also urgently need that feature ... any chance to see this merged soon?
comment:221 by , 17 years ago
Please do not ask "when will this be merged" questions in the ticket tracker; all it does is generate noise. If you're genuinely interested in helping to get a specific patch merged, consider asking on the developers list about why it hasn't yet been committed or, better yet, searching the archive of that list to see if there's a reason given.
comment:222 by , 17 years ago
Cc: | added |
---|
comment:223 by , 17 years ago
streaming.7092.patch gives some errors. As usual, I don't understand
what is going on well enough to fix them. See below.
*
======================================================================
FAIL: Doctest: modeltests.model_forms.models.test.API_TESTS
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for modeltests.model_forms.models.test.API_TESTS
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line unknown line number, in API_TESTS
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
type(f.cleaned_datafile)
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[198]>", line 1, in ?
type(f.cleaned_datafile)
AttributeError: 'TextFileForm' object has no attribute 'cleaned_data'
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[199]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The TextFile could not be created because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.file
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[200]>", line 1, in ?
instance.file
NameError: name 'instance' is not defined
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f = TextFileForm(data={'description': u'Assistance'}, instance=instance)
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[201]>", line 1, in ?
f = TextFileForm(data={'description': u'Assistance'}, instance=instance)
NameError: name 'instance' is not defined
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.cleaned_datafile
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[203]>", line 1, in ?
f.cleaned_datafile
AttributeError: 'TextFileForm' object has no attribute 'cleaned_data'
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[204]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The TextFile could not be created because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.file
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[205]>", line 1, in ?
instance.file
NameError: name 'instance' is not defined
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
os.unlink(instance.get_file_filename())
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[206]>", line 1, in ?
os.unlink(instance.get_file_filename())
NameError: name 'instance' is not defined
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f = TextFileForm(data={'description': u'Assistance'}, files={'file': {'filename': 'test2.txt', 'content': 'hello world'}}, instance=instance)
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[207]>", line 1, in ?
f = TextFileForm(data={'description': u'Assistance'}, files={'file': {'filename': 'test2.txt', 'content': 'hello world'}}, instance=instance)
NameError: name 'instance' is not defined
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[209]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The TextFile could not be created because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.file
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[210]>", line 1, in ?
instance.file
NameError: name 'instance' is not defined
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.delete()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[211]>", line 1, in ?
instance.delete()
NameError: name 'instance' is not defined
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[219]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The TextFile could not be changed because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.file
Expected:
u'.../test3.txt'
Got:
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
type(f.cleaned_dataimage)
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[226]>", line 1, in ?
type(f.cleaned_dataimage)
AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[227]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The ImageFile could not be created because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.image
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[228]>", line 1, in ?
instance.image
AttributeError: 'TextFile' object has no attribute 'image'
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.cleaned_dataimage
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[231]>", line 1, in ?
f.cleaned_dataimage
AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[232]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The TextFile could not be created because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.image
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[233]>", line 1, in ?
instance.image
AttributeError: 'TextFile' object has no attribute 'image'
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
os.unlink(instance.get_image_filename())
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[234]>", line 1, in ?
os.unlink(instance.get_image_filename())
AttributeError: 'TextFile' object has no attribute 'get_image_filename'
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[237]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The TextFile could not be created because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.image
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[238]>", line 1, in ?
instance.image
AttributeError: 'TextFile' object has no attribute 'image'
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.delete()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[239]>", line 1, in ?
instance.delete()
File "/usr/local/lib/python2.4/site-packages/django/db/models/base.py", line 324, in delete
assert self._get_pk_val() is not None, "%s object can't be deleted because its %s attribute is set to None." % (self._meta.object_name, self._meta.pk.attname)
AssertionError: TextFile object can't be deleted because its id attribute is set to None.
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[247]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The ImageFile could not be changed because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.image
Expected:
u'.../test3.png'
Got:
======================================================================
FAIL: Doctest: regressiontests.forms.tests.test.fields_tests
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for regressiontests.forms.tests.test.fields_tests
File "/usr/local/src/django/django-trunk.patched/tests/regressiontests/forms/tests.py", line unknown line number, in fields_tests
File "/usr/local/src/django/django-trunk.patched/tests/regressiontests/forms/tests.py", line ?, in regressiontests.forms.tests.test.fields_tests
Failed example:
type(f.clean({'filename': 'name', 'content': 'Some File Content'}, 'files/test4.pdf'))
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest regressiontests.forms.tests.test.fields_tests[278]>", line 1, in ?
type(f.clean({'filename': 'name', 'content': 'Some File Content'}, 'files/test4.pdf'))
File "/usr/local/lib/python2.4/site-packages/django/newforms/fields.py", line 453, in clean
raise ValidationError(self.error_messagesempty)
ValidationError: [u'The submitted file is empty.']
Ran 234 tests in 63.274s
FAILED (failures=2)
comment:224 by , 17 years ago
http://code.djangoproject.com/browser/django/trunk/tests/modeltests/model_forms/models.py#L739
Add content-length it the data here.
by , 17 years ago
Attachment: | streaming.7092.patch.partial_tests_fix added |
---|
Slightly modified version of streaming.7092.patch with more tests passing.
comment:225 by , 17 years ago
With the patch I just uploaded (basically the changes are liberal use of 'content-length' as suggested by Oyvind, the errors are now reduced
to
======================================================================
FAIL: Doctest: modeltests.model_forms.models.test.API_TESTS
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for modeltests.model_forms.models.test.API_TESTS
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line unknown line number, in API_TESTS
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
type(f.cleaned_dataimage)
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[226]>", line 1, in ?
type(f.cleaned_dataimage)
AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[227]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The ImageFile could not be created because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.image
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[228]>", line 1, in ?
instance.image
AttributeError: 'TextFile' object has no attribute 'image'
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.cleaned_dataimage
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[231]>", line 1, in ?
f.cleaned_dataimage
AttributeError: 'ImageFileForm' object has no attribute 'cleaned_data'
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[232]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The TextFile could not be created because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.image
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[233]>", line 1, in ?
instance.image
AttributeError: 'TextFile' object has no attribute 'image'
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
os.unlink(instance.get_image_filename())
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[234]>", line 1, in ?
os.unlink(instance.get_image_filename())
AttributeError: 'TextFile' object has no attribute 'get_image_filename'
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
f.is_valid()
Expected:
True
Got:
False
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance = f.save()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[237]>", line 1, in ?
instance = f.save()
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 289, in save
return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
File "/usr/local/lib/python2.4/site-packages/django/newforms/models.py", line 35, in save_instance
raise ValueError("The %s could not be %s because the data didn't"
ValueError: The TextFile could not be created because the data didn't validate.
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.image
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[238]>", line 1, in ?
instance.image
AttributeError: 'TextFile' object has no attribute 'image'
File "/usr/local/src/django/django-trunk.patched.new/tests/modeltests/model_forms/models.py", line ?, in modeltests.model_forms.models.test.API_TESTS
Failed example:
instance.delete()
Exception raised:
Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in run
compileflags, 1) in test.globs
File "<doctest modeltests.model_forms.models.test.API_TESTS[239]>", line 1, in ?
instance.delete()
File "/usr/local/lib/python2.4/site-packages/django/db/models/base.py", line 324, in delete
assert self._get_pk_val() is not None, "%s object can't be deleted because its %s attribute is set to None." % (self._meta.object_name, self._meta.pk.attname)
AssertionError: TextFile object can't be deleted because its id attribute is set to None.
Ran 234 tests in 63.014s
FAILED (failures=1)
comment:226 by , 17 years ago
content-length is in the wrong dict in ImageField tests and probably some others too.
>>> f = ImageFileForm(data={'description': u'An image'}, files={'image': {'filename': 'test.png', 'content': image_data}, 'content-length':len(image_data)})
should be
>>> f = ImageFileForm(data={'description': u'An image'}, files={'image': {'filename': 'test.png', 'content': image_data, 'content-length':len(image_data)}})
comment:227 by , 17 years ago
Cc: | added |
---|
comment:228 by , 17 years ago
Cc: | added |
---|
comment:229 by , 17 years ago
Cc: | added |
---|
comment:230 by , 17 years ago
I've been rewriting the multipart parsing to be lazier and found something that seems to me like a DDOS exploit in this code.
Consider line 231-233 in streaming.7106.patch. What happens here is:
- A multipart boundary has been seen recently, so we are in the HEADER state
- We are looking to match the end of of the header
- If there's not enough data we simply break and add more bytes to our unmatched bytes from the network and try matching again later
Is there any reason why a "bad" client could not send a multipart request that *never* properly sends the header termination bytes and thus forces the multipart parser to load as much of the stream as possible (up to Content-Length bytes) into memory? As the amount of bytes increases towards the maximum Content-Length the call to .find() itself will become incredibly expensive. A few malformed requests might also cause the server to swap into oblivion or run out of address space on 32 bit systems.
comment:231 by , 17 years ago
Bad wording in the prior comment.
First sentence should be read as follows:
This patch appears to introduce a DOS attack vector into Django.
comment:232 by , 17 years ago
I've tested this with:
- Lighttpd 1.4 with fastcgi/scgi: doesn't work. As somebody else said above it looks like it isn't streaming the upload at all, or maybe lighttpd waits for the POST to be over before streaming it to the fcgi app.
- Apache 2.2 with scgi: works fine so far.
comment:233 by , 17 years ago
Component: | Core framework → HTTP handling |
---|---|
Has patch: | unset |
Needs tests: | unset |
Patch needs improvement: | set |
I've tested streaming.7106.patch. It works for the upload file with 860MB. However I tested it with 1.5G. It fails. The message is below.
Request Method: POST Request URL: http://192.168.1.4/imagecity/upload/ Exception Type: MemoryError Exception Value: out of memory Exception Location: /usr/lib/python2.5/site-packages/django/http/multipartparser.py in _read, line 285 /usr/lib/python2.5/site-packages/django/http/multipartparser.py in _read 285. self._file.write(data[start:end]) ...
comment:234 by , 17 years ago
save_FOO_file() works wrong with streaming.7106.patch
If raises
Traceback (most recent call last): ... File "/usr/lib/python2.5/site-packages/django/db/models/fields/__init__.py", line 783, in <lambda> setattr(cls, 'save_%s_file' % self.name, lambda instance, filename, raw_field, save=True: instance._save_FIELD_file(self, filename, raw_field, save)) File "/usr/lib/python2.5/site-packages/django/db/models/base.py", line 414, in _save_FIELD_file if raw_field.has_key('tmpfilename'): AttributeError: 'str' object has no attribute 'has_key'
comment:235 by , 17 years ago
I worked on a lazier (i.e. you can read from the stream without loading the entire thing onto disk) and hopefully better patch for this, in case anyone is getting antsy. I haven't been able to return to it for a week or two, but all the annoying parts are pretty much done.
It does, however, gut the current implementation. At that time I also tested mod_wsgi2's buffering behavior and found that it does not do aggressive buffering (a good thing), so sites running behind mod_wsgi2 can do flow control on clients. This was done by augmenting the current patch with .sleep() calls between blocks and then posting a large file.
This is particularly important if you have to send the material to another server and would prefer (or require) that it not be held on disk in full, if the stream is of indeterminate length, or if you wanted to hold the request in full to begin with (for example: filtering a dataset as it goes by and populating the database with a structured representation)
comment:236 by , 17 years ago
Status: | new → assigned |
---|
Alright I'm going to be working on this ticket this week. Before I do any work, I'm uploading the current diff updated to work against HEAD largely untouched. After reviewing the comments, before we can consider this ready for checking or close to that, I think that the following is needed:
- Updating this to work with the new-ish FileField backends.
- Pull the MultiPartParser logic apart from the actual dealing with the chunked data. (I.e. a user should be able to do whatever they want with the data.)
- Make the MultiPartParser lazier (i.e. not look for boundaries, but let them come when available).
- Go through the bugs that people have reported (thanks for doing so!) and ensuring that they no longer exist.
-Mike
comment:237 by , 17 years ago
I think the general design also needs a review. Things like a get parameter that is acted upon before it gets to the view isn't really acceptable, since it means you can never use that parameter name in normal code. So can you please concentrate on a patch for solving the initial problem (large file uploads) and worry about stuff like upload progress meters after some design discussion on django-dev as a later thing. The last few patches seem to have been heading into feature-growth territory and not solving the legitimate problem that was the original bug report.
comment:238 by , 17 years ago
Has patch: | set |
---|---|
Needs documentation: | set |
Needs tests: | set |
Alright, this has been a long time waiting, but the ticket has been refactored to what I think has been a fairly reasonable state. It might undergo a few movements of code, but I think the overall design is here to stay. To get a grasp of the API this ticket enables, please visit the thread on this matter. [1] A great thanks goes to fdr--- he and I have been putting the *sprint* in this past Django sprint during the late hours (our hemisphere).
Please feel free to test this by running it on a platform and giving us the results!
An overview of the status of this ticket:
- It has basic functionality. It fails fast...we'll probably silence a couple of the errors before we go to trunk, but it's good to have noise while people are debugging.
- There is one test. The code for this test can be used to extend to more harsher tests (wrong content lengths, etc) to see how the parser handles.
- There are no doc changes yet. I'll have basic ones by the end of the week (how to change the default temporary file location, etc), and the docs for describing the API...I'm not sure :-)
- A few design things that could be resolved:
- Supporting multiple upload handlers (right now it supports 1 in a list that could be extended, but we haven't how it deals with ordering).
- Adding signals [doesn't have to be in this ticket, I think]
- We send the charset to the handler, but we could take care of encoding for it if the mime starts with 'text/'...people might complain about getting unicode objects (like a CSV for example).
- I separated the diffs to the one that deals with the uploading, and the one that deals with the saving of the uploaded data.
- The newforms interface is completely untested as of now.
by , 17 years ago
Attachment: | 2070_revision7339_uploadhandling.diff added |
---|
NEW Upload handling for revision 7339
by , 17 years ago
Attachment: | 2070_revision7339_formhandling.diff added |
---|
NEW Form handling for uploaded files for revision 7339
comment:239 by , 17 years ago
I'll probably be furnishing the enhanced docs before too long. One important debugging feature that isn't in yet is a way to detect if the parser starts to undergo an infinite loop, which can happen when LazyStream.read() and LazyStream.unget() are called in a way where the same bytes constantly get read out and pushed back in succession. It'd be much better to crash quickly and log well in this case so bugs can get reported.
by , 17 years ago
Attachment: | 2070_revision7351_latest.diff added |
---|
Latest patch for 2070...includes everything (see comment)
comment:240 by , 17 years ago
I just added a new diff, 2070_revision7351_latest.diff. Against my earlier ways, I've decided to just merge the two patch changes, since you really can't use one and not both. The new patch features the ability to work with multiple upload handlers, updating forms to work better, and a few bugfixes. The patch should now pass all tests.
What YOU can do to help:
- Download the patch.
- Run
patch -p0 < 2070_revision7351_latest.diff
on your Django installation. - Run tests (
django/tests/runtests.py --settings=...
) - Test with a project that involves uploading. (Big and small!)
- Your memory should stay below 100 MB regardless of how big your upload size is (TODO: measure how big the typical process size is and see if we have to call
gc.collect()
at some point.) - Your temporary directory should contain partial files during the upload is progressing (
ls -ltr /tmp
on *nix systems). - The uploading should work with all of your apps and forms like they have.
- Your memory should stay below 100 MB regardless of how big your upload size is (TODO: measure how big the typical process size is and see if we have to call
- Report ANY errors to this ticket.
Things left from this patch:
- Documentation of things other than
settings.FILE_UPLOAD_TEMP_DIR
. (I'm not sure it's a blocker for this ticket...we'd probably have to have another doc. section for this API.) - Some more mean tests could be written (probably have to circumvent the test client for a few).
- While the parser shouldn't enter an infinite loop, it can currently happen. There will be some accounting to make sure that this failure mode doesn't exist in a new patch.
- Review from the devs.
by , 17 years ago
Attachment: | 2070_latest7354.diff added |
---|
The latest #2070 patch. Small changes...see comment.
comment:241 by , 17 years ago
I just uploaded a new patch 2070_latest7354.diff
. This includes minor fixes including:
- Works/tested on python 2.3.
- Works/tested on windows.
- Changes how multiple handlers behave.
Later tonight I'm going to shoot an email to django-developers with a revised API, since the implementation has of course given me more insight into how things should work.
follow-up: 243 comment:242 by , 17 years ago
Mike, thanks for the new patch! Leaving aside design issues I have some comments on the implementation part. Feel free to disagree with any!
- You have many multi-line comments decorated with in boxes of #s. But the rest of the code doesn't do this, all comments are usually have just one # on the left of each line. And I personally prefer it in Django way, it's easier to read :-)
- In MultipartParser.init you seem to half got rid of guessing content_length if they're not in HTTP headers. What was the intent? Isn't headers the only place where it should be and thus you should raise and exception right away if it's not available?
- In MultipartParser.parse you have a common pattern of invoking a method on all handlers until it returns some non-false value. May be abstract it in a function like call_handlers(method_name, *args, kwargs) that will do the iteration?
- line 200 of multipartparser.py: if content-transfer-encoding is available but not 'base64' you're feeding non-decoded data to handlers. I believe it's only going to work if encoding is '8bit' and in other cases it's better to raise an exception that encoding is unknown.
- I couldn't figure out why FakePayload is needed. Can you clarify?
- In HttpRequest you're taking special care of not allowing setting _upload_handlers in _set_upload_handlers but they can still be changed since you're giving out the whole list and it's mutable. I.e. request.upload_handlers.append(handler) will work regardless. I realize now that you set it to a tuple but it can be overwritten. May be it's better to just have request.add_upload_handler()? What's the point of reading them?
- In HttpRequest.parse_file_upload you have a comment "Order here is *very* important." but never say how exactly it works. May be a short note like: "Handlers that go sooner may cancel remaining handlers by signaling that handling is complete".
- line 420 of db.models.base: I think a comment "exclusive lock" before "filelocks.lock(fp, filelocks.LOCK_EX)" just asks to have this call to be wrapped up in a function "exclusive_lock(fp)" so you no longer have to explain it. Looks like it's the only lock you're using anyway.
I didn't look through uploadedfile.py and fileuploadhandler.py, will leave it till later.
follow-up: 244 comment:243 by , 17 years ago
Replying to Maniac <Maniac@SoftwareManiacs.Org>:
Mike, thanks for the new patch! Leaving aside design issues I have some comments on the implementation part. Feel free to disagree with any!
Thank you for your comments! Before I begin, what issues do you have with the design?
Here's a response to each of your comments:
- You have many multi-line comments decorated with in boxes of #s. But the rest of the code doesn't do this, all comments are usually have just one # on the left of each line. And I personally prefer it in Django way, it's easier to read :-)
This is style. I will update this in a future patch.
- In MultipartParser.init you seem to half got rid of guessing content_length if they're not in HTTP headers. What was the intent? Isn't headers the only place where it should be and thus you should raise and exception right away if it's not available?
I'm not entirely sure what you mean by this. The current patch will not guess the content-length. Unfortunately, there's no way that I can see around this, since the content length is really necessary. Unfortunately raising an exception will result in a "Connection Reset" error, but I guess it's better than any alternative I can think of.
- In MultipartParser.parse you have a common pattern of invoking a method on all handlers until it returns some non-false value. May be abstract it in a function like call_handlers(method_name, *args, kwargs) that will do the iteration?
If you pay close attention, you will see that it's not really common. (Especially in the latest patch.) I think the complexity involved to encapsulate each of these loops would just do more to obfuscate the code. You're welcome to prove me wrong, though. :)
- line 200 of multipartparser.py: if content-transfer-encoding is available but not 'base64' you're feeding non-decoded data to handlers. I believe it's only going to work if encoding is '8bit' and in other cases it's better to raise an exception that encoding is unknown.
No. My interpretation of the RFCs is that we can accept any one of '8bit', '7bit', or 'binary' without doing anything. These 3 encodings are really telling us how we can interpret them (7bit being just ASCII, 8bit and binary being 8bit character sets and binary files) and don't actually need any decoding before being written to a file. My perusal of the email.message
python module seems to agree with this assessment, and it also shows how to support a few other encodings I might wish to add support for.
- I couldn't figure out why FakePayload is needed. Can you clarify?
I thought the docstring said it. It restricts what can be acted on in a test client request. The point is that (a) a StringIO object supports too many features and (b) we should complain when we try to read more than the available content, since this will result in poor performance when the file is a wrapper around a blocking network socket. Again, these are only for tests, as I feel just using StringIO doesn't nearly exhibit the same behavior as the network socket.
- In HttpRequest you're taking special care of not allowing setting _upload_handlers in _set_upload_handlers but they can still be changed since you're giving out the whole list and it's mutable. I.e. request.upload_handlers.append(handler) will work regardless. I realize now that you set it to a tuple but it can be overwritten. May be it's better to just have request.add_upload_handler()? What's the point of reading them?
It can only override it by using the "non-public" API of _upload_handlers
. By using the public API, you're ensuring yourself the safety of knowing that the upload handlers are being modified with expected behavior. I thought about providing a .add_upload_handler()
but then I wanted a .prepend_upload_handler()
and I wanted a .insert_upload_handler()
and realized that we really have a list, and the ways to modify a list is well documented. There's no reason why we can't expect someone to want to read the upload handlers to decide where he wants to place something. (E.g.: My fictional AJAXUploadHandler Middleware will always place the AJAXUploadHandler after the InMemoryUploadHandler.)
- In HttpRequest.parse_file_upload you have a comment "Order here is *very* important." but never say how exactly it works. May be a short note like: "Handlers that go sooner may cancel remaining handlers by signaling that handling is complete".
Good point. I'll have this in a future patch.
- line 420 of db.models.base: I think a comment "exclusive lock" before "filelocks.lock(fp, filelocks.LOCK_EX)" just asks to have this call to be wrapped up in a function "exclusive_lock(fp)" so you no longer have to explain it. Looks like it's the only lock you're using anyway.
This is really a stylistic thing. The thing is there are a lot of ways to lock a file, so I just want to use a clean syntax that supports all those ways. I don't like the idea of adding a function wrapper for each action when a comment can make it just as clear. (And really, filelocks.lock(fp, filelocks.LOCK_EX)
is probably clear enough for anyone who knows what an exclusive lock is to not even need that comment.)
I didn't look through uploadedfile.py and fileuploadhandler.py, will leave it till later.
Thanks again for your comments. Look forward to hearing more from you!
follow-up: 245 comment:244 by , 17 years ago
Thank you for your comments! Before I begin, what issues do you have with the design?
None, as of now :-)
- In MultipartParser.init you seem to half got rid of guessing content_length if they're not in HTTP headers. What was the intent? Isn't headers the only place where it should be and thus you should raise and exception right away if it's not available?
I'm not entirely sure what you mean by this. The current patch will not guess the content-length. Unfortunately, there's no way that I can see around this, since the content length is really necessary. Unfortunately raising an exception will result in a "Connection Reset" error, but I guess it's better than any alternative I can think of.
By "half got rid" I mean this code in init:
try: content_length = int(META.get('HTTP_CONTENT_LENGTH', META.get('CONTENT_LENGTH',0))) except (ValueError, TypeError): # For now set it to 0...we'll try again later on down. content_length = 0 # If we have better knowledge of how much # data is remaining in the request stream, # we should use that. (modpython for instance) #try: # remaining = input_data.remaining # if remaining is not None and \ # (content_length is None or remaining < content_length): # content_length = remaining #except AttributeError: # pass if not content_length: # This means we shouldn't continue...raise an error. raise MultiPartParserError("Invalid content length: %r" % content_length)
I think everything after first "except" can be safely deleted and in first "except" should be "raise MultiPartParserError(...)" instead of setting content_length to 0.
If you pay close attention, you will see that it's not really common. (Especially in the latest patch.) I think the complexity involved to encapsulate each of these loops would just do more to obfuscate the code. You're welcome to prove me wrong, though. :)
Oh... Now I see that read_data_chunk makes things differently requiring that non-None value to continue. It's not worth abstracting then.
A small nit... What actually made me think about this is that you have "handlers = self._upload_handlers" at the top of "parse" and I tried to figure out if it's really needed to shave 14 letters in a couple of cases at the expense of requiring a reader to make one more translations "handlers are those same handlers that lay in an object".
No. My interpretation of the RFCs is that we can accept any one of '8bit', '7bit', or 'binary' without doing anything. These 3 encodings are really telling us how we can interpret them (7bit being just ASCII, 8bit and binary being 8bit character sets and binary files) and don't actually need any decoding before being written to a file. My perusal of the
email.message
python module seems to agree with this assessment, and it also shows how to support a few other encodings I might wish to add support for.
What I'm worried about is 'quoted-printable'. Now it will go to the handler unchanged which is wrong. Updating my comment to your reply I think the logic should be like this:
if transfer_encoding == 'base64': # decode elif transfer_encoding in ['8bit', '7bit', 'binary']: # just feed else: raise ...
BTW, browsers don't use quoted-printable here but some hand-written user-agent could...
- I couldn't figure out why FakePayload is needed. Can you clarify?
I thought the docstring said it. It restricts what can be acted on in a test client request. The point is that (a) a StringIO object supports too many features and (b) we should complain when we try to read more than the available content, since this will result in poor performance when the file is a wrapper around a blocking network socket. Again, these are only for tests, as I feel just using StringIO doesn't nearly exhibit the same behavior as the network socket.
I did understand what it does but didn't understand why it's needed. Using it for tests makes sense now, thanks!
It can only override it by using the "non-public" API of
_upload_handlers
. By using the public API, you're ensuring yourself the safety of knowing that the upload handlers are being modified with expected behavior. I thought about providing a.add_upload_handler()
but then I wanted a.prepend_upload_handler()
and I wanted a.insert_upload_handler()
and realized that we really have a list, and the ways to modify a list is well documented. There's no reason why we can't expect someone to want to read the upload handlers to decide where he wants to place something. (E.g.: My fictional AJAXUploadHandler Middleware will always place the AJAXUploadHandler after the InMemoryUploadHandler.)
But each custom handler will know where to put itself only relative to standard handlers that it knows about... Middleware machinery just leaves this to a user to figure out. In this case I believe everyone can also place middleware, decorators etc in the right order without parsing this list in code.
(And really,
filelocks.lock(fp, filelocks.LOCK_EX)
is probably clear enough for anyone who knows what an exclusive lock is to not even need that comment.)
Indeed :-). That was my second option too :-)
comment:245 by , 17 years ago
Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:
Alright, we're very close here!
- By "half got rid" I mean this code in init ...
Yeah, actually the new patch that I have queued up changes this behavior a little bit. It's more in line with this thinking.
- A small nit... What actually made me think about this is that you have "handlers = self._upload_handlers" at the top of "parse" and I tried to figure out if it's really needed to shave 14 letters in a couple of cases at the expense of requiring a reader to make one more translations "handlers are those same handlers that lay in an object".
I actually didn't do it to save 14 characters. I did it because this loop can potentially run millions of iterations and I didn't want to do a needless self.var
lookup when var
is slightly faster. By this point the loop is so complicated that I might as well change it, but still...CPU is CPU.
- What I'm worried about is 'quoted-printable'. Now it will go to the handler unchanged which is wrong. Updating my comment to your reply I think the logic should be like this:
My hope was that someone could write a handler to deal with quoted-printable :).
- But each custom handler will know where to put itself only relative to standard handlers that it knows about... Middleware machinery just leaves this to a user to figure out. In this case I believe everyone can also place middleware, decorators etc in the right order without parsing this list in code.
Yeah I understand the issue. I'm not sure of the best interface. My thinking was that a list is so well-known that people won't be surprised to learn to do: request.upload_handlers.append(upload_handler)
. I'm open to suggestions that will support the different operations.
Thanks again for your review. I'm writing a long-ish email to Django developers outlining a few of the issues and a revised API description.
comment:246 by , 17 years ago
Needs documentation: | unset |
---|
Okay, I'm posting a new diff. Not many changes. A few stylistic changes, a better handling of multiple handlers, and the new documentation for the upload handlers.
by , 17 years ago
Attachment: | 2070_revision7359.diff added |
---|
New diff...some style changes and new documentation.
follow-up: 248 comment:247 by , 17 years ago
Mike, here are comments & question about fileuploadhandler.py and uploadedfile.py. Apparently, only one question :-)
MemoryFileUploadHandler implements handle_raw_input and, thus, has to duplicate the logic of the multipart parser. Why not implement this handler just like TemporaryFileUploadHandler but using StringIO?
comment:248 by , 17 years ago
Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:
Mike, here are comments & question about fileuploadhandler.py and uploadedfile.py. Apparently, only one question :-)
MemoryFileUploadHandler implements handle_raw_input and, thus, has to duplicate the logic of the multipart parser. Why not implement this handler just like TemporaryFileUploadHandler but using StringIO?
Yes, I originally used StringIO and did it the same way, but I realized that since it's all in memory, I can use the proven email parser. I thought it would be really slick if it's so flexible we can even revert back to our old ways of parsing for when we know we can put it all in memory.
That being said, we can change it back if you think it's unnecessary.
follow-up: 250 comment:249 by , 17 years ago
Yes, I think it is unnecessary. I believe we're going to debug the streaming parser to reliable state anyway, aren't we? :-) And after this we'll just have two different parsers for no purpose...
follow-up: 251 comment:250 by , 17 years ago
Replying to Ivan Sagalaev <Maniac@SoftwareManiacs.Org>:
Yes, I think it is unnecessary. I believe we're going to debug the streaming parser to reliable state anyway, aren't we? :-) And after this we'll just have two different parsers for no purpose...
Okay. That being said, let me outline what my new patch will have, before I finish it:
- A settings variable for when to use memory and when to use streaming (
settings.FILE_UPLOAD_MAX_MEMORY_SIZE
). - A settings variable for the default upload handlers. (
settings.FILE_UPLOAD_HANDLERS
) - One single parser.
- Forms support for dictionary (but emitting deprecation warnings).
- Updating forms docs to use the new File notation instead of a dictionary.
Let me know if there's anything to add to this list!
follow-up: 252 comment:251 by , 17 years ago
- A settings variable for when to use memory and when to use streaming (
settings.FILE_UPLOAD_MAX_MEMORY_SIZE
).
BTW when I was thinking about such thing in the past I had an idea that we can use the size of the buffer that we use to read data from input_stream to be this maximum value. I.e. if we can afford some amount of memory per file then there's no point of making buffer size smaller.
- A settings variable for the default upload handlers. (
settings.FILE_UPLOAD_HANDLERS
)- One single parser.
- Forms support for dictionary (but emitting deprecation warnings).
- Updating forms docs to use the new File notation instead of a dictionary.
This is great! Looking forward to it.
follow-up: 253 comment:252 by , 17 years ago
BTW when I was thinking about such thing in the past I had an idea that we can use the size of the buffer that we use to read data from input_stream to be this maximum value. I.e. if we can afford some amount of memory per file then there's no point of making buffer size smaller.
I had a similar idea, however this is entirely dependent on the garbage collection, no? That is, if we called gc.collect()
on every iteration of the chunks, I think we'd be set.
-Mike
follow-up: 254 comment:253 by , 17 years ago
I had a similar idea, however this is entirely dependent on the garbage collection, no? That is, if we called
gc.collect()
on every iteration of the chunks, I think we'd be set.
Uhm... Looks like I wasn't clear enough, sorry... I just mean that if we have a setting FILE_UPLOAD_MAX_MEMORY_SIZE then we can use this value for chunk_size in FileUploadHandler. It has nothing to do with GC.
comment:254 by , 17 years ago
Replying to Maniac <Maniac@SoftwareManiacs.Org>:
Uhm... Looks like I wasn't clear enough, sorry... I just mean that if we have a setting FILE_UPLOAD_MAX_MEMORY_SIZE then we can use this value for chunk_size in FileUploadHandler. It has nothing to do with GC.
I don't think so. These are two different parameters. As an example, suppose I set FILE_UPLOAD_MAX_MEMORY_SIZE to 2.5 MB. If this were the chunk size, then that would mean each iteration would create a value which held 2.5MB worth of data. If the GC always collected OR if the GC collected based on *size*, then it would mean that the upload shouldn't take appreciably more than 2.5MB total. However, if the GC collects based on number of allocations (which I believe is the default in standard CPython), then having a chunk size of 2.5MB would take appreciably more memory than having a lower chunk size; and certainly much more memory than an entire upload going into memory at once.
At least, this is my understanding of the garbage collection in python. If you look at your memory usage during an upload, you will see interesting memory usage.
comment:255 by , 17 years ago
Ah... Now I understand the relation with GC. But you seem to have explored this question much more than I even tried so I refrain with my idea.
comment:256 by , 17 years ago
Okay! This patch includes all of the aforementioned improvements. It still passes all the tests. I've tested uploads on the following platforms:
- Python 2.5/2.3
- Apache mod_python, mod_fastcgi, mod_wsgi
- Lighttpd mod_fastcgi
- Development server
They all stream out of the box.
comment:257 by , 17 years ago
Needs tests: | unset |
---|---|
Patch needs improvement: | unset |
I like the patch as is. I think now is a good time to start testing and seeing if this is what people want.
by , 17 years ago
Attachment: | 2070_revision7363.diff added |
---|
Altered documentation to be more approachable.
by , 17 years ago
Attachment: | 2070_revision7363.2.diff added |
---|
Slightly newer...handles exhaustion a little bit differently.
follow-up: 259 comment:258 by , 17 years ago
There's a problem with the image validation in django.newforms.fields.ImageField (line 507)
If file is a StringIO object, the second call to Image.open will fail with an IOError unless there's a file.reset() call first, something like:
try: # load() is the only method that can spot a truncated JPEG, # but it cannot be called sanely after verify() trial_image = Image.open(file) trial_image.load() # verify() is the only method that can spot a corrupt PNG, # but it must be called immediately after the constructor if hasattr(file, 'reset'): file.reset() trial_image = Image.open(file) trial_image.verify() except Exception: # Python Imaging Library doesn't recognize it as an image raise ValidationError(self.error_messages['invalid_image'])
follow-ups: 260 261 comment:259 by , 17 years ago
Replying to Eric B <ebartels@gmail.com>:
There's a problem with the image validation in django.newforms.fields.ImageField (line 507)
[...]
Hi Eric, thanks so much for the information. I just have a comment and a question.
Question: If this is a problem, how did this patch change anything? The current implementation just does StringIO(f.content)
. I was intending to change as little functionality as possible, and it passes the tests (which I believe include image validity checking in newforms)...
Comment: There's no reason to have the hasattr()}} check there, because there's an if statement 5 lines above that will know when its a {{{StringIO
object and when it isn't.
Again, thanks for your feedback!
comment:260 by , 17 years ago
Replying to axiak:
Replying to Eric B <ebartels@gmail.com>:
There's a problem with the image validation in django.newforms.fields.ImageField (line 507)
[...]
Hi Eric, thanks so much for the information. I just have a comment and a question.
Question: If this is a problem, how did this patch change anything? The current implementation just does
StringIO(f.content)
. I was intending to change as little functionality as possible, and it passes the tests (which I believe include image validity checking in newforms)...
Doh. I see it now. Sorry about that. :-)
I have a few other changes queued up, so I'll include that in the next patch.
comment:261 by , 17 years ago
Replying to axiak:
and it passes the tests (which I believe include image validity checking in newforms)...
As it turns out, the current tests have a model called ImageFile
which actually doesn't test the image field. Is there any reason we cannot do this for our tests?
class ImageFile(models.Model): description = models.CharField(max_length=20) try: # If PIL is available, try testing PIL. # Otherwise, it's equivalent to TextFile above. import Image image = models.ImageField(upload_to=tempfile.gettempdir()) except ImportError: image = models.FileField(upload_to=tempfile.gettempdir()) def __unicode__(self): return self.description
After doing that, the test fails properly without Eric's suggested modification. (When PIL is installed.)
Cheers,
Mike
comment:262 by , 17 years ago
Cc: | added |
---|
by , 17 years ago
Attachment: | 2070_revision7377.diff added |
---|
New 2070 patch...includes fixes to the parser in addition to the newforms fix caught by EricB as well as better docs.
comment:263 by , 17 years ago
The new patch includes a couple improvements over the last. No change in design, just small buxfixes:
- The low-level API parser now correctly keeps track of where it is (LazyStream). (The default django api never used this feature.)
- The newforms problem with Image validation is fixed, along with modifying the tests to test that.
- The docs were reworded a little bit.
If it makes reviewing any easier, I'm thinking of coming up with a patch series that will represent a good flow of changes to follow...that's next on my list. If you are thinking about reviewing, here's the path I'd take:
- Read the high-level docs. http://orodruin.axiak.net/upload_handling.html Make sure you're fine with the design.
- Read the diff of stuff in /tests. Make sure you're fine with the test modifications.
- At this point it's probably best to start with the HTTP request and work your way in.
Thanks to everyone who's trying this patch out!
Cheers,
Mike
follow-up: 265 comment:264 by , 17 years ago
This patch correct the bug for the ImageField. The problem come from StringIO(data.read()) that's become empty after the 1st read. By using a var to store str from data.read(), you can call 2 different StringIO on the constructor.
--- fields.py 2008-03-31 15:25:36.000000000 -0700 +++ fields.py 2008-03-31 17:04:23.000000000 -0700 @@ -494,17 +494,21 @@ except ImportError: from StringIO import StringIO if hasattr(data, 'read'): - file = StringIO(data.read()) + data_buffer = data.read() + file_str_io_1 = StringIO(data_buffer) + file_str_io_2 = StringIO(data_buffer) else: - file = StringIO(data['content']) + data_buffer = data['content'] + file_str_io_1 = StringIO(data_buffer) + file_str_io_2 = StringIO(data_buffer) try: # load() is the only method that can spot a truncated JPEG, # but it cannot be called sanely after verify() - trial_image = Image.open(file) + trial_image = Image.open(file_str_io_1) trial_image.load() # verify() is the only method that can spot a corrupt PNG, # but it must be called immediately after the constructor - trial_image = Image.open(file) + trial_image = Image.open(file_str_io_2) trial_image.verify() except Exception: # Python Imaging Library doesn't recognize it as an image raise ValidationError(self.error_messages['invalid_image'])
follow-up: 266 comment:265 by , 17 years ago
Replying to bbrriiccee@gmail.com:
This patch correct the bug for the ImageField. The problem come from StringIO(data.read()) that's become empty after the 1st read. By using a var to store str from data.read(), you can call 2 different StringIO on the constructor.
...
Huh. I thought I had included the fix in the latest patch...I'll review and commit the patch with the latest code.
By the way, your patch has two inefficiencies:
- You shouldn't have to call .read() on the file object if it has a temporary file name. If it's a large file, we should let the OS do the seeking which is what happens when you give the filename to PIL.
- There's no reason to have two distinct StringIO objects when you can just call .reset().
I'll have the fixed patch up in a minute or two.
by , 17 years ago
Attachment: | 2070_revision7396.diff added |
---|
Better patch...not sure how the newforms code in the previous patch was the way it was.
comment:266 by , 17 years ago
Replying to axiak:
[...]
I'll have the fixed patch up in a minute or two.
Patch uploaded. Again, I'm not certain how the newforms changes didn't get in. But I do know that I was really tired when I uploaded the last version. I gave this new patch a once-over at least...
comment:267 by , 17 years ago
Mike, I beleive the last patch introduced an error in fileuploadhandler.load_handler. We're now passing request to load_handler as a separate parameter and wrapping it with RestrictedRequest. On line 223 of fileuploadhandler, request needs to be passed separately to the FileUploadHandler constructor also. Otherwise, the handler won't have access to it.
return cls(*args, **kwargs)
needs to be:
return cls(request, *args, **kwargs)
With that change, everything seems to be working perfectly. Thanks.
follow-up: 269 comment:268 by , 17 years ago
Hey Eric,
Thanks for keeping me honest :-). I will upload the patch as soon as I finish this comment.
Another question: Are you doing anything interesting with this patch? Just curious, since you wouldn't have hit this bug with either of the built-in upload handlers.
Cheers,
Mike
by , 17 years ago
Attachment: | 2070_revision7388.diff added |
---|
new patch that fixes error with instantiating the upload handlers, thanks Eric!
follow-up: 270 comment:269 by , 17 years ago
Replying to axiak:
Another question: Are you doing anything interesting with this patch? Just curious, since you wouldn't have hit this bug with either of the built-in upload handlers.
Since this patch makes it so easy, I decided to throw together an upload handler to display progress feedback. I know you had a patch in #4165, so you've probably got something similar in the works. I posted what I came up with to djangosnippets: http://www.djangosnippets.org/snippets/678/
comment:270 by , 17 years ago
Replying to Eric B <ebartels@gmail.com>:
Since this patch makes it so easy, I decided to throw together an upload handler to display progress feedback. I know you had a patch in #4165, so you've probably got something similar in the works. I posted what I came up with to djangosnippets: http://www.djangosnippets.org/snippets/678/
Awesome job. Do you want to contribute to django-uploadutils? I've been to busy with other things at the moment so haven't had time to write stuff like what you just wrote. It's very easy for me to add you as a contributor...
comment:271 by , 17 years ago
My feedback on the patch (2070_revision7388.diff):
- Looks like some unrelated changes might have creapt in; this shouldn't be in the patch:
-
django/conf/global_settings.py
a b 114 114 SEND_BROKEN_LINK_EMAILS = False 115 115 116 116 # Database connection info. 117 DATABASE_ENGINE = '' # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or ' oracle'.117 DATABASE_ENGINE = '' # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'ado_mssql'. 118 118 DATABASE_NAME = '' # Or path to database file if using sqlite3. 119 119 DATABASE_USER = '' # Not used with sqlite3. 120 120 DATABASE_PASSWORD = '' # Not used with sqlite3.
- I have reservations about
FILE_UPLOAD_HANDLERS
; let's discuss this in the thread on django-developers.
- Please provide a link to the Python Cookbook recipe from which you took the file lock implementation. You also need to note that the file lock code is licensed under the Python Software License (as is all the cookbook code).
- Please double-check your code for PEP8/Django coding style. I see a number of places where you differ in style -- espcially in docstring usage, a few classes named things like
FILE
,FIELD
, etc., and exception raising (useraise E(msg)
istead ofraise E, msg
).
- Naming nitpick:
django/core/files/fileuploadhandler.py
-- "file" is redundant in there; you can name it (and other bits) things likedjango/core/files/uploadhandler.py
.
- In general you've gone with the Java-ish style of one class per file. That's OK, and in this case I think it works relatively well. However, it can make flow hard to follow -- if a class/function is only used in a single place, you really should define it in the same place it's used. No "action item" here, just a note.
- Can you explain why you think you need
RestrictedRequest
? Can't we just insist that people write upload handlers correctly?
- Can you explain the
save_FIELD_file
/move_FIELD_file
distinction to me? I don't see why you need both methods. While you're at it, can you explain the decision to change the method interface? That'll be a backwards-incompatible change, so let's not change it unless we have to.
- Nice job on the RFC2388 parser -- very clearly written and easy to read.
- Another style point: warnings code should look like this:
if something_is_wrong: import warnings warnings.warn(...)
(That is, instead of importing warnings at the module level). IIRC there's a minor overhead associated with initializing the warnings system the first time warnings is imported; our "house style" is thus to put the import in the guarded block. It also makes the future replacement of the warning with an error an easier change that doesn't leave behind an undeeded import.
- Similarly, in deprecating dict usage for uploaded files, keep all the backwards-compat code in the if clause. You've got this
-
django/newforms/fields.py
a b 444 445 return None 445 446 elif not data and initial: 446 447 return initial 448 449 if isinstance(data, dict): 450 # We warn once, then support both ways below. 451 warnings.warn("The dictionary usage for files is deprecated. Use the new object interface instead.", DeprecationWarning) 452 447 453 try: 448 f = UploadedFile(data['filename'], data['content']) 449 except TypeError: 454 file_name = data.file_name 455 file_size = data.file_size 456 except AttributeError: 457 try: 458 file_name = data.get('filename') 459 file_size = bool(data['content']) 460 except (AttributeError, KeyError): 461 raise ValidationError(self.error_messages['invalid']) 462 463 if not file_name:
You shouldn't have both the if and the try/except; the if clause should convert the data into an uploaded file. Again, this makes removing the legacy support easier in the future.
- Another patch mixup: you've added an
unescape_entities
todjango.utils.text
. That's not part of this patch, is it?
- Again:
-
docs/db-api.txt
a b 1306 1306 Using raw strings (e.g., ``r'foo'`` instead of ``'foo'``) for passing in the 1307 1307 regular expression syntax is recommended. 1308 1308 1309 Regular expression matching is not supported on the ``ado_mssql`` backend. 1310 It will raise a ``NotImplementedError`` at runtime. 1311
Not part of file uploads.
- Don't comment out code; just remove it:
-
tests/regressiontests/bug639/tests.py
a b 9 9 from regressiontests.bug639.models import Photo 10 10 from django.http import QueryDict 11 11 from django.utils.datastructures import MultiValueDict 12 from django.core.files.uploadedfile import SimpleUploadedFile 12 13 13 14 class Bug639Test(unittest.TestCase): 14 15 … … 21 22 22 23 # Fake a request query dict with the file 23 24 qd = QueryDict("title=Testing&image=", mutable=True) 24 qd["image_file"] = { 25 "filename" : "test.jpg", 26 "content-type" : "image/jpeg", 27 "content" : img 28 } 29 25 #qd["image_file"] = { 26 # "filename" : "test.jpg", 27 # "content-type" : "image/jpeg", 28 # "content" : img 29 # } 30 qd["image_file"] = SimpleUploadedFile('test.jpg', img, 'image/jpeg') 31 30 32 manip = Photo.AddManipulator() 31 33 manip.do_html2python(qd) 32 34 p = manip.save(qd)
Overall, very well done!
comment:272 by , 17 years ago
w.r.t. 4:
FILE, FIELD, et al are intended to just act as symbols, and thus I named them similarly to constants. Would you prefer a more regular-class names like File, Field, etc?
My rationale for the naming was:
- Their function is similar to that of a constant
- It looks more informative in a stack trace than "0" or "1" as you would see with something like "FILE = 0" declared in the module
- It's less ambiguous than a string
All in all: I can't believe you actually went through that monster patch in such fine detail. I thought that serious reviews would happen until axiak or I got around to breaking it up.
comment:273 by , 17 years ago
Hello,
after parsing the head of a file that is bieng uploaded (server-side parsing during the first receive_data_chunk), i would like to abort the transfert if the file is not acceptable. To do so i tried to raise a SkipFile or a StopUpload but i has no effect until the file is fully uploaded. Did i misunderstood the goal of these exceptions, or do i use it the wrong way ? Is it possible to something similar and retrieve a message from my custom FileUploadHandler in the view ?
Anyway, this patch is very nice and flexible, thank you for this :-)
Camille.
follow-up: 275 comment:274 by , 17 years ago
SkipFile will continue to eat the remainder of stream, although it won't deliver it to you for handling. StopFile should stop the transfer cold and cause the client to see a connection reset error. Actually, I'm surprised that StopFile isn't causing some other issues...
--- a/django/http/multipartparser.py +++ b/django/http/multipartparser.py @@ -230,6 +230,8 @@ class MultiPartParser(object): exhaust(stream) elif isinstance(e, StopUpload): # Abort the parsing and break + # This doesn't look right, where do we define abort? + # shouldn't this crash? parser.abort() break else:
I'll have a more thorough look before too long, but I have some stuff going on. This chunk of the code is in Mike's expertise, so he will probably chime in first. It looks like some broken refactoring or something here....my quick grep of revision logs doesn't seem to suggest there ever was anything like a "def abort"....actually, mentions of "abort" are scant in general, with the exception of this line. Seems like something got forgotten...maybe it's time for a regression test.
Thanks for the report...I think we can have this fixed quickly, looks like an oversight that will require no particular design change.
comment:275 by , 17 years ago
Replying to fdr:
SkipFile
should eat the remainder of the stream until the next file, but StopUpload
(as intended) should eat the stream until the end. The reasoning for this is that a connection reset error wouldn't allow the custom UploadHandler
to specify a custom error message, and the end user would be without a doubt completely confused. I think perhaps we should give a keyword argument so a person can do either:
raise StopUpload()
or
raise StopUpload(connection_reset=True)
So that the user understands the consequences of aborting an upload.
parser.abort()
Yeah... abort
was in some initial API that never really existed nor was documented. I was going to change it but must have forgotten about it. I would've caught it had I actually done work on django-uploadutils. Time for me to get over there...
follow-up: 278 comment:276 by , 17 years ago
Hi all,
my temporary files are correctly garbage collected in wsgi development mode, but not when running mod_python with apache2. After sutting down apache the files remain on the harddrive. Did anybody experienced that?
Camille.
comment:277 by , 17 years ago
Cc: | added |
---|
follow-up: 279 comment:278 by , 17 years ago
Replying to Camille Harang:
my temporary files are correctly garbage collected in wsgi development mode, but not when running mod_python with apache2. After sutting down apache the files remain on the harddrive. Did anybody experienced that?
Hey Camille. Thanks for the report! I'll check on this after I update the patch per Jacob's comments. The only way I can see this happening is if apache is somehow preventing it from garbage collecting. However, this is why we store temporary files in /tmp
-- the OS can (and should) wipe out that directory periodically. :-) There's no guarantee that if a process gets a KILL signal or crashes hard that the temporary data will not be left on the disk -- I don't think we can make this guarantee. (Again though, I will look into this.)
Cheers,
Mike
comment:279 by , 17 years ago
There's no guarantee that if a process gets a KILL signal or crashes hard that the temporary data will not be left on the disk -- I don't think we can make this guarantee.
From my experience it really cannot be done since when Apache restarts it kills it's processes hard and they don't have a chance to cleanup. So leaving sometime tmp files is a necessary evil.
comment:280 by , 17 years ago
Cc: | added |
---|
comment:281 by , 17 years ago
Hi all,
yes i dreaded that there must be temporary file problems with apache, unfortunately i use a separate tmp directory because i some cases i need to keep the files so i want to avoid cross-filesystem copying when renaming it. I use a cron job to delete unused files (checking if it is accessed with fuser), if somebody is intersted in just ask me. BTW i added a 'keep' parameter to my TemporaryFile subclass, setting it to True at any time prevents the file to be deleted after closing, then just use os.rename() to store it smoothely elsewhere. I don't know if it is worthed to add this feature in the main code, anyway here it is bellow. I also experienced problems when checking files content in file_complete() because the file remains unclosed, i added self.file.flush() before self.file.seek(0) in my TemporaryFileUploadHandler subclass, once again i don't know if it needs to be included in the main code, it's a suggestion.
Camille.
class MaybeTemporaryFile(TemporaryFile): keep = False def __del__(self): if not self.keep: try: os.unlink(self.name) except OSError: pass
comment:282 by , 17 years ago
Cc: | added |
---|
comment:283 by , 17 years ago
Mike, can you clarify a bit more... I now have a code that does save_FIELD_file, giving it file contents as a string. It fails with your patch since this method expects an UploadedFile instance instead. Are you going to leave it as a documented incompatibility or it can be dealt with?
comment:284 by , 17 years ago
A little bug in the current version of the patch. In the method chunk
of the class UploadedFile
there is a call to the method file_size
which is not a method, as it is defined as a property, and hence overwritten by this property. The property should be used insted of the method and the method file_size
should be removed, as it serves no purpose. The call to file_size generates an exception TypeError: 'int' object is not callable
, which again makes chunk
unusable on TemporaryUploadedFile
.
by , 17 years ago
Attachment: | 2070_revision7484.diff added |
---|
Updated diff per Jacob's requests and useful feedback.
follow-up: 286 comment:285 by , 17 years ago
Hi everyone,
Thank you so much for the tremendous feedback!
A few things to note about the new patch:
save_FIELD_file
was supposed to be backwards compatible. It is now, only emittingDeprecationWarning
warnings when you do so.- The
file_size
attribute is no longer a function nor referenced as a function. - The
StopUpload
interface now works, with the optionalconnection_reset=True
parameter to allow the connection to reset. - I think I've gone over all of Jacob's suggestions and fixed each of the problems.
I have a few things I want to do still that don't have to do with programming:
- Outline where I have tested this (which platforms, what testing cases).
- Enumerate every single backwards incompatibility (even
DeprecationWarning
emissions), and explain why they are there. - Respond to each of Jacob's comments.
Thanks again for your feedback!
follow-up: 287 comment:286 by , 17 years ago
Replying to axiak:
- The
file_size
attribute is no longer a function nor referenced as a function.
I'm new to this patch, having just needed it today, but I think this may be causing problems with InMemoryUploadedFile
. With no file_size
, FileField.clean()
fell through to the dictionary usage, which failed at file_name = data.get('filename')
. Changing that to file_name = data['filename']
avoided the failed validation, and adding a file_size
attribute to {{InMemoryUploadedFile}}} (with a quick len/seek at the end of the constructor) seems to have fixed the attribute validation.
by , 17 years ago
Attachment: | 2070_revision7484.2.diff added |
---|
Fixed file_size bug with memory handling.
follow-up: 289 comment:287 by , 17 years ago
Replying to john:
(with a quick len/seek at the end of the constructor) seems to have fixed the attribute validation.
Doh. The regression is fixed in the latest patch (of the same name). Although the solution is not to do another len/seek but to update the handler to use the file_size counter.
Anyway, this teaches me that I have to think about clever ways of leveraging the TestClient
to catch these types of errors.
Cheers,
Mike
comment:289 by , 17 years ago
Replying to axiak:
Doh. The regression is fixed in the latest patch (of the same name). Although the solution is not to do another len/seek but to update the handler to use the file_size counter.
Yep, I just tested a clean checkout and it's working fine now without any of my cheesy hacks. ;) Thanks.
comment:290 by , 17 years ago
Cc: | removed |
---|
comment:291 by , 17 years ago
Cc: | added |
---|
comment:292 by , 17 years ago
Cc: | added |
---|
comment:293 by , 17 years ago
Cc: | added |
---|
comment:294 by , 17 years ago
I would find it very useful to have the UploadedFile
cary a tell()
method. It basiclly just needs to propagate the call forward to file
and would help to complete the file-likeness of the objects.
comment:295 by , 17 years ago
Cc: | added |
---|
This patch is awesome, many thanks to all the developers involved in building it.
I just wanted to mention that I had to replace :
if content_length <= 0:
by
if content_length < 0:
in django/http/multipartparser.py line 94 as a workaround to a Safari bug.
For some reason, Safari 3.x on Mac (not on PC) set the Content-Type to multipart/form-data on a redirect (even if it's a GET request...) but doesn't set the Content-Length.
comment:296 by , 17 years ago
Great patch, but why limit it to uploads which have multipart/form-data as their content-type?
Why not extend the mechanism to any kind of binary data upload (e.g. handling file upload using the Atom Publishing Protocol) ?
Also, http://orodruin.axiak.net/upload_handling.html doesn't seem to be up anymore. Is there another location for the API docs? Thanks, Bruno.
comment:297 by , 17 years ago
Cc: | added |
---|
by , 17 years ago
Attachment: | 2070-r7695.patch added |
---|
Streaming uploads patch updated to [7695] and reviewed.
follow-up: 299 comment:298 by , 17 years ago
I've completed a comprehensive review of all the code in the latest patch and updated it to trunk; the new patch is attached. I still need to edit and review the docs, but that's on me. Other than that, the TODO list is getting quite a bit shorter. Here's what's left before we can call for testing and merge this sucker:
The big stuff
- We need unit tests for all the various data structures introduced in django/core/files (UploadedFile, the handlers, etc.). Most of this is sorta-tested by the holistic upload tests, but unit tests are particularly needed here. In most cases the needed bits are easy to mock test.
- The file upload regression tests (currently in
regressiontests/test_client_regress
) ought to be moved to a seperate test module (file_uploads
, probably) so it can be run seperately. More tests are also needed here; coverage seems pretty low.
InMemoryUploadedFile.chunk()
usesreturn
but its parent methodUploadedFile.chunk()
is a generator. This seems very dangerous -- the API should be a function or a generator, but not either -- and should be fixed if there's not a good reason for it.
FileUPloadHandler.new_file()
something (i.e. "Stop") to stop handling. That's nasty; it should use some sort of exception to do that.
django/http/multipartparser.py:374
this should absolutely not be here. How, when and why would this happen? This absolutely needs to Just Work.
- We need an entry written for BackwardsIncompatibleChanges. This might be a big enough change that it warrents an entire writeup like NewformsAdminBranch of something. Either way, there needs to be a good guide for updating code.
Small stuff and questions
django/db/models/base.py:519
: is hardcoding 64k OK here, or should this be some sort of setting?
- Look at the use of
filterwarnings('ignore')
intests/modeltests/model_forms/models.py
} is that intentional or just papering over internal uses of the "old" APIs? Dittotests/regressiontests/forms/tests.py
.
- Dunno how I feel about the
FieldType
object -- it doesn't feel very Pythonic. Why not just assign string literals to those constants? That's how its done elsewhere in Django.
- Idiom: ParseBoundaryStream should be parse_boundary_stream since it's a function. Unless there's a reason you called it this?
follow-up: 300 comment:299 by , 17 years ago
Replying to jacob:
django/http/multipartparser.py:374
this should absolutely not be here. How, when and why would this happen? This absolutely needs to Just Work.
I authored this check. It's actually not to target any regression in particular, but I decided it would be a better option than -- in event of a parser bug -- a busy loop. The rationale is that the parser is conceptually a recursive descent parser with backtracking and that if written naively that would mean that parser bugs would manifest as a stack overflow error. The old parser had a bug like this where it would just be content to buffer as much as the stream as it wanted in memory given malformed requests (look for some of my previous posts above).
Since we want to take advantage of incremental computation and Python's iterators we lose this, as we can happily move back and forth in parser states forever in the event a bug is encountered. This 500 is a inexact proxy for the maximum stack depth before we decide we are never going to be able to make a reduction.
So far it has never been triggered (it was put in after we debugged some bugs that WOULD have caused it to trigger). You can remove it if you decide maybe-busy-looping is better than an exception.
Also, it's possible to defeat this heuristic by having a ring-oscillation of sorts where different values are popped off and put back onto the LazyStream. This is a much more complex class of bug and also less likely given the architecture of the parser. A more sound way to accomplish a recursion-depth-limit would be to count the depth of our parser state, and perhaps we should do that instead, even if it is somewhat more complicated.
So in conclusion: this check is not there because "there are some known issues we haven't ironed out and we need a catch-all," but more like "we don't have any known issues, but would rather not occupy 100% cpu if there are any regressions following a similar pattern of ones we've seen previously"
follow-up: 302 comment:300 by , 17 years ago
Replying to fdr:
I authored this check. It's actually not to target any regression in particular, but I decided it would be a better option than -- in event of a parser bug -- a busy loop. The rationale is that the parser is conceptually a recursive descent parser with backtracking and that if written naively that would mean that parser bugs would manifest as a stack overflow error. The old parser had a bug like this where it would just be content to buffer as much as the stream as it wanted in memory given malformed requests (look for some of my previous posts above).
Ah, OK -- makes sense. So the only way this would get triggered would be a maliciously malformed request package, right?
I may just change the error message, then -- "malformed multipart request"?
comment:301 by , 17 years ago
Cc: | added |
---|
comment:302 by , 17 years ago
Replying to jacob:
Ah, OK -- makes sense. So the only way this would get triggered would be a maliciously malformed request package, right?
That's right. The malformed request has to be strange in a way not seen or thought of. It could also potentially come up as a very obscure correct payload.
I may just change the error message, then -- "malformed multipart request"?
Sure, that's would seem perfectly fine. But pay in mind the following: because it is triggered in a very exceptional case (we do advertise handling bad inputs in a graceful way) I'd suggest something more strongly worded to report a bug is called for so it can be squashed.
This check is not a fallback. It a last resort, and in an ideal world would never be triggered. It's just a hopefully better alternative that doing something very wrong forever.
comment:303 by , 17 years ago
Since uploading patches is getting unwheidly, I've started working on this in a git branch: http://code.djangoproject.com/git/?p=django;a=shortlog;h=refs/heads/2070-streaming-uploads (check out from git://djangoproject.com/django and use the "2070-streaming-uploads" branch).
comment:305 by , 17 years ago
Cc: | added |
---|---|
Keywords: | sprstreaming intsept14 added; streaming sprintsept14 removed |
comment:306 by , 17 years ago
Keywords: | streaming sprintsept14 added; sprstreaming intsept14 removed |
---|
comment:307 by , 17 years ago
Cc: | removed |
---|
comment:308 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
I've uploaded what I think is a finished patch. It's also available from git: http://code.djangoproject.com/git/?p=django;a=shortlog;h=refs/heads/2070-streaming-uploads (clone from git://djangoproject.com/django and checkout the 2070-streaming-uploads branch).
Public testing is needed, but once its been kicked around some this'll get merged to trunk.
comment:309 by , 17 years ago
Cc: | added |
---|
comment:310 by , 17 years ago
Cc: | removed |
---|
comment:311 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [7814]) Fixed #2070: refactored Django's file upload capabilities.
A description of the new features can be found in the new upload handling documentation; the executive summary is that Django will now happily handle uploads of large files without issues.
This changes the representation of uploaded files from dictionaries to bona fide objects; see BackwardsIncompatibleChanges for details.
comment:313 by , 12 years ago
Easy pickings: | unset |
---|---|
UI/UX: | unset |
Should there be a setting to enable/disable this?
Needs testing on python 2.3.