Opened 16 years ago

Closed 16 years ago

#7593 closed (fixed)

Temporary uploaded file API (TemporaryFile) is inadequate

Reported by: vomjom Owned by: Jacob
Component: Uncategorized Version: dev
Severity: Keywords:
Cc: vomjom@…, mike@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The temporary file is not properly deleted much of the time for me. There's already a known issue where if you kill/restart apache, the temporary files will stay around forever (at least until system reboot), but if you also fork a process and pass the files to it, they stick around. I don't understand why there's a TemporaryFile class inside of uploadhandler.py when it could just use tempfile.TemporaryFile (which unlinks the file immediately after creation, rather than at garbage collection).

The other issue with the uploadhandler.TemporaryFile class is that it's not iterable (it defines iter via its getattr, but that doesn't work), so it doesn't act like a real file object. This is an issue if you try to use some python libraries with the uploaded file, e.g. csv. This again would be fixed if tempfile.TemporaryFile was used.

Also, it's a bit strange that the UploadedFile API doesn't offer direct access to the file object. It seems to me that a very common use-case for upload handling is to process the file using python's standard library. Two examples I currently use in my program are zipfile and csv. Both require access to the underlying file object. It seems especially strange because the UploadedFile has a TemporaryFile object that it could return (although as I mentioned above, it is inadequate as a replacement for a file object). When handling and processing huge uploads, it doesn't seem right to force the user to chunk the UploadedFile to a new file (which much of the time will be temporary itself) to be able to open it in a standard python libs.

Attachments (2)

7814_cleanups.2.diff (25.0 KB ) - added by Michael Axiak 16 years ago.
Overall cleanups for r7814...addressing all API changes that I know of.
7814_cleanups.diff (25.5 KB ) - added by Michael Axiak 16 years ago.
Update to the patch...thanks vomjom!

Download all attachments as: .zip

Change History (15)

comment:1 by Jacob, 16 years ago

Owner: changed from nobody to Jacob
Status: newassigned
Triage Stage: UnreviewedAccepted

comment:2 by Jacob, 16 years ago

Summary: Patch 2070 issuesTemporary uploaded file API (TemporaryFile) is inadequate

comment:3 by Michael Axiak, 16 years ago

Has patch: set

Jonathan,

Since the UploadedFile objects aren't necessarily files on the file system, we cannot just provide all of the methods that are available to the same object as open('myfile', 'r').

However, that doesn't mean we can't try to provide as much as possible. It's just that the more we provide, the more limiting we are on what UploadedFile can do -- it's a balancing act.

In this attached patch, the following changes are made, which will hopefully help what you are trying to accomplish:

  1. TemporaryUploadFile inherits from both UploadedFile and the class of the object that tempfile.NamedTemporaryFile() returns.
  2. All UploadedFile objects now support __iter__(), readlines(), and xreadlines().

In addition to this, the patch also updates some of the API and attempts to fix #7614.

Please try this patch and let me know if it solves any issue you are having.

Cheers,
Mike

comment:4 by Michael Axiak, 16 years ago

Cc: mike@… added

by Michael Axiak, 16 years ago

Attachment: 7814_cleanups.2.diff added

Overall cleanups for r7814...addressing all API changes that I know of.

comment:5 by vomjom, 16 years ago

Thanks!

It looks like the patch solved my issues, except I wasn't able to completely test it.

There are a few issues in the patch that I had to resolve:
In uploadedfile.py, you have:

if settings.FILE_UPLOAD_TEMP_DIR:

It will fail with 'module' object has no attribute 'FILE_UPLOAD_TEMP_DIR'. You should probably do getattr(settings, 'FILE_UPLOAD_TEMP_DIR', None) instead.

I think the same applies to settings.FILE_UPLOAD_MAX_MEMORY_SIZE

I also get an error for temporary files

  217.         obj.__class__ = type('TemporaryUploadedFile', (obj.__class__, UploadedFile), obj.__dict__)

Exception Type: TypeError at /db/job/2/run/
Exception Value: __class__ must be set to a class

I couldn't figure out how to resolve this.

In the code

        return InMemoryUploadedFile(
            file = self.file,
            field_name = self.field_name,
            file_name = self.file_name,
            content_type = self.content_type,
            file_size = file_size,
            charset = self.charset
        )

file_name should be name and file_size should be size, otherwise it errors out.

comment:6 by Malcolm Tredinnick, 16 years ago

Both FILE_UPLOAD_* settings constants exist in django/conf/global_settings.py. So you shouldn't ever see "module has no attribute" errors unless your Django installation isn't correct or you're manually configuring settings and didn't pass in a default (in which case the error is in your code, not Django's).

comment:7 by Jonathan Hseu <vomjom@…>, 16 years ago

Well, then the specific problem is that django.core.files.uploadedfile.py does:

import settings

Rather than

from django.conf import settings

in reply to:  7 ; comment:8 by Michael Axiak, 16 years ago

Replying to Jonathan Hseu <vomjom@gmail.com>:

Well, then the specific problem is that django.core.files.uploadedfile.py does:

import settings

Rather than

from django.conf import settings

I noticed this too a couple days ago, but unfortunately haven't been in internet contact.

As for the type() usage, I noticed that the methods dictionary should be {}, but that shouldn't cause the problem you're seeing... what version of Python are you using?

in reply to:  8 comment:9 by Jonathan Hseu <vomjom@…>, 16 years ago

Replying to axiak:

As for the type() usage, I noticed that the methods dictionary should be {}, but that shouldn't cause the problem you're seeing... what version of Python are you using?

2.5.2. I tried reproducing it, and this is what I've found:

>>> class A(object):
...   pass
... 
>>> a = A()
>>> a.__class__ = type('foo', (object,), {})
>>> a
<__main__.foo object at 0x7f5d3e62e4d0>
>>> class B:
...   pass
... 
>>> b = B()
>>> b.__class__ = type('bar', (object,), {})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __class__ must be set to a class

So I imagine the problem is:

>>> import tempfile
>>> f = tempfile.NamedTemporaryFile(mode='w', suffix='.upload')
>>> f.__class__
<class tempfile._TemporaryFileWrapper at 0x7f414f12b1d0>

And in my tempfile.py, _TemporaryFileWrapper has no parent class.

by Michael Axiak, 16 years ago

Attachment: 7814_cleanups.diff added

Update to the patch...thanks vomjom!

comment:10 by Jacob, 16 years ago

Resolution: duplicate
Status: assignedclosed

This is being fixed with the same patch as #7614, so I'm marking this as a duplicate. It's not, exactly, but it'll get fixed at the same time.

comment:11 by Jacob, 16 years ago

(In [7859]) Fixed #7614: the quickening has come, and there now is only one UploadedFile. On top of that, UploadedFile's interface has been improved:

  • The API now more closely matches a proper file API. This unfortunately means a few backwards-incompatible renamings; see BackwardsIncompatibleChanges. This refs #7593.
  • While we were at it, renamed chunk() to chunks() to clarify that it's an iterator.
  • Temporary uploaded files now property use the tempfile library behind the scenes which should ensure better cleanup of tempfiles (refs #7593 again).

Thanks to Mike Axiak for the bulk of this patch.

comment:12 by Jonathan Hseu <vomjom@…>, 16 years ago

Resolution: duplicate
Status: closedreopened

Thanks Mike for the new SVN patch. It works great except for a few minor things:

  1. Please put a tell() function in TemporaryUploadedFile
  1. Please also map the file functions to the StringIOs in InMemoryUploadedFile and SimpleUploadedFile.

I currently have an awkward situation where my project works well if the file is large enough (and so goes to a temporary file), but fails on small files (and becomes an InMemoryUploadedFile). It also fails in my unit tests (SimpleUploadedFile).

comment:13 by Jacob, 16 years ago

Resolution: fixed
Status: reopenedclosed

#7769 is tracking these improvements. Jonathan, in general, please don't reopen fixed tickets to add more patches; new tickets are far preferred.

Note: See TracTickets for help on using tickets.
Back to Top