Code

Opened 6 years ago

Closed 6 years ago

#7593 closed (fixed)

Temporary uploaded file API (TemporaryFile) is inadequate

Reported by: vomjom Owned by: jacob
Component: Uncategorized Version: master
Severity: Keywords:
Cc: vomjom@…, mike@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

The temporary file is not properly deleted much of the time for me. There's already a known issue where if you kill/restart apache, the temporary files will stay around forever (at least until system reboot), but if you also fork a process and pass the files to it, they stick around. I don't understand why there's a TemporaryFile class inside of uploadhandler.py when it could just use tempfile.TemporaryFile (which unlinks the file immediately after creation, rather than at garbage collection).

The other issue with the uploadhandler.TemporaryFile class is that it's not iterable (it defines iter via its getattr, but that doesn't work), so it doesn't act like a real file object. This is an issue if you try to use some python libraries with the uploaded file, e.g. csv. This again would be fixed if tempfile.TemporaryFile was used.

Also, it's a bit strange that the UploadedFile API doesn't offer direct access to the file object. It seems to me that a very common use-case for upload handling is to process the file using python's standard library. Two examples I currently use in my program are zipfile and csv. Both require access to the underlying file object. It seems especially strange because the UploadedFile has a TemporaryFile object that it could return (although as I mentioned above, it is inadequate as a replacement for a file object). When handling and processing huge uploads, it doesn't seem right to force the user to chunk the UploadedFile to a new file (which much of the time will be temporary itself) to be able to open it in a standard python libs.

Attachments (2)

7814_cleanups.2.diff (25.0 KB) - added by axiak 6 years ago.
Overall cleanups for r7814...addressing all API changes that I know of.
7814_cleanups.diff (25.5 KB) - added by axiak 6 years ago.
Update to the patch...thanks vomjom!

Download all attachments as: .zip

Change History (15)

comment:1 Changed 6 years ago by jacob

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to jacob
  • Patch needs improvement unset
  • Status changed from new to assigned
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 6 years ago by jacob

  • Summary changed from Patch 2070 issues to Temporary uploaded file API (TemporaryFile) is inadequate

comment:3 Changed 6 years ago by axiak

  • Has patch set

Jonathan,

Since the UploadedFile objects aren't necessarily files on the file system, we cannot just provide all of the methods that are available to the same object as open('myfile', 'r').

However, that doesn't mean we can't try to provide as much as possible. It's just that the more we provide, the more limiting we are on what UploadedFile can do -- it's a balancing act.

In this attached patch, the following changes are made, which will hopefully help what you are trying to accomplish:

  1. TemporaryUploadFile inherits from both UploadedFile and the class of the object that tempfile.NamedTemporaryFile() returns.
  2. All UploadedFile objects now support __iter__(), readlines(), and xreadlines().

In addition to this, the patch also updates some of the API and attempts to fix #7614.

Please try this patch and let me know if it solves any issue you are having.

Cheers,
Mike

comment:4 Changed 6 years ago by axiak

  • Cc mike@… added

Changed 6 years ago by axiak

Overall cleanups for r7814...addressing all API changes that I know of.

comment:5 Changed 6 years ago by vomjom

Thanks!

It looks like the patch solved my issues, except I wasn't able to completely test it.

There are a few issues in the patch that I had to resolve:
In uploadedfile.py, you have:

if settings.FILE_UPLOAD_TEMP_DIR:

It will fail with 'module' object has no attribute 'FILE_UPLOAD_TEMP_DIR'. You should probably do getattr(settings, 'FILE_UPLOAD_TEMP_DIR', None) instead.

I think the same applies to settings.FILE_UPLOAD_MAX_MEMORY_SIZE

I also get an error for temporary files

  217.         obj.__class__ = type('TemporaryUploadedFile', (obj.__class__, UploadedFile), obj.__dict__)

Exception Type: TypeError at /db/job/2/run/
Exception Value: __class__ must be set to a class

I couldn't figure out how to resolve this.

In the code

        return InMemoryUploadedFile(
            file = self.file,
            field_name = self.field_name,
            file_name = self.file_name,
            content_type = self.content_type,
            file_size = file_size,
            charset = self.charset
        )

file_name should be name and file_size should be size, otherwise it errors out.

comment:6 Changed 6 years ago by mtredinnick

Both FILE_UPLOAD_* settings constants exist in django/conf/global_settings.py. So you shouldn't ever see "module has no attribute" errors unless your Django installation isn't correct or you're manually configuring settings and didn't pass in a default (in which case the error is in your code, not Django's).

comment:7 follow-up: Changed 6 years ago by Jonathan Hseu <vomjom@…>

Well, then the specific problem is that django.core.files.uploadedfile.py does:

import settings

Rather than

from django.conf import settings

comment:8 in reply to: ↑ 7 ; follow-up: Changed 6 years ago by axiak

Replying to Jonathan Hseu <vomjom@gmail.com>:

Well, then the specific problem is that django.core.files.uploadedfile.py does:

import settings

Rather than

from django.conf import settings

I noticed this too a couple days ago, but unfortunately haven't been in internet contact.

As for the type() usage, I noticed that the methods dictionary should be {}, but that shouldn't cause the problem you're seeing... what version of Python are you using?

comment:9 in reply to: ↑ 8 Changed 6 years ago by Jonathan Hseu <vomjom@…>

Replying to axiak:

As for the type() usage, I noticed that the methods dictionary should be {}, but that shouldn't cause the problem you're seeing... what version of Python are you using?

2.5.2. I tried reproducing it, and this is what I've found:

>>> class A(object):
...   pass
... 
>>> a = A()
>>> a.__class__ = type('foo', (object,), {})
>>> a
<__main__.foo object at 0x7f5d3e62e4d0>
>>> class B:
...   pass
... 
>>> b = B()
>>> b.__class__ = type('bar', (object,), {})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __class__ must be set to a class

So I imagine the problem is:

>>> import tempfile
>>> f = tempfile.NamedTemporaryFile(mode='w', suffix='.upload')
>>> f.__class__
<class tempfile._TemporaryFileWrapper at 0x7f414f12b1d0>

And in my tempfile.py, _TemporaryFileWrapper has no parent class.

Changed 6 years ago by axiak

Update to the patch...thanks vomjom!

comment:10 Changed 6 years ago by jacob

  • Resolution set to duplicate
  • Status changed from assigned to closed

This is being fixed with the same patch as #7614, so I'm marking this as a duplicate. It's not, exactly, but it'll get fixed at the same time.

comment:11 Changed 6 years ago by jacob

(In [7859]) Fixed #7614: the quickening has come, and there now is only one UploadedFile. On top of that, UploadedFile's interface has been improved:

  • The API now more closely matches a proper file API. This unfortunately means a few backwards-incompatible renamings; see BackwardsIncompatibleChanges. This refs #7593.
  • While we were at it, renamed chunk() to chunks() to clarify that it's an iterator.
  • Temporary uploaded files now property use the tempfile library behind the scenes which should ensure better cleanup of tempfiles (refs #7593 again).

Thanks to Mike Axiak for the bulk of this patch.

comment:12 Changed 6 years ago by Jonathan Hseu <vomjom@…>

  • Resolution duplicate deleted
  • Status changed from closed to reopened

Thanks Mike for the new SVN patch. It works great except for a few minor things:

  1. Please put a tell() function in TemporaryUploadedFile
  1. Please also map the file functions to the StringIOs in InMemoryUploadedFile and SimpleUploadedFile.

I currently have an awkward situation where my project works well if the file is large enough (and so goes to a temporary file), but fails on small files (and becomes an InMemoryUploadedFile). It also fails in my unit tests (SimpleUploadedFile).

comment:13 Changed 6 years ago by jacob

  • Resolution set to fixed
  • Status changed from reopened to closed

#7769 is tracking these improvements. Jonathan, in general, please don't reopen fixed tickets to add more patches; new tickets are far preferred.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.