Opened 14 years ago

Last modified 13 years ago

#14063 new New feature

Validating form file fields is hard

Reported by: Ole Laursen Owned by: nobody
Component: File uploads/storage Version: 1.2
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I have an application where I need to validate a file uploaded through a form. I'd like to do this with the clean_xxx() method for the field. However, I need the file to be present in the file system because the validation is using external libraries and applications.

I don't think that's an unreasonable requirement, but it is not straightforward with the current API. After some digging, I thought I could use the storage classes, something like

f = self.cleaned_data['myfile']
storage = FileSystemStorage(location="/tmp")
path = storage.save(tempfile.mktemp(dir="/tmp"), f)
try:
    if not validate_file(path):
        raise forms.ValidationError(...)
finally:
    storage.delete(path)

But this doesn't work with the temporary file backend because it opens and closes the file, and as far as I can tell, there's no straight-forward way of reopening it again. So currently, I'm doing something like this:

f = self.cleaned_data['myfile']
if hasattr(f, 'temporary_file_path'):
    path = f.temporary_file_path()
    cleanup = False
else:
    import tempfile
    path = tempfile.mktemp(dir="/tmp")
    from django.core.files.storage import FileSystemStorage
    storage = FileSystemStorage(location="/tmp")
    path = storage.save(path, f)
    cleanup = True
    
try:
    if not validate_file(path):
        raise forms.ValidationError(...)
finally:
    if cleanup:
        storage.delete(path)

It occurs to me that the API ought to have a more elegant way of doing this?

Change History (8)

comment:1 by Chris Beaven, 14 years ago

Component: UncategorizedFile uploads/storage
Triage Stage: UnreviewedAccepted

You're right, it is hard.

A http://docs.python.org/library/tempfile.html#tempfile.NamedTemporaryFile with the delete argument set to False seems like the behavior you're after, but unfortunately that argument was introduced in 2.6.

comment:2 by Julien Phalip, 13 years ago

Severity: Normal
Type: New feature

comment:3 by Andrew Godwin, 13 years ago

Easy pickings: unset
UI/UX: unset

Why do you need to close the file to validate it in the first place? If you're shelling out, it's not necessary, and I would presume that any Python-based validation could be convinced not to close it.

comment:4 by Ole Laursen, 13 years ago

I don't quite understand the question. It seems you think there are no instances where one could possibly need a file name as handle for external interfaces (I don't understand why you are talking about closing the file, though, I don't need to close it, just need to have it exist in the file system so I can plug in some already working functionality to test it).

Digging through my source code, it seems like the problem that triggered this report occurs with VIPS (http://www.vips.ecs.soton.ac.uk/index.php?title=VIPS) which is a C imaging library with an auto-generated, somewhat weird Python wrapper.

comment:5 by Andrew Godwin, 13 years ago

Ah, yes, I see what the ticket is for now - sorry, we got a bit confused about it at the sprint.

Have you tried accessing the native file pointer at self.cleaned_data['myfile'].file directly? It shouldn't be closed at that point; you might need to set FILE_UPLOAD_MAX_MEMORY_SIZE = 0 to force it to be an actual on-disk file.

comment:6 by Ole Laursen, 13 years ago

I think that's basically what the work-around snippet I posted does in a different way through temporary_file_path? Or do you have something else in mind?

I'd rather not want to mess with a global setting like FILE_UPLOAD_MAX_MEMORY_SIZE.

comment:7 by Ole Laursen, 13 years ago

By the way, temporary_file_path() is just what I need, if memory file things had that too, I would be a happy camper. I realize that requires turning them into temporary files or cleaning up somehow, though.

comment:8 by Andrew Godwin, 13 years ago

Yeah, that's essentially your workaround.

I think what might be the best thing here is to have temporary_file_path, or some nicer equivalent, on all file objects, and have it work on in-memory types too, using tempfile. Alternatively, have some way of marking that a certain FileField never wants in-memory files (i.e. avoiding the global setting), which is nice because it requires less code.

Note: See TracTickets for help on using tickets.
Back to Top