Opened 5 years ago

Last modified 4 years ago

#14063 new New feature

Validating form file fields is hard

Reported by: olau Owned by: nobody
Component: File uploads/storage Version: 1.2
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I have an application where I need to validate a file uploaded through a form. I'd like to do this with the clean_xxx() method for the field. However, I need the file to be present in the file system because the validation is using external libraries and applications.

I don't think that's an unreasonable requirement, but it is not straightforward with the current API. After some digging, I thought I could use the storage classes, something like

f = self.cleaned_data['myfile']
storage = FileSystemStorage(location="/tmp")
path = storage.save(tempfile.mktemp(dir="/tmp"), f)
try:
    if not validate_file(path):
        raise forms.ValidationError(...)
finally:
    storage.delete(path)

But this doesn't work with the temporary file backend because it opens and closes the file, and as far as I can tell, there's no straight-forward way of reopening it again. So currently, I'm doing something like this:

f = self.cleaned_data['myfile']
if hasattr(f, 'temporary_file_path'):
    path = f.temporary_file_path()
    cleanup = False
else:
    import tempfile
    path = tempfile.mktemp(dir="/tmp")
    from django.core.files.storage import FileSystemStorage
    storage = FileSystemStorage(location="/tmp")
    path = storage.save(path, f)
    cleanup = True
    
try:
    if not validate_file(path):
        raise forms.ValidationError(...)
finally:
    if cleanup:
        storage.delete(path)

It occurs to me that the API ought to have a more elegant way of doing this?

Change History (8)

comment:1 Changed 5 years ago by SmileyChris

  • Component changed from Uncategorized to File uploads/storage
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

You're right, it is hard.

A http://docs.python.org/library/tempfile.html#tempfile.NamedTemporaryFile with the delete argument set to False seems like the behavior you're after, but unfortunately that argument was introduced in 2.6.

comment:2 Changed 4 years ago by julien

  • Severity set to Normal
  • Type set to New feature

comment:3 Changed 4 years ago by andrewgodwin

  • Easy pickings unset
  • UI/UX unset

Why do you need to close the file to validate it in the first place? If you're shelling out, it's not necessary, and I would presume that any Python-based validation could be convinced not to close it.

comment:4 Changed 4 years ago by olau

I don't quite understand the question. It seems you think there are no instances where one could possibly need a file name as handle for external interfaces (I don't understand why you are talking about closing the file, though, I don't need to close it, just need to have it exist in the file system so I can plug in some already working functionality to test it).

Digging through my source code, it seems like the problem that triggered this report occurs with VIPS (http://www.vips.ecs.soton.ac.uk/index.php?title=VIPS) which is a C imaging library with an auto-generated, somewhat weird Python wrapper.

comment:5 Changed 4 years ago by andrewgodwin

Ah, yes, I see what the ticket is for now - sorry, we got a bit confused about it at the sprint.

Have you tried accessing the native file pointer at self.cleaned_data['myfile'].file directly? It shouldn't be closed at that point; you might need to set FILE_UPLOAD_MAX_MEMORY_SIZE = 0 to force it to be an actual on-disk file.

comment:6 Changed 4 years ago by olau

I think that's basically what the work-around snippet I posted does in a different way through temporary_file_path? Or do you have something else in mind?

I'd rather not want to mess with a global setting like FILE_UPLOAD_MAX_MEMORY_SIZE.

comment:7 Changed 4 years ago by olau

By the way, temporary_file_path() is just what I need, if memory file things had that too, I would be a happy camper. I realize that requires turning them into temporary files or cleaning up somehow, though.

comment:8 Changed 4 years ago by andrewgodwin

Yeah, that's essentially your workaround.

I think what might be the best thing here is to have temporary_file_path, or some nicer equivalent, on all file objects, and have it work on in-memory types too, using tempfile. Alternatively, have some way of marking that a certain FileField never wants in-memory files (i.e. avoiding the global setting), which is nice because it requires less code.

Note: See TracTickets for help on using tickets.
Back to Top