Opened 10 years ago

Last modified 3 days ago

#27150 assigned Cleanup/optimization

Wrong File object behavior if file object has no name attribute

Reported by: Sergei Zh Owned by: VIZZARD-X
Component: File uploads/storage Version: 1.10
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

django.core.files.base.File supports file-like objects (synonym to file object in python glossary) and in fact can't rely on additional name attribute, but in current implementation of __bool__ method it does:

    def __bool__(self):
        return bool(self.name)

For example, until tempfile.SpooledTemporaryFile isn't large enough to have been written to disk, his name attribute is None and if statement will return False.

import tempfile
from django.core.files.base import File
python_file_like_object = tempfile.SpooledTemporaryFile()
file_obj = File(python_file_like_object)  # name keyword is None by default
if file_object:  # Falsy, because python_file_like_object.name is None
    print("42")

The same behavior with streams (BytesIO, StringIO).
Maybe __bool__ method should rely on self.file attribute to avoid this behavior?

Change History (18)

comment:1 by Baptiste Mispelon, 10 years ago

Hi,

As far as I can tell, File.__bool__ is not documented but it does seem to be used internally and is somewhat tested (though maybe not directly): changing it to return False yields quite a few errors and failures in the test suite.

Because of this lack of documentation, it's hard to tell if the behavior you're describing is a bug.

I think it'd be worth it to:

1) Document File.__bool__ (both in the reference docs and in the docstring)
2) Figure out if we can improve it by handling underlying files with no names as you suggest.

What do you think?

comment:2 by Sergei Zh, 10 years ago

Hi.
Sure, it makes sense. I'll try to investigate failed tests ASAP. Think it's the best place to begin.

comment:3 by Tim Graham, 10 years ago

Component: Core (Other)Documentation
Summary: Wrong File object behavior if file object has no name attributeDocument that a name should be provided when wrapping file-like objects that don't have one with File
Triage Stage: UnreviewedAccepted
Type: BugCleanup/optimization

See #26495 for a case where this came up. Unless investigation yields otherwise, I think documenting that you should pass a name for file-like objects that don't have one is the best course of action. I'm not sure if raising an error for that case might be acceptable (there might be use cases where providing name isn't needed) but if so, they would likely be even more helpful.

comment:4 by Sergei Zh, 10 years ago

As I understood from #26495 python file-like objects works fine with requests library, because in CPython file objects use object.__bool__ and always returns True if no __bool__ and __len__.

I think documenting that you should pass a name for file-like objects that don't have one is the best course of action.

It's little bit strange for me to allow pass name=None to Storage.save, while disallow use empty name for file-like objects. Maybe we can restore default __bool__ behavior in Storage.save method?

content.__bool__ = lambda _: True

comment:5 by Sergei Zh, 10 years ago

I've realized it's wrong, we should delete __bool__ method, cause default implementation depends on __len__ method.

comment:6 by Tim Graham, 10 years ago

I haven't looked into the issue so I can't propose an alternative, however, I'm not sure that patching methods like that is a good practice. At least, I'm not aware of anywhere else in Django that uses that pattern.

comment:7 by Sergei Zh, 10 years ago

You are right, I guess the main reason - it doesn't work :)

comment:8 by VIZZARD-X, 4 months ago

PR submitted: https://github.com/django/django/pull/20327

Updated the documentation to clarify that wrapping file-like objects
without a name attribute requires explicitly providing one. The docs
now build cleanly, including lint and spelling checks.

comment:9 by VIZZARD-X, 4 months ago

Has patch: set

comment:10 by Vijaya Lakshmi Pokala, 7 weeks ago

Hi, I'd like to work on this ticket and submit a patch soon

comment:11 by blighj, 2 weeks ago

Triage Stage: AcceptedUnreviewed

A 10 year old ticket that never really reached consensus may not be a good ticket to pick up and introduce new wording to the docs.

Reading between the lines it feels like this is one where someone needs to do a bit of work investigating deeper. Defining a base File class and then giving it a __bool__ method that relies on a name seems like a break in the abstraction.

I amended the __bool__ call to log its usage and ran the test suite, and then did some searching to see what other cases I could see. I think this is pretty comprehensive. I've given the intent of the calling code, why does it want to know if the file is true, and what type of file is it working with currently.

Location Calls Intent Instance(s)
fields/files.py:336 pre_save 847 Skip committing the file if the field is empty (name is the DB value; empty name means nothing to save) FieldFile
fields/files.py:42 _require_file 85 Guard: raise ValueError if no file is associated with this field before attempting file operations FieldFile
http/multipartparser.py:374 handle_file_complete 59 Check whether an upload handler returned a completed file object (vs None meaning not yet done) * InMemoryUploadedFile, TemporaryUploadedFile
forms/fields.py:696 FileField.clean 37 Fall back to the initial value if no new file was uploaded FieldFile, SimpleUploadedFile
forms/widgets.py:552 ClearableFileInput.is_initial 24 Determine whether the current value is an existing stored file (to decide whether to render the "currently" link) FieldFile, SimpleUploadedFile
fields/files.py:368 save_form_data 11 Use empty string if no file was provided (data or "") to avoid storing False in the DB column FieldFile, SimpleUploadedFile
fields/files.py:109 delete 9 Early return if no file is associated with the field — nothing to delete FieldFile
forms/widgets.py:521 FileInput.use_required_attribute 5 Suppress the HTML required attribute if a file is already set as the initial value FieldFile
forms/widgets.py:585 ClearableFileInput.value_from_datadict 2 Detect contradiction: user uploaded a new file AND checked the clear checkbox simultaneously SimpleUploadedFile
contrib/admin/utils.py:463 display_for_field 2 Only render a hyperlink to the file if a file is actually set on the field FieldFile
fields/files.py:500 update_dimension_fields 0 Skip reading image dimensions if no file is associated with the ImageField FieldFile
fields/files.py:518 update_dimension_fields 0 Read width/height from the image if a file is set; clear dimension fields to None otherwise FieldFile
core/files/base.py:23 File.__repr__ 0 Presentational: display the file's name in repr output, or "None" if no name is set

In reality the cases are checking either a FieldFile or a subclass of UploadedFile, in those cases it makes sense to check the name because if you have no name, then the row in the db is empty or the uploadedfile didn't work. We want to keep that behaviour.

But the default File looks like a thin wrapper around a python File, and the docs support this https://docs.djangoproject.com/en/6.0/ref/files/file/#django.core.files.File and have examples of constructing a file without a name, which makes sense for a thin wrapper, but results in a False file https://docs.djangoproject.com/en/6.0/topics/files/#the-file-object and you get bugs like #26495

So what I think we should do is have __bool__ on File return True, which would make sense for a wrapper class, files are kind of truthy by default in python, it's what ContentFile does already. But then add __bool__ methods for FieldFile and UploadedFile that rely on the name. This preserves the behaviour that most of the code is working with and makes sense for those classes, no name = not a file = no need to check the underlying file object.

That leaves that __repr__ method, which should probably be updated to return "<%s: %s>" % (self.__class__.__name__, self.name or "None")

The docs don't really then need updating at all, you could make a case for documenting the __bool__ behaviour of FieldFile and UploadedFile but I think you'd be fine without it.

I think we should close and pause the current PR and instead I've moved this ticket back to unreviewed, so a member of the triage team can have a think about it.


* The intent in handle_file_complete is genuinely is not None, the upload handler API documents that handlers return None to signal "not done yet". This case works incidentally, and would continue to work with the __bool__ changes. Maybe that line could be if file_obj is not None: as a separate/additional cleanup.

Last edited 2 weeks ago by blighj (previous) (diff)

comment:12 by blighj, 2 weeks ago

Component: DocumentationCore (Other)
Summary: Document that a name should be provided when wrapping file-like objects that don't have one with FileWrong File object behavior if file object has no name attribute

comment:13 by Jacob Walls, 11 days ago

Component: Core (Other)File uploads/storage
Triage Stage: UnreviewedAccepted

Marvelous work, James. I find your proposal in comment:11 convincing. I'm not certain whether we should document the change as a minor backward-incompatible change, on account of an application subclassing the thin File wrapper. We would be changing the semantic.

comment:14 by VIZZARD-X, 11 days ago

Owner: changed from nobody to VIZZARD-X
Status: newassigned

comment:15 by blighj, 9 days ago

Patch needs improvement: set

comment:16 by VIZZARD-X, 4 days ago

Patch needs improvement: unset

comment:17 by blighj, 4 days ago

Triage Stage: AcceptedReady for checkin

comment:18 by blighj, 3 days ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted
Note: See TracTickets for help on using tickets.
Back to Top