Opened 2 months ago

Closed 2 months ago

Last modified 8 weeks ago

#35818 closed Bug (wontfix)

Failing to save file with long names containing dots

Reported by: Bruno Alla Owned by: Bruno Alla
Component: File uploads/storage Version: 5.1
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Bruno Alla)

This started to happen as we updated to Djanho 5.1. We started seeing some SuspiciousFileOperation errors when our users were trying to upload long file names and wasn't initially clear why it started to happen only recently.

After further investigation, it's only a problem when the file name contains a "." in the middle name, and which point the truncation logic trims too many characters, and end up with no base name on this line: https://github.com/django/django/blob/6bedb102e9708c6183caa51330f9bdeddf944d6a/django/core/files/storage/base.py#L106-L111

Here is a minimal reproduction:

# models.py
class Document(models.Model):
    file = models.FileField(upload_to="documents/")


# tests.py
class TestDocument(TestCase):
    def test_save_file(self):
        file_name = "this.is.a.very.l" + "o" * 100 + ".txt"
        Document.objects.create(file=SimpleUploadedFile(name=file_name, content=b"test"))

I created a GitHub repo based off startproject with that code to make it easier to run: https://github.com/browniebroke/django-suspicious-filename-too-long-with-dot

The test passes on Django 5.0 but fails on Django 5.1 with the following exception:

django.core.exceptions.SuspiciousFileOperation: Storage can not find an available filename for "documents/this_d01Yq4J.is.a.very.loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo.txt". Please make sure that the corresponding file field allows sufficient "max_length".

From what I can tell, the bug starts on line 87, when we try to get the file extension: https://github.com/django/django/blob/6bedb102e9708c6183caa51330f9bdeddf944d6a/django/core/files/storage/base.py#L87

On the next line, the extension is removed from the name to get the file root, which removes a lot more characters than expected, as the extension starts at the first ".", instead of the last one.

Change History (9)

comment:1 by Bruno Alla, 2 months ago

Description: modified (diff)

comment:2 by Bruno Alla, 2 months ago

Has patch: set
Owner: set to Bruno Alla
Patch needs improvement: set
Status: newassigned

comment:3 by Bruno Alla, 2 months ago

Just noting that this is a side effect of https://code.djangoproject.com/ticket/23759, where we try to preserve all extensions, for instance if the file is test.tar.gz.

comment:4 by Bruno Alla, 2 months ago

Patch needs improvement: unset

comment:5 by Natalia Bidart, 2 months ago

Resolution: wontfix
Status: assignedclosed

Hello Bruno, thank you for your report and for providing the repro project; I appreciate it. I’ve investigated the issue and reviewed your PR.

My conclusion is that the key concern here is how we define "all the file extensions" for a filename. The proposal in your PR, which suggests that any suffix shorter than 5 characters should be considered an extension, seems arbitrary and unpredictable. I'm not comfortable adding such a rule to Django core. Currently, Django follows the logic established by pathlib, which defines suffixes as everything after the first dot.

Implementing any other algorithm would necessitate a comprehensive list of file extensions, which may not be practical (or possible!).

I understand that the current path-splitting logic may not align with your project's specific needs, but I believe it does not apply to the broader Django ecosystem. Django is designed to provide robust solutions for common scenarios. Your project could achieve the desired behavior by either increasing the max_length of a FileField, validating filenames to restrict dots, or by renaming files proactively. You could also provide your custom Storage class.

Given the above, I'll close the ticket accordingly, but if you disagree, you can consider starting a new conversation on the Django Forum, where you'll reach a wider audience and likely get extra feedback. If there is a community conversation and consensus on a suitable solution is reached, please let me know and I'll be happy to re-open pointing to the Forum topic.

comment:6 by Bruno Alla, 2 months ago

Fair enough. Youre're right, I agree that my fix is too brittle and doesn't belong in Django. I wish pathlib would offer something more robust here...
However, I feel there might be something else which is wrong in the current Django algorithm, and I hope we can handle this better, so I'll follow your recommendation and reach out to the forum.

comment:7 by Bruno Alla, 2 months ago

You could also provide your custom Storage class.

I think that's a reasonable option to offer to users. However there is currently not a good hook to easily override the storage extension splitting logic, users will -I think- have to override the whole get_available_name method, which is quite long and complex.

Carlton suggested to make this easier to customise in user-land, which seem to strike a good balance between the maintainablity of Django and the variety of use cases faced by users.

Would you be open to re-open this ticket to try to add such hook, along the lines of something like get_file_extensions()?

comment:8 by Natalia Bidart, 8 weeks ago

Hey Bruno, before considering re-opening this ticket, I think we need to continue the conversation in the forum topic about possible API improvements in the Storage base class. Thank you!

Note: See TracTickets for help on using tickets.
Back to Top