#35818 closed Bug (wontfix)
Failing to save file with long names containing dots
Reported by: | Bruno Alla | Owned by: | Bruno Alla |
---|---|---|---|
Component: | File uploads/storage | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
This started to happen as we updated to Djanho 5.1. We started seeing some SuspiciousFileOperation
errors when our users were trying to upload long file names and wasn't initially clear why it started to happen only recently.
After further investigation, it's only a problem when the file name contains a "." in the middle name, and which point the truncation logic trims too many characters, and end up with no base name on this line: https://github.com/django/django/blob/6bedb102e9708c6183caa51330f9bdeddf944d6a/django/core/files/storage/base.py#L106-L111
Here is a minimal reproduction:
# models.py class Document(models.Model): file = models.FileField(upload_to="documents/") # tests.py class TestDocument(TestCase): def test_save_file(self): file_name = "this.is.a.very.l" + "o" * 100 + ".txt" Document.objects.create(file=SimpleUploadedFile(name=file_name, content=b"test"))
I created a GitHub repo based off startproject with that code to make it easier to run: https://github.com/browniebroke/django-suspicious-filename-too-long-with-dot
The test passes on Django 5.0 but fails on Django 5.1 with the following exception:
django.core.exceptions.SuspiciousFileOperation: Storage can not find an available filename for "documents/this_d01Yq4J.is.a.very.loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo.txt". Please make sure that the corresponding file field allows sufficient "max_length".
From what I can tell, the bug starts on line 87, when we try to get the file extension: https://github.com/django/django/blob/6bedb102e9708c6183caa51330f9bdeddf944d6a/django/core/files/storage/base.py#L87
On the next line, the extension is removed from the name to get the file root, which removes a lot more characters than expected, as the extension starts at the first ".", instead of the last one.
Change History (9)
comment:1 by , 2 months ago
Description: | modified (diff) |
---|
comment:2 by , 2 months ago
Has patch: | set |
---|---|
Owner: | set to |
Patch needs improvement: | set |
Status: | new → assigned |
comment:3 by , 2 months ago
comment:4 by , 2 months ago
Patch needs improvement: | unset |
---|
comment:5 by , 2 months ago
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
Hello Bruno, thank you for your report and for providing the repro project; I appreciate it. I’ve investigated the issue and reviewed your PR.
My conclusion is that the key concern here is how we define "all the file extensions" for a filename. The proposal in your PR, which suggests that any suffix shorter than 5 characters should be considered an extension, seems arbitrary and unpredictable. I'm not comfortable adding such a rule to Django core. Currently, Django follows the logic established by pathlib, which defines suffixes as everything after the first dot.
Implementing any other algorithm would necessitate a comprehensive list of file extensions, which may not be practical (or possible!).
I understand that the current path-splitting logic may not align with your project's specific needs, but I believe it does not apply to the broader Django ecosystem. Django is designed to provide robust solutions for common scenarios. Your project could achieve the desired behavior by either increasing the max_length
of a FileField
, validating filenames to restrict dots, or by renaming files proactively. You could also provide your custom Storage
class.
Given the above, I'll close the ticket accordingly, but if you disagree, you can consider starting a new conversation on the Django Forum, where you'll reach a wider audience and likely get extra feedback. If there is a community conversation and consensus on a suitable solution is reached, please let me know and I'll be happy to re-open pointing to the Forum topic.
comment:6 by , 2 months ago
Fair enough. Youre're right, I agree that my fix is too brittle and doesn't belong in Django. I wish pathlib would offer something more robust here...
However, I feel there might be something else which is wrong in the current Django algorithm, and I hope we can handle this better, so I'll follow your recommendation and reach out to the forum.
comment:7 by , 2 months ago
You could also provide your custom Storage class.
I think that's a reasonable option to offer to users. However there is currently not a good hook to easily override the storage extension splitting logic, users will -I think- have to override the whole get_available_name
method, which is quite long and complex.
Carlton suggested to make this easier to customise in user-land, which seem to strike a good balance between the maintainablity of Django and the variety of use cases faced by users.
Would you be open to re-open this ticket to try to add such hook, along the lines of something like get_file_extensions()
?
comment:8 by , 8 weeks ago
Hey Bruno, before considering re-opening this ticket, I think we need to continue the conversation in the forum topic about possible API improvements in the Storage
base class. Thank you!
comment:9 by , 8 weeks ago
Sure, will follow up there
PS: for these interested: https://forum.djangoproject.com/t/issue-when-saving-files-with-long-names-containing-dot-in-django-5-1/35494
Just noting that this is a side effect of https://code.djangoproject.com/ticket/23759, where we try to preserve all extensions, for instance if the file is
test.tar.gz
.