Opened 3 years ago

Last modified 6 months ago

#23759 assigned New feature

Storage.get_available_name should preserve all file extensions, not just the first one

Reported by: thenewguy Owned by: thenewguy
Component: File uploads/storage Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

get_available_name only preserves the first file extension. If the file is foo.tar.gz then get_available_name will return foo.tar_RANDOM.gz instead of foo_RANDOM.tar.gz

https://github.com/django/django/blob/master/django/core/files/storage.py#L71

    def get_available_name(self, name):
        """
        Returns a filename that's free on the target storage system, and
        available for new content to be written to.
        """
        dir_name, file_name = os.path.split(name)
        file_root, file_ext = os.path.splitext(file_name)
        # If the filename already exists, add an underscore and a random 7
        # character alphanumeric string (before the file extension, if one
        # exists) to the filename until the generated filename doesn't exist.
        while self.exists(name):
            # file_ext includes the dot.
            name = os.path.join(dir_name, "%s_%s%s" % (file_root, get_random_string(7), file_ext))

        return name

Change History (5)

comment:1 Changed 3 years ago by Tim Graham

In general, I think a period is a valid character in a filename, so we cannot simply assume everything to the right of the first period is the file extension. Some different strategies for the issue are detailed on stackoverflow. I am not sure its up to Django to pick one, but we could at least document the caveat and link how to write your own storage subclass so someone can implement the behavior they desire. Thoughts?

comment:2 Changed 3 years ago by thenewguy

Sorry for the delay. I just found out status updates on tickets I am watching have been going to spam. I guess that is why I never get updates.

In this case, I do not think it is an issue if a valid part of the filename is assumed to be an extension because get_available_name does not promise to return the same filename, and it doesn't matter where the random string is in the name to be "available".

However, the mimetypes module could be used to determine if an extension was actually an extension. If mimetypes returns a type or an encoding, add the extension to the extensions list, otherwise break and assume there are no more extensions.

Also, this only happens if the submitted filename already exists.

I think it would be very nice to be able to maintain reliable file extensions out of the box. This is important for many cases, including when a user downloads the file.

Here is a sample mixin I've used in the past (keep in mind it does not query mimetypes, but that is trivial)

class GetUUIDAvailableNameStorageMixin(object):
    def get_available_name(self, name):
        """
        Returns a hex encoded uuid filename, using the extensions provided by the
        input filename, that's suitable for use in the target storage system.
        """
        exts = []
        root = ext = name
        while ext:
            root, ext = splitext(root)
            exts.insert(0, ext)
        name = "%s%s" % (uuid4().hex, "".join(exts))
        return super(GetUUIDAvailableNameStorageMixin, self).get_available_name(name) 

comment:3 Changed 3 years ago by Claude Paroz

I think that if we can do it without introducing too much complexity, we should do it. Could you provide a patch with tests?

comment:4 Changed 3 years ago by thenewguy

Owner: changed from nobody to thenewguy
Status: newassigned
Triage Stage: UnreviewedAccepted

Assigned and marked accepted per previous comment.

Last edited 6 months ago by Tim Graham (previous) (diff)

comment:5 Changed 3 years ago by Tim Graham

Easy pickings: unset
Type: UncategorizedNew feature
Version: 1.7master
Note: See TracTickets for help on using tickets.
Back to Top