Code

Opened 5 years ago

Last modified 19 months ago

#11027 assigned Bug

Storage backends should know about the max_length attribute of the FileFields

Reported by: apollo13 Owned by: melinath
Component: File uploads/storage Version: master
Severity: Normal Keywords:
Cc: jeffrey@…, hvdklauw@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently the Storage Backends may return a filename which is longer than the FileField actually supports. This of course breaks it. If the Storage knew about the maxmimum length it is allowed to returned, it could make sure, that the filenames aren't longer.

Attachments (0)

Change History (19)

comment:1 Changed 5 years ago by thejaswi_puthraya

  • Component changed from Uncategorized to File uploads/storage
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 5 years ago by Alex

  • Triage Stage changed from Unreviewed to Accepted

comment:3 Changed 4 years ago by ubernostrum

  • milestone 1.2 deleted

1.2 is feature-frozen, moving this feature request off the milestone.

comment:4 Changed 3 years ago by apollo13

  • milestone set to 1.3

comment:5 Changed 3 years ago by steph

When fixing this ticket, we should keep in mind, that the storage location (maybe a larger directory structure) needs to be considered too. If my file field has a max_length of 45, and my upload_to/storage location is "uploads/users/images/gallery/photos/", the filename has a max length of 8 (the directory name has 37 chars).

comment:6 Changed 3 years ago by steph

  • Triage Stage changed from Accepted to Design decision needed

After a short discussion with jezdez, this should be considered as DDN.

comment:7 Changed 3 years ago by hvdklauw

  • Cc hvdklauw added

comment:8 Changed 3 years ago by ramiro

#15247 has some more discussion about this topic worth reviewing. It was closed as a duplicate.

comment:9 Changed 3 years ago by jgelens

  • Cc jeffrey@… added

comment:10 Changed 3 years ago by julien

  • Severity set to Normal
  • Type set to New feature

comment:11 Changed 3 years ago by carljm

  • Easy pickings unset
  • Triage Stage changed from Design decision needed to Accepted
  • Type changed from New feature to Bug

Marking this as a bug, since #15247 makes it clear that's what it is. Discussed briefly with jezdez on IRC and concluded that the fix proposed here probably is the right one, adding a max_length argument to the get_valid_name method of storage classes. Unfortunately, this is backwards incompatible, and it won't just break code that was relying on buggy behavior, it will break all custom storage backends. So it requires a deprecation path, and during the deprecation period that will require introspecting the get_valid_name method to see whether it accepts the valid_name argument (or trying with, catching the error, and then trying without).

Either one of those is kind of ugly, but the alternative is to introduce a new method, like "get_valid_name_with_max_length". That makes the deprecation-period code a bit nicer, but then we're stuck with the longer method name for good. Probably better to take the short-term hit.

comment:12 Changed 3 years ago by hvdklauw

  • Cc hvdklauw removed
  • UI/UX unset

comment:13 Changed 3 years ago by hvdklauw

  • Cc hvdklauw@… added

comment:14 Changed 3 years ago by ishalyapin

Hey, it is not very difficult problem. I wrote a solution and use it in my projects.
It is not perfect solution, but it is much better then unhandled exception.
I hope you take it in django.

def filefield_maxlength_validator(value):
    """"
        Check if absolute file path can fit in database table 
        and filename length can fit in filesystem.
    """
     
    FILESYSTEM_FILENAME_MAXLENGTH = 255 # ext4, for example
    RESERVE = 50 # We need reserve for generating thumbnails and for suffix if filename already exists, example - filename_85x85_1.jpg

    # For the initial file saving value.path contains not the same path, which will be used to save file
    filename = value.name
    filepath = os.path.join(
        settings.MEDIA_ROOT, 
        value.field.generate_filename(value.instance, filename)
    )
    bytes_filename = len(filename.encode('utf-8')) # filename length in bytes
    bytes_filepath = len(filepath.encode('utf-8')) # path length in bytes

    # File path length should fit in table cell and filename lenth should fit in in filesystem
    if bytes_filepath > value.field.max_length or bytes_filename > FILESYSTEM_FILENAME_MAXLENGTH - RESERVE:
        if os.path.isfile(value.path):
            os.remove(value.path)
        raise exceptions.ValidationError(_(u"File name too long."))
    return value

FileField.default_validators = FileField.default_validators[:] + [filefield_maxlength_validator]

comment:15 Changed 3 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

comment:16 Changed 19 months ago by melinath

  • Owner changed from nobody to melinath

comment:17 Changed 19 months ago by melinath

  • Status changed from new to assigned

Overriding get_valid_name is probably not a workable solution. Currently, it claims only to return a filename "that's suitable for use in the target storage system." This just means a short cleaning of whitespace characters, hyphens, etc. So making it actually return a valid filename (in terms of length) would be a change in mission. Additionally, we actually need to validate the length of the entire storage path (which get_valid_name doesn't have access to). And finally, even if we got a valid name, get_available_name can increase the length of that name during the saving process, leaving us exactly where we started.

get_available_name needs to take a max_length argument, or it needs to be able to guarantee that it won't increase the length of the filename. Since the latter isn't possible as far as I can see, I expect to do the former. If get_available_name *can't* make a name within the given max length, it will raise a ValueError. FileField validation would then need to call get_available_name *after* generating a filename and instead of (or after) a simple check of the length of the filename. Any ValueError raised can be re-raised as a ValidationError.

Since get_available_name is part of the public API, this would mean either doing introspection, wrapping every call to get_available_name in try/except, or adding a new method called (say) get_valid_storage_path, which would handle max_length and the available_name checks, then deprecate get_available_name. Personally, I would lean toward the latter option, since I find the whole name/filename/path/storage path naming mess confusing.

Last edited 19 months ago by melinath (previous) (diff)

comment:18 Changed 19 months ago by akaariai

I kind of like the idea that get_valid_name() just converts the name to be something valid for the storage backend. Then, get_available_name has the final control of how to truncate the name, and to make sure the name is within the name length limit. The contract is altered from "gimme an available file name as close to the requested name" to "gimme an available file name as close to the requested name, max length N".

It seems the final name is determined by .save() - if the file name clashes due to race conditions, then .save() can alter the file name. This brings another point to the discussion: if final file name is determined by .save(), then shouldn't .save() also know the file name length restriction?

As for implementation, I like the try-except approach most.

comment:19 Changed 19 months ago by melinath

Yes, .save() (and ._save()) would also need to know about the max_length restriction.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as assigned
The owner will be changed from melinath to anonymous. Next status will be 'assigned'
The ticket will be disowned. Next status will be 'new'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.