Code

Opened 5 years ago

Last modified 7 months ago

#9586 new Bug

Shall upload_to return an urlencoded string or not?

Reported by: eibaan Owned by:
Component: File uploads/storage Version: 1.0
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Assume a custom upload_to function. If I return any unicode string (for example "Großköln / €"), that string is tried to use as a filename and as the URL. It might be an invalid file name and/or an invalid URL. If I use django.util.http.urlquote before returning the string, it becomes a valid URL but at the same time, the file is saved as "Gro%C3%9Fk%C3%B6ln%20/%20%E2%82%AC" but is tried to load as "Großköln / €" (because the browser already decodes it) from the file system. This does not work. I think, the URL (at least the one used in the admin UI) needs to be urlencoded once more. Or the default storage should automatically make sure that it can store files under any name and is not restricted to the restrictions of the file system.

I'm not sure what's the best solution but right now, it doesn't work for anything but an unspecified safe subset of characters. And because there's one function that is the base for both the file system name and the URL path segment, I cannot easiliy fix it.

Attachments (1)

admin.patch (943 bytes) - added by eibaan 5 years ago.

Download all attachments as: .zip

Change History (12)

Changed 5 years ago by eibaan

comment:1 Changed 5 years ago by eibaan

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

The attached patch makes sure that the admin UI always encodes the URL but still the file cannot be loaded for unknown reasons (and I think django.views.static.serve decodes it once to often and I haven't tried a "real" webserver yet). However, the file.url attribute should probably always encoded so I patched in the wrong place.

comment:2 Changed 5 years ago by eibaan

Analysis:

I checked django.core.files.storage and I think, my problem roots in that FileSystemStorage.url() does not create URLs. This is the point where the contract (or at least my expectation) is broken. urlparse.urljoin() is expecting URL segments but a filename is passed, that may contain spaces or other special characters that would need to be escaped.

Futhermore, on non-windows platforms where \ is a valid (even if uncommon) character, it is replaced with a slash so that a cleverly named file can provide access to subfolders you might not want to be accessible.

I think, a better implementation would be this:

url = "/".join(map(urlquote, name.split(os.path.sep)))
return urlparse.urljoin(self.base_url, url)

I find it irritating that Storage.save() replaces \ with /, perhaps as a poor man's attempt to make it an URL? Why not honor the operation system's conventions here? If at all, this belongs IMHO into the FileSystemStorage and not in the generic class.

The method Storage.get_valid_name() kills both \ and / even if other methods assume/accept them. It also restricts the set of valid characters to a-z, ignoring all other letters and removes spaces and other punctuations. Removing all but 26 letters might be ok for western countries (but even I need full unicode support) but I imagine it is inacceptable for for example Russian or Greek speaking countries.

This method hides the problems with that other methods because it restricts everything to a ASCII subset. If you create your own upload_to method, you circumvent that method call and feed "real" file names into that other methods which causes the described problems.

comment:3 Changed 5 years ago by jacob

  • Triage Stage changed from Unreviewed to Design decision needed

comment:4 Changed 5 years ago by pshc

Just chiming in that I'm using eibaan's snippet in a class derived from FileStorageSystem to make sure filefield.url is urlencoded properly and it's working beautifully. Would be great to see that in the real FileStorageSystem.url!

comment:5 Changed 3 years ago by lukeplant

  • Severity set to Normal
  • Type set to Bug

comment:6 Changed 2 years ago by anonymous

  • Easy pickings unset
  • UI/UX unset

comment:7 Changed 13 months ago by aaugustin

Yes, the url method should apply iri_to_uri. (I haven't verified that the issue is still valid.)

comment:8 Changed 13 months ago by aaugustin

  • Easy pickings set
  • Has patch set
  • Needs tests set
  • Patch needs improvement set
  • Triage Stage changed from Design decision needed to Accepted

comment:9 Changed 10 months ago by susan

  • Owner changed from nobody to susan
  • Status changed from new to assigned

comment:10 Changed 10 months ago by susan

  • Owner susan deleted
  • Status changed from assigned to new

comment:11 Changed 7 months ago by timo

  • Easy pickings unset

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as new
The owner will be changed from (none) to anonymous. Next status will be 'assigned'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.