Opened 8 years ago

Closed 5 months ago

Last modified 5 months ago

#9586 closed Bug (fixed)

Shall upload_to return an urlencoded string or not?

Reported by: eibaan Owned by: nobody
Component: File uploads/storage Version: 1.0
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Assume a custom upload_to function. If I return any unicode string (for example "Großköln / €"), that string is tried to use as a filename and as the URL. It might be an invalid file name and/or an invalid URL. If I use django.util.http.urlquote before returning the string, it becomes a valid URL but at the same time, the file is saved as "Gro%C3%9Fk%C3%B6ln%20/%20%E2%82%AC" but is tried to load as "Großköln / €" (because the browser already decodes it) from the file system. This does not work. I think, the URL (at least the one used in the admin UI) needs to be urlencoded once more. Or the default storage should automatically make sure that it can store files under any name and is not restricted to the restrictions of the file system.

I'm not sure what's the best solution but right now, it doesn't work for anything but an unspecified safe subset of characters. And because there's one function that is the base for both the file system name and the URL path segment, I cannot easiliy fix it.

Attachments (1)

admin.patch (943 bytes) - added by eibaan 8 years ago.

Download all attachments as: .zip

Change History (13)

Changed 8 years ago by eibaan

Attachment: admin.patch added

comment:1 Changed 8 years ago by eibaan

The attached patch makes sure that the admin UI always encodes the URL but still the file cannot be loaded for unknown reasons (and I think django.views.static.serve decodes it once to often and I haven't tried a "real" webserver yet). However, the file.url attribute should probably always encoded so I patched in the wrong place.

comment:2 Changed 8 years ago by eibaan

Analysis:

I checked django.core.files.storage and I think, my problem roots in that FileSystemStorage.url() does not create URLs. This is the point where the contract (or at least my expectation) is broken. urlparse.urljoin() is expecting URL segments but a filename is passed, that may contain spaces or other special characters that would need to be escaped.

Futhermore, on non-windows platforms where \ is a valid (even if uncommon) character, it is replaced with a slash so that a cleverly named file can provide access to subfolders you might not want to be accessible.

I think, a better implementation would be this:

url = "/".join(map(urlquote, name.split(os.path.sep)))
return urlparse.urljoin(self.base_url, url)

I find it irritating that Storage.save() replaces \ with /, perhaps as a poor man's attempt to make it an URL? Why not honor the operation system's conventions here? If at all, this belongs IMHO into the FileSystemStorage and not in the generic class.

The method Storage.get_valid_name() kills both \ and / even if other methods assume/accept them. It also restricts the set of valid characters to a-z, ignoring all other letters and removes spaces and other punctuations. Removing all but 26 letters might be ok for western countries (but even I need full unicode support) but I imagine it is inacceptable for for example Russian or Greek speaking countries.

This method hides the problems with that other methods because it restricts everything to a ASCII subset. If you create your own upload_to method, you circumvent that method call and feed "real" file names into that other methods which causes the described problems.

comment:3 Changed 8 years ago by Jacob

Triage Stage: UnreviewedDesign decision needed

comment:4 Changed 7 years ago by pshc

Just chiming in that I'm using eibaan's snippet in a class derived from FileStorageSystem to make sure filefield.url is urlencoded properly and it's working beautifully. Would be great to see that in the real FileStorageSystem.url!

comment:5 Changed 6 years ago by Luke Plant

Severity: Normal
Type: Bug

comment:6 Changed 5 years ago by anonymous

Easy pickings: unset
UI/UX: unset

comment:7 Changed 4 years ago by Aymeric Augustin

Yes, the url method should apply iri_to_uri. (I haven't verified that the issue is still valid.)

comment:8 Changed 4 years ago by Aymeric Augustin

Easy pickings: set
Has patch: set
Needs tests: set
Patch needs improvement: set
Triage Stage: Design decision neededAccepted

comment:11 Changed 3 years ago by Tim Graham

Easy pickings: unset

comment:12 Changed 3 years ago by anonymous

Hi I just stumbled over this problem using Django 1.6.4 on windows.

Is there any idea when this bug will be fixed,
or is there a "howto" somewhere?

comment:13 Changed 5 months ago by Sergei Maertens

Needs tests: unset
Patch needs improvement: unset

Looks like this was fixed by #25905 - I cannot reproduce the original issue. I haven't tested on Windows though.

models.py:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import models


def upload_to(instance, filename):
    return 'Großköln / €/{}'.format(filename)


class FileModel(models.Model):
    file = models.FileField(upload_to=upload_to)

admin.py

from django.contrib import admin

from .models import FileModel


admin.site.register(FileModel)

and the rest is the standard media/staticfiles settings/urls.

The url presented to me in the admin:

<a href="/media/Gro%C3%9Fk%C3%B6ln%20/%20%E2%82%AC/mail.txt">Großköln / €/mail.txt</a>

which is properly escaped and served by django.

comment:14 Changed 5 months ago by Sergei Maertens

Resolution: fixed
Status: newclosed

Fixed in #25905

Last edited 5 months ago by Sergei Maertens (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top