Code

Opened 7 years ago

Closed 6 years ago

Last modified 3 years ago

#5361 closed (fixed)

Support pluggable backends for FileField

Reported by: Marty Alchin <gulopine@…> Owned by: Gulopine
Component: Database layer (models, ORM) Version: master
Severity: Keywords: fs-rf-fixed
Cc: ross@…, faheem@…, larlet@…, cmawebsite@…, roppert@…, jvisinand@…, dcramer@…, danielnaab@…, griff.rees@…, django@…, jmunter@…, yatiohi@…, sverre.johansen@…, prufrocks@…, sciyoshi@…, jarek.zgoda@…, xphuture@…, sebastian.serrano@…, jesse@…, robvdl@…, david@…, richard@…, varikin@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

In order to provide the ability to store files in alternate, even remote, filesystems, FileField should support a backend protocol and allow individual FileField instances to specify a backend to store files.

Attachments (21)

file-backends.diff (23.7 KB) - added by Marty Alchin <gulopine@…> 7 years ago.
FileField refactor to support backends, as well as a FileSystemBackend to implement existing functionality
filestorage.diff (28.1 KB) - added by Marty Alchin <gulopine@…> 7 years ago.
Moved backend code into django.core.filestorage, added documentation and a few minor tweaks
filestorage.2.diff (37.3 KB) - added by Marty Alchin <gulopine@…> 7 years ago.
Fixed a tyop and added missing documentation in the last patch
filestorage.3.diff (19.6 KB) - added by x0nix 7 years ago.
Fixed few details, so it works for me with current trunk (hope didn't break it for you :-) ). Also default backend is loaded from settings.
filestorage.4.diff (33.2 KB) - added by Gulopine 7 years ago.
Working patch with documentation, including get_absolute_path()
filestorage.5.diff (33.2 KB) - added by Gulopine 7 years ago.
Fixed a reference that didn't get updated to get_absolute_path()
filestorage.6.diff (33.2 KB) - added by javinievas 7 years ago.
Fixed a bug when saving the uploaded file
filestorage.7.diff (52.0 KB) - added by Gulopine 7 years ago.
Much more complete patch, including docs and tests
filestorage.8.diff (63.3 KB) - added by Gulopine 7 years ago.
Fixed a few things, added tests and hopefully the patch will work this time
filestorage.9.diff (58.4 KB) - added by Gulopine 6 years ago.
Finally implemented File.eq properly
filestorage.10.diff (64.1 KB) - added by Gulopine 6 years ago.
Fixed merge problems, updated to r7091, passed all tests, and removed a few uses of the now-deprecated get_FOO_*() methods
filestorage.11.diff (62.5 KB) - added by Gulopine 6 years ago.
Fixed a few issues raised by David Larlet, added documentation for upload_to callables, updated to patch cleanly against r7133
filestorage.12.diff (61.5 KB) - added by Gulopine 6 years ago.
Fixed caching problem, added FileField subclassing docs, updated to r7156
filestorage.13.diff (61.9 KB) - added by Gulopine 6 years ago.
Fixed a problem when deleting models, updated to r7261
filestorage.14.diff (64.0 KB) - added by Gulopine 6 years ago.
Some reorganization, several method renames and doc rewordings, and an update to r7321
filestorage.15.diff (63.9 KB) - added by Gulopine 6 years ago.
Takes david's corrections into account and updates to r7397
filestorage.16.diff (69.9 KB) - added by Gulopine 6 years ago.
Some security enhancements, more tests, and an update to r7520
5361-r8012.diff (69.5 KB) - added by Gulopine 6 years ago.
Lots of changes in this one, check the details below
5361-r8156.diff (100.6 KB) - added by Gulopine 6 years ago.
(hopefully final) patch for review, with full docs and expanded tests
5361-r8189.diff (101.6 KB) - added by Gulopine 6 years ago.
Fixed a few test failures, removed the custom open(), updated docs and tests
5361-r8222.diff (100.6 KB) - added by Gulopine 6 years ago.
Updated to r8222, cleaned up File.open() and removed RemoteFile and StorageFile

Download all attachments as: .zip

Change History (109)

Changed 7 years ago by Marty Alchin <gulopine@…>

FileField refactor to support backends, as well as a FileSystemBackend to implement existing functionality

comment:1 Changed 7 years ago by Simon G. <dev@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

comment:2 Changed 7 years ago by Marty Alchin <gulopine@…>

  • Needs documentation set
  • Needs tests set

I don't have any documentation written for this patch yet, but the patch is here to look over. It's also untested in newforms-admin, but it doesn't change any of the form-handling code, so it should work properly. I'll try to have some documentation ready sometime this week. In the meantime, I'll make some additional notes about it on django-developers, to help explain things a bit.

Changed 7 years ago by Marty Alchin <gulopine@…>

Moved backend code into django.core.filestorage, added documentation and a few minor tweaks

Changed 7 years ago by Marty Alchin <gulopine@…>

Fixed a tyop and added missing documentation in the last patch

comment:3 Changed 7 years ago by Gulopine

  • Owner changed from nobody to Gulopine

comment:4 Changed 7 years ago by x0nix

There is wrong import in last patch.

In django/db/models/fields/init.py on line 750 should be
"from django.core.filestorage import FileSystemBackend"

comment:5 Changed 7 years ago by x0nix

Eh close ... it should be:

from django.core.filestorage.filesystem import FileSystemBackend

comment:6 Changed 7 years ago by x0nix

One more idea ... the default backend should be in settings.
Then you could (for example) use different backend for development and other on production server.

Changed 7 years ago by x0nix

Fixed few details, so it works for me with current trunk (hope didn't break it for you :-) ). Also default backend is loaded from settings.

comment:7 Changed 7 years ago by x0nix

Seems my last patch is wrong, ignore (or delete) it please ...

comment:8 Changed 7 years ago by Gulopine

While I'm not opposed to finding a way to specify the filestorage backend in settings.py, specifying just a path alone won't generally suffice. You see, alternate backends don't always share the same arguments, so it's difficult to expect a single function call to work for all available backends.

The only way I can really think of doing it is to instantiate the backend object directly in the settings. For FileStorageBackend, this is okay as long as the media_root and media_url arguments are supplied. If they're not, it'll try loading them from django.conf.settings, which will obviously break fairly severely.

Of course, you could manually supply the MEDIA_ROOT and MEDIA_URL values as arguments, but then that disallows using it in global_settings.py, since MEDIA_ROOT and MEDIA_URL aren't set properly when the default backend would get instantiated.

So, all in all, I think it's worth considering, but I'm not sure the framework in its current condition can support it. I'll see what it'll take to do so, but it may be a documentation issue. For instance, one way to do it would be to import the backend from some module that's different in your development and production environments. Then the application code could just use the backend protocol, while the environment-specific code determine which backend is most appropriate.

comment:9 Changed 7 years ago by PhiR

  • Owner changed from Gulopine to PhiR
  • Status changed from new to assigned

Will likely solve #641, #1642 and #5306.

comment:10 Changed 7 years ago by PhiR

  • Owner changed from PhiR to Gulopine
  • Status changed from assigned to new

comment:11 Changed 7 years ago by ubernostrum

Will solve #3588 when fixed.

comment:12 Changed 7 years ago by Gulopine

Actually, I don't see how this patch would have any impact whatsoever on #3588. This doesn't change what field is used to store the path+filename, it only affects how and where the file itself is stored.

Changed 7 years ago by Gulopine

Working patch with documentation, including get_absolute_path()

Changed 7 years ago by Gulopine

Fixed a reference that didn't get updated to get_absolute_path()

comment:13 Changed 7 years ago by jacob

  • Owner changed from Gulopine to jacob
  • Status changed from new to assigned

comment:14 Changed 7 years ago by Thomas Güttler <hv@…>

  • Cc hv@… added

Changed 7 years ago by javinievas

Fixed a bug when saving the uploaded file

comment:15 Changed 7 years ago by anonymous

  • Cc ross@… added

comment:16 Changed 7 years ago by faheem

  • Cc faheem@… added

comment:17 Changed 7 years ago by Gulopine

  • Keywords fs-rf added

comment:18 Changed 7 years ago by david

  • Cc larlet@… added

Changed 7 years ago by Gulopine

Much more complete patch, including docs and tests

comment:19 Changed 7 years ago by Gulopine

  • Keywords fs-rf removed
  • Needs documentation unset
  • Needs tests unset

This latest patch includes tests and updated documentation, and some changes that attempt to address many various issues that have come up regarding FileField. I've added "fs-rf-fixed" and "fs-rf-docs" keywords to tickets that are addressed, at least in part, by this new patch. The tickets with "fs-rf-fixed" are legitimately fixed by this patch, and be closed immediately when committed. The "fs-rf-docs" tickets aren't fixed directly by the patch, but their fixes are enabled by the new features, and the documentation covers what hooks will be needed to solve the problems described.

Also, the patch doesn't come up in the web viewer, but it should be intact. TortoiseSVN tends to screw up my patches sometimes for some reason, though, so if it doesn't work, let me know and I'll try again.

comment:20 Changed 7 years ago by anonymous

hej Gulopine
i cant seem to be able to apply filestorage.7.diff using

patch -p0 filestorage.7.diff

to me it looks like the .diff is broken
could you try again?

Changed 7 years ago by Gulopine

Fixed a few things, added tests and hopefully the patch will work this time

comment:21 Changed 6 years ago by CollinAnderson

  • Cc cmawebsite@… added

comment:22 Changed 6 years ago by roppert

  • Cc roppert@… added

Changed 6 years ago by Gulopine

Finally implemented File.eq properly

comment:23 Changed 6 years ago by Gulopine

It dawned on me that I had some temporary kludge in File.__eq__, that I had put in just to get the serializer tests to work. Now it's implemented properly, and the serializer tests still work!

comment:24 Changed 6 years ago by jvisinand <jvisinand@…>

  • Cc jvisinand@… added

comment:25 Changed 6 years ago by Gulopine

  • Owner changed from jacob to Gulopine
  • Status changed from assigned to new

The most recent diff had some problems I didn't catch before uploading, so it's not to be used at the moment. I'll get a new patch up this week.

comment:26 Changed 6 years ago by dcramer

  • Cc dcramer@… added

Changed 6 years ago by Gulopine

Fixed merge problems, updated to r7091, passed all tests, and removed a few uses of the now-deprecated get_FOO_*() methods

comment:27 Changed 6 years ago by david

My awesome contribution: s/desribed/described/ line 1167 of filestorage.10.diff

comment:28 Changed 6 years ago by david

In FileSystemBackend.delete_file() I suggest that we add a test:

 if filename and os.path.exists(file_name):

because otherwise it raises an IOError if filename is empty.

comment:29 Changed 6 years ago by david

An import of settings is missing in db.models.fields.files, used line 156, patch:

from django.conf import settings

comment:30 Changed 6 years ago by Dan Naab <danielnaab@…>

  • Cc danielnaab@… added

comment:31 Changed 6 years ago by George Kappel <george.kappel@…>

It appears that this patch is supposed to provide dynamic file naming capability but I have reviewed the docs and have not seen how that capability is exercised

Changed 6 years ago by Gulopine

Fixed a few issues raised by David Larlet, added documentation for upload_to callables, updated to patch cleanly against r7133

comment:32 Changed 6 years ago by Gulopine

George, this new patch includes documentation on providing a function as the upload_to argument, which provides the funcitonality you require.

Also, I mistyped in the attacment comment. That last attachment is valid up to r7141.

comment:33 Changed 6 years ago by Gulopine

David Larlet informed me that the new upload_to handling prevents model instances from being pickled (and thus cached) in many cases. This is now a known issue, I've got a fix for it, and I'll update the patch this weekend to resolve the issue.

Changed 6 years ago by Gulopine

Fixed caching problem, added FileField subclassing docs, updated to r7156

comment:34 Changed 6 years ago by Griffith Rees <griff.rees@…>

  • Cc griff.rees@… added

comment:35 Changed 6 years ago by jedie

  • Cc django@… added

Changed 6 years ago by Gulopine

Fixed a problem when deleting models, updated to r7261

comment:36 Changed 6 years ago by PhiR

Can't see the latest patches, trac issue again ?
By the way will this solve #6456 ? From filestorage.6.diff I'd say no but maybe this has been added since.

comment:37 Changed 6 years ago by Gulopine

No, I don't expect this will have any impact on #6456, which is why I didn't mark it with any fs-rf keywords. This patch doesn't really deal with the interactions between the database and file behaviors, it just manages how files are stored and accessed. #6456 deals more with the "when" which is also important, but beyond what this ticket is for.

comment:38 Changed 6 years ago by mtredinnick

  • Triage Stage changed from Design decision needed to Accepted

Changed 6 years ago by Gulopine

Some reorganization, several method renames and doc rewordings, and an update to r7321

comment:39 Changed 6 years ago by Gulopine

This most recent patch includes several changes suggested by Jacob, and should be a final patch except for some possible documentation changes. Anyone who's already been using this patch, be aware that most of the method names have changed, and your code will break. It won't break anybody who's running trunk, but those of you using the patch have been warned.

comment:40 Changed 6 years ago by david

Many changes, many questions:

  • why don't you use django.core.urlresolvers.get_callable in get_storage?
  • how can I pass arguments to my custom Storage? (let's say I want to pass Amazon credentials for S3, see #6390)
  • why get_absolute_url() had been renamed to url()? I just want to understand because the previous version looks more coherent with models for instance.

I confirm that my code will break :) With your answers, I will try to integrate this new patch in my project before it's merged to the trunk.

comment:41 Changed 6 years ago by Gulopine

Okay, I figured you'd be the first to react, and I'm glad you're following closely enough to ask these questions.

  • I didn't know about django.core.urlresolvers.get_callable() when I wrote it. Jacob thought there was some kind of a function to resolve import paths to objects, but we just didn't run across it. I'll look into it.
  • Just write your __init__() in the same way as the one in FileSystemStorage. That is, it can take arguments, just make sure they're optional, and their defaults should be pulled from settings.
  • Not to pass the buck, but those method renamings were requested by both Jacob and Adrian. It's a little inconsistent, but it's not something I'm willing to fight over.

I do apologize for the breakage, but that can happen even on our fairly-stable trunk, and is especially likely when running uncommitted patches.

comment:42 Changed 6 years ago by david

No problem for the breakage, I know that this game has rules and I'm happy to contribute. Thanks for your answers, I will pass all default arguments as settings now.

It seems that there is still a s/get_absolute_url/url/ to do in db.models.fields.files.File.url() (and maybe a test to add to verify this point?).

comment:43 Changed 6 years ago by david

Okay, I confirm that it works perfectly with both filesystem and S3 storages if you use the file attached in #6390.

comment:44 Changed 6 years ago by david

There is still a s/get_absolute_path/path/ to do in db.models.fields.files.File.path() line 38, let me know if you want a diff of your diff.

Changed 6 years ago by Gulopine

Takes david's corrections into account and updates to r7397

comment:45 Changed 6 years ago by Gulopine

Okay, I like david's suggestion of using the existing get_callable, but it feels wrong to reach into urlresolver code for this, and refactoring that would be outside the scope of this ticket. I've left it as is for now.

comment:46 Changed 6 years ago by david

What about my proposition to do not delete default image? For the moment I'd hardcoded it in storage.delete but that's not really reusable/safe. It'll be probably more appropriated to do it in FileField.delete_file because this way you have access to self.default

comment:47 Changed 6 years ago by Gulopine

Indeed, I missed that one, thanks for the reminder. I'll definitely put up a new patch with that included.

comment:48 Changed 6 years ago by SchruteBucks

Gulopine,
It looks like django.core.filestorage.init is missing an import of the exceptions module.

You need a:
from django.core import exceptions

I only hit this bug because my settings file didn't have a DEFAULT_FILE_STORAGE setting

Thanks

comment:49 Changed 6 years ago by Gulopine

Thanks, that's a good catch. I've added it to my local copy, and I'll get a new patch up soon. I'm just holding off to see if there's anything else that needs to be changed/added.

comment:50 Changed 6 years ago by dcramer

  • Cc jmunter@… added

comment:51 Changed 6 years ago by ctrochalakis

  • Cc yatiohi@… added

comment:52 Changed 6 years ago by anonymous

  • Cc sverre.johansen@… added

comment:53 Changed 6 years ago by anonymous

  • Cc prufrocks@… added
  • Has patch unset

comment:54 Changed 6 years ago by anonymous

  • Has patch set

comment:55 Changed 6 years ago by anonymous

  • Cc code.updates@… added

comment:56 Changed 6 years ago by Samuel Cormier-Iijima <sciyoshi@…>

  • Cc sciyoshi@… added

Changed 6 years ago by Gulopine

Some security enhancements, more tests, and an update to r7520

comment:57 Changed 6 years ago by david

Sometimes self._dimensions_cache doesn't exist in db.models.fields.files.ImageFile.delete(), I suggest to check with if hasattr(self, '_dimensions_cache'): before.

comment:58 Changed 6 years ago by George Kappel <george.kappel@…>

I am possibly missing something with "upload_to" but it seems FileField stores the full path to the file in the db

Wouldn't it be a lot more flexible to only store the part of the path that is in addition to the MEDIA_ROOT? This way if files moved or apps were moved to different computers the FileField would still work. Now to move things the db needs to be manually updated. A related nice to have would be cross platform separators in the path (Develop on Windows and Test/Deploy on Linux)

Also MEDIA_ROOT seems potentially awkward depending on the application maybe there should be a FILES_ROOT? Which may or may not be the same a MEDIA_ROOT?

comment:59 Changed 6 years ago by zgoda

  • Cc jarek.zgoda@… added

comment:60 Changed 6 years ago by cgrady

  • Cc cgrady-django5361@… added

comment:61 Changed 6 years ago by anonymous

  • Cc xphuture@… added

comment:62 Changed 6 years ago by sserrano <sebastian.serrano@…>

  • Cc sebastian.serrano@… added

comment:63 Changed 6 years ago by cgrady

  • Cc cgrady-django5361@… removed

comment:64 Changed 6 years ago by shadfc

Several hunks fail when applying the latest patch (filestorage.16.diff) against svn-7870. Also, trac didnt seem to like any of the latest 3 or 4 patches.

http://dpaste.com/61402/

comment:65 Changed 6 years ago by Gulopine

Sorry for not having posted here yet, but yes, I'm well aware that the latest patch most definitely fails miserably when trying to use it with Django after [7814] went in. I'm currently working on updating it to integrate those changes, and it will also likely introduce some slight API changes (fair warning to those of you working on S3 or other backends). I've been working very closely with the Mike Axiak on this, and it's very nearly there. Expect a new patch this week. I'll detail any external API changes for those of you who are already working with it.

As for Trac, it hasn't liked my patches (any many others') for quite some time. I don't know if it's the sheer size, or something about the fact that the patches add new files, or what exactly, but it's been happening for quite a while now. The patches are indeed there, and if you grab the "Original Format" it'll come through just fine, as you're no doubt aware, if you tried applying it.

comment:66 Changed 6 years ago by jesse@…

I'd like to propose adding a few methods that we use a lot at KI. We have a similar setup (generic filesystem that can be local or remote) and I've found that they are very helpful. I can add a patch once Gulopine updates his patches for the newest svn if people are interested. Here are some of the functions we use for this kind of thing (in production) that we've found helpful:

def modtime(self, key):
		return os.path.getmtime(os.path.join(self.root, key))

def delete_by_prefix(self, prefix):
	for root, dirs, files in os.walk(self.root, topdown=False):
	    for name in files:
			if name.startswith(prefix):
				os.unlink(os.path.join(root, name))

def list_dir(self, prefix=None):
	files = os.listdir(self.root)
	if prefix is not None:
		return [x for x in files if x.startswith(prefix)]
	return files

def copy(self, old_key, new_key):
		import shutil
		shutil.copy(
			os.path.join(self.root, old_key),
			os.path.join(self.root, new_key)
		)

I'll comment the s3 implementations in that ticket. delete_by_prefix is especially useful since we generate out thumbnails of images (our naming is model_id_field.ext so thumbnails might be model_id_field_399x233.ext). We generally use the local FS for testing and deployments use the S3 backend.

comment:67 Changed 6 years ago by Gulopine

It's always good to get feedback from people who are doing some more interesting things with files, it helps narrow down what we're trying to do.

I had considered something like modtime(), but I was waiting for anyone to actually ask for it before I bothered with anything. With the current patch, it's possible to override what File object gets used to represent your files, so it's easy to add a modtime() method to it. You can also override storage systems, so you can add it there too, if you need the exact behavior you described above. I'd like to leave it as that for now, and if enough people express displeasure with that, I'll consider it adding to the rest, but it just doesn't seem like it'd serve enough people. Then again, it's easy to add, so I'll consider it.

I'll definitely chalk up delete_by_prefix() as something you can achieve by subclassing your favorite storage system and adding a method. I can see you guys are making good use of it, and you'll be able to quite easily, but I don't think there's much of a case to be made for having it in core.

I can see value in list_dir() for some types of applications, but on the whole, I think it's outside what file storage is trying to do. If, however, somebody makes a case for making FilePathField aware of storage systems (nobody's said anything about it yet), then some sort of directory listing will be absolutely necessary.

I'm willing to entertain the idea of copy(), mainly because different backends will have different (more efficient) ways of handling that than just opening one file and writing another. S3, for instance, would require transferring the file over the wire, just to send it back with a different name, except that it has a separate copy command. I'm not sold on its usefulness for everybody yet, but there's definitely potential there. I'll think it over.

comment:68 Changed 6 years ago by jesse@…

  • Cc jesse@… added

Thanks for your comments, I think this is going to be a great ticket.

I agree with you on all points except when I'm "Using a storage system on its own." In this case I do not just want my inherited remote filesystem to have general filesystem functionality, I want all my filesystems to have those functions--mostly because I want to have the choice of filesystem to be determined by a settings file (for different servers, or testing vs production deployments). Now in the case where this ticket is a "file storage" ticket, we might not need things like "list_dir", but as a "file system" I would think we absolutely do--otherwise the abstraction layer falls short in cases outside of the file fields.

I could subclass the filesystem and create a more general filesystem before subclassing further into remote and local varieties, but that seems to be a little counter-productive. Just my 2 cents--this is great work and I'll be happy for it.

comment:69 Changed 6 years ago by robvdl

  • Cc robvdl@… added

comment:70 Changed 6 years ago by daviddahl

  • Cc david@… added

Changed 6 years ago by Gulopine

Lots of changes in this one, check the details below

comment:71 Changed 6 years ago by Gulopine

  • Needs documentation set
  • Patch needs improvement set

Regarding 5361-r8012.diff: this patch covers a lot of ground, so here are some details. The main goal of this version of the patch was to consolidate things a bit. After [7814] went in, it was clear that there'd be a lot of almost-duplication, and I wanted to centralize as much as I could, so all file operations can share as much code as possible. Most of the major changes are the product of that consolidation, and I hope it makes things easier all around.

  • Everything from django.core.filestorage has been moved into django.core.files, so there aren't two file-related packages lying around.
  • With a single package for more file stuff than just storage, I put all the storage-related items into django.core.files.storage. This means there's no great place to add extra backends in the future. The idea now is that Django itself will just have filesystem storage, and others will be provided by extra apps. This probably means #6390 will be closed as wontfix, but there's no official word on that yet.
  • All representations of files are now based on django.core.files.base.File, which provides most features. Subclasses can override __init__ to change how they're constructed, override other methods to customize behavior, and add new methods entirely, but they should all use that same base class. It's functional enough to be used on its own in some cases, but most will probably need a subclass. The existing uploaded file stuff has also been modified to use this base class.
  • Specific file types are now implemented as mixins, with ImageFile located at django.core.files.images. That module also contains get_image_dimensions that used to be at django.utils.images, with an import and a DeprecationWarning left at the old location. The mixin thing allows for more intelligent combinations of classes. Essentially, each backend can provide its own File class, while each FileField subclass can provide its own mixin, or they can be used in other cases too, even without models. It's all about flexibility. There's probably some more work to be done on that, though, so I'd especially like some tires to be kicked.
  • The backend protocol changed slightly, such that open() shouldn't be overridden by new backends. open() now takes a mixin argument, relating to the above point, and it calls _open() behind the scenes to get the backend-related file, which is then combined with the provided mixin to produce the resultant File object. Backends should only implement _open(), and that method works exactly like open() did up to this patch. I'm open to API improvements on this, but I think it's the best way to approach it, without requiring backends to deal with the mixin thing directly.
  • The object returned when you reference instance.field is now a legit File subclass on its own, without having to call .open() explicitly. There are almost certainly some hidden problems with this approach, and I haven't worked out a good way to test it thoroughly. I'm particularly concerned with caching file references and properly passing things through to remote backends, like S3 or !MogileFS. I'm almost certain there are some issues when referencing the file multiple times using instance.field.read(), for instance, because the object gets regenerated each time it's accessed. This can be worked around, but to keep caching working, it requires some trickiness that I'd like to avoid if I can. If there are significant issues though, the trickiness is definitely the lesser of two evils.
  • RemoteFile has been relegated to django.core.files.remote, and it's not really used internally, but it's there as a base for remote storage systems to use. It probably needs some updating to work properly with everything else, but I need somebody who's working on a remote backend to kick the tires on it a bit to know exactly what love it needs. You know who you are.

In general, I'm very happy with how the patch looks and works now, but I'm still a bit concerned with how well it addresses the needs of custom backends. I know a few people following this patch are working on some, and I'd greatly appreciate it if you can spare some time to update to the newest patch, and modify your backends accordingly. Please report back what you've found, what works, what doesn't, and what ideas you have to improve the interface.

Also, this patch doesn't include any documentation, because it hasn't yet been updated to the newest changes described above. I'll be working on that this week, but I wanted to get the code up so people can start working with it and get feedback. It's on the way, though. I may have missed a few changes, and if I have, I apologize, and I'm glad to explain anything that doesn't make sense.

comment:72 Changed 6 years ago by david

The S3 storage had been updated: http://code.larlet.fr/django-storages/rev/5851d7eadb31 (not far from a complete rewrite :-)) and tests copied from modeltests.files.models pass with the new version. There is no support for chuncked files yet.

There is just an additional self.size = len(data) in core.files.remote.RemoteFile.__init__() which could be added by default? Otherwise I need to write an S3RemoteFile class just for this line.

comment:73 Changed 6 years ago by Gulopine

Hrm, it looks like I also need to be more clear about what .save() has to accept, since it won't always be a raw string like your backend is expecting at the moment. I think I'll take the same route that I took with .open(), where .save() will handle the distinction between raw content and a File instance, and just pass a File instance through to ._save(), which the backends will have to implement. That should make it easier on backends, only having to support a single object type, rather than duplicating the dance that FileSystemBackend.save() does in the current patch.

Also, I should explicitly note that .path() is only valid for FileSystemStorage. It's a backwards-compatible replacement for ._get_FIELD_filename(), and is intended to be passed straight into Python's own open() function. Since that can't access things like S3, remote backends shouldn't define it. The current patch will simply issue an AttributeError if the backend doesn't define it, but in the next patch, I'll add a base .path() method that just raises NotImplementedError and override that in FileSystemStorage. That should make it more clear that other backends shouldn't do anything with it unless its result can actually be loaded using open().

comment:74 Changed 6 years ago by david

Re save(): I agree that a _save() function is more interesting given the complexity of the actual save(). For the moment, I've only copied tests which looks interesting to me but I'm open to a more complete storage specific test suite.

Re path(): to me url == path for a remote storage and that's what I've done (there was some troubles if path() is not defined for a storage with the previous patch but I can't remember why). I define path() function because I'd like to keep the order of FileSystemBackend but it is copied in url() a couple of lines below.

Changed 6 years ago by Gulopine

(hopefully final) patch for review, with full docs and expanded tests

comment:75 Changed 6 years ago by Gulopine

  • Keywords fs-rf-fixed added
  • Needs documentation unset
  • Patch needs improvement unset

I've written up full details of what the patch now contains, and the changes it makes on the wiki: FileStorageRefactor

Changed 6 years ago by Gulopine

Fixed a few test failures, removed the custom open(), updated docs and tests

comment:76 Changed 6 years ago by Richard <richard@…>

  • Cc richard@… added

Found this after adding bug #8095

comment:77 Changed 6 years ago by anonymous

  • Cc varikin@… added

Changed 6 years ago by Gulopine

Updated to r8222, cleaned up File.open() and removed RemoteFile and StorageFile

comment:78 Changed 6 years ago by Gulopine

  • Triage Stage changed from Accepted to Ready for checkin

Minor update, getting things ready for the merge. This removes both RemoteFile and StorageFile, because it's not entirely clear what helpers custom backends would find useful. Now, each custom backend (S3, MogileFS, etc) is responsible for subclassing django.core.files.File directly and returning that in Storage._open(). Once we have a few backends in use, it should be more clear what will actually be helpful in the long run.

comment:79 Changed 6 years ago by jacob

Marty, it looks like you generated this most recent patch over the signal refactoring, because it seems to contain all that signal refactoring, too. So it doesn't apply, unfortunatly. Other than that I think we're ready for commit here, so get me an updated patch and we'll roll.

comment:80 Changed 6 years ago by Gulopine

With all due respect, I haven't even downloaded the signal refactor, much less applied it to my filestorage tree, so I don't see how I could've gotten it in the patch. I just downloaded 5361-r8222.diff and searched for "signals" and "dispatch", and got just the public API references. Did you maybe apply both to the same checkout accidentally?

comment:81 Changed 6 years ago by jacob

Argh, yeah, it's a PEBKAC moment. Carry on.

comment:82 Changed 6 years ago by jacob

OK, I've merged the patch up to [8232] and started making some final tweaks. Follow along with my git repo if you like: http://code.djangoproject.com/git/?p=django;a=shortlog;h=refs/heads/5361-fsrf

comment:83 follow-up: Changed 6 years ago by Richard <richard@…>

Please include this snippet: http://www.djangosnippets.org/snippets/949/ either as ImageField replacement or CustomImageField in 1.0

comment:84 in reply to: ↑ 83 Changed 6 years ago by Gulopine

Replying to Richard <richard@cornbread.cc>:

Please include this snippet: http://www.djangosnippets.org/snippets/949/ either as ImageField replacement or CustomImageField in 1.0

There's no need to include that snippet, since this patch provides greater upload_to flexibility all on its own.

comment:85 Changed 6 years ago by anonymous

  • Cc code.updates@… removed

comment:86 Changed 6 years ago by jacob

  • Resolution set to fixed
  • Status changed from new to closed

(In [8244]) File storage refactoring, adding far more flexibility to Django's file handling. The new files.txt document has details of the new features.

This is a backwards-incompatible change; consult BackwardsIncompatibleChanges for details.

Fixes #3567, #3621, #4345, #5361, #5655, #7415.

Many thanks to Marty Alchin who did the vast majority of this work.

comment:87 Changed 6 years ago by guettli

  • Cc hv@… removed

comment:88 Changed 3 years ago by jacob

  • milestone 1.0 beta deleted

Milestone 1.0 beta deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.