#10300 closed (fixed)
Custom File Storage Backend broken by recent SVN commit.
Reported by: | ErikW | Owned by: | Armin Ronacher |
---|---|---|---|
Component: | File uploads/storage | Version: | dev |
Severity: | Keywords: | file upload storage s3 | |
Cc: | erik@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I just did an svn update to the latest trunk (from r9700), and now my file upload backend storage isn't behaving correctly. A 6MB file is being uploaded to my backend storage as a 1 byte file. When I revert by to r9700, the code behaves as intended.
My backend is a slightly modified (added encryption) version of the S3 backend from http://code.welldev.org/django-storages/wiki/Home
If any other info would be helpful in tracking this down, let me know and I'll post it.
Attachments (2)
Change History (12)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Ok, found the revision. Everything works great up to r9765. When I "svn up -r 9766" the problem begins.
comment:3 by , 16 years ago
Triage Stage: | Unreviewed → Accepted |
---|
I looked at this a little, though I cannot recreate since I do not have access to S3 (let alone the modified backend...the modifications may be relevant here). However, taking the S3 backend you point to and stubbing it out to not actually attempt to call S3 services but just log what it's called to do, I see a difference pre- and post- r9766 involving calls to the backend's size
method. Specifically, prior to r9766 the S3Storage size
method is not called when a file is uploaded (via admim). Running with r9766, however, it is called. Ultimately the call comes from the len(content) call in the FieldFile save method in django/db/models/fields/files.py:
def save(self, name, content, save=True): name = self.field.generate_filename(self.instance, name) self._name = self.storage.save(name, content) setattr(self.instance, self.field.name, self.name) # Update the filesize cache print 'Prior to updating filesize cache: content.__class__.__bases__ is: %s' % str(content.__class__.__bases__) self._size = len(content) # Save the object because it has changed, unless save is False if save: self.instance.save() save.alters_data = True
Though this method itself did not change, that print (added by me for debug) shows that the bases for content has changed. Prior to r9766 it prints:
Prior to updating filesize cache: content.__class__.__bases__ is: (<class 'django.core.files.uploadedfile.UploadedFile'>,)
Post-r9766 it prints:
Prior to updating filesize cache: content.__class__.__bases__ is: (<class 'django.core.files.uploadedfile.InMemoryUploadedFile'>, <class django.db.models.fields.files.FieldFile'>)
The effect of this difference is that instead of django/core/files/base.py File's _get_size:
def _get_size(self): if not hasattr(self, '_size'): if hasattr(self.file, 'size'): self._size = self.file.size elif os.path.exists(self.file.name): self._size = os.path.getsize(self.file.name) else: raise AttributeError("Unable to determine the file's size.") return self._size def _set_size(self, size): self._size = size size = property(_get_size, _set_size)
being called to determine len(content) (with _size having already been set previously at a point in the code I didn't track down), django/db/models/fields/files.py FieldFile's _get_size:
def _get_size(self): self._require_file() return self.storage.size(self.name) size = property(_get_size)
is called instead.
Now the _save
method in the S3 storage backend you point to does not rely on len(content) when it saves the data, but if your modified version does then it likely has a problem. If I put some prints into _save
:
print 'in _save: type(content) = %s, content.__class__.__bases__ is: %s' % (str(type(content)), str(content.__class__.__bases__)) print 'in _save: len(content): %d' % len(content)
Prior to r9766 I get:
in _save: type(content) = <class 'django.core.files.uploadedfile.InMemoryUploadedFile'>, content.__class__.__bases__ is: (<class 'django.core.files.uploadedfile.UploadedFile'>,) in _save: len(content): 818
Post r9766 I get:
in _save: type(content) = <class 'django.db.models.fields.files.InMemoryUploadedFile'>, content.__class__.__bases__ is: (<class 'django.core.files.uploadedfile.InMemoryUploadedFile'>, <class 'django.db.models.fields.files.FieldFile'>
and my stubbed backend's size
raises an exception because it is called to return the length of a file it hasn't saved yet.
So it seems there is a fairly non-obvious side-effect of r9766 involving len(content) for the content passed into the backend _save
:
- Prior to r9766 this would resolve to django/core/files/base.py File's _get_size method
- Post-r9766 the backend's own
size
method will get called, which isn't likely to work since_save
itself hasn't had a chance to save the file yet.
But I don't actually know for sure that this is the cause of the problem reported here, since I don't know that the modified backend relies on len(content) in its _save
method. It does seem to be a problem we need to fix, though, since it seems valid for a backend _save
to call len on the passed content, and it used to work...now it will find itself called to answer the len() question, which likely isn't going to work.
This is likely related to #10249...that one is reporting inability to determine a method resolution order while this one seems to be resulting from the fact that the method resolution order has changed.
Feedback from someone with a better understanding of r9766 would be helpful here.
follow-up: 6 comment:4 by , 16 years ago
I've attached a copy of the Modified storage backend (S3Storage.py) in case it will help in tracking this issue down.
comment:5 by , 16 years ago
milestone: | → 1.1 |
---|
by , 16 years ago
Attachment: | 10300.diff added |
---|
comment:6 by , 16 years ago
Replying to erikcw:
I've attached a copy of the Modified storage backend (S3Storage.py) in case it will help in tracking this issue down.
Yes, I think it helps. Your S3Storage _put_file function looks like this:
def _put_file(self, name, content): if self.encrypt == True: # Create a key object k = ezPyCrypto.key() # Read in a public key fd = open(settings.CRYPTO_KEYS_PUB, "rb") pubkey = fd.read() fd.close() # import this public key k.importKey(pubkey) # Now encrypt some text against this public key content = k.encString(content) # ...snip remainder...
looking at the ezPyCrypto code, that encString(content) call is going to result in it doing a len(content) to get the length of the data to encrypt. Given what I detailed above, that len call is going to result in the storage backend being called to report the length of something that hasn't been written to the backend yet.
I've attached a patch that may fix the issue. It changes the FieldFile's _get_size method so that the storage backend is called to supply the length only if _committed is true. If not, super is used to fallback to File's size property, which will be the size of the uploaded file. That may be all that's needed to fix this -- could you give it a try an let us know?
Review from someone more familiar with this code would be good. I'm still vaguely worried about he side-effects on method resolution order introduced by r9766, but that could be due to my relative unfamiliarity with the code here.
comment:7 by , 16 years ago
Thanks for the patch kmtracey! I applied it to my svn checkout and it seems to have fixed the problem. I'm going to keep an eye on it for a few days to make sure everything is behaving correctly.
Like you said, there could still be some other side-effects from r9766 -- hopefully the maintainer of this area of the branch will take a look and give the final sign-off for this fix.
I'll report if I notice any oddities...
comment:8 by , 16 years ago
Owner: | changed from | to
---|
Well, the next step is to work out which revision caused the problem. Since r9700 works and r9800+ doesn't work, use binary searching to find the version that breaks things. It will be at most seven checkouts and tests (and probably less, since not all those commits are to the same branch).