Opened 10 months ago
Closed 6 months ago
#35139 closed Bug (fixed)
Performing a save() with an ImageField where width_field or height_field is set results in an extra read operation
Reported by: | john-parton | Owned by: | john-parton |
---|---|---|---|
Component: | File uploads/storage | Version: | 5.0 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I have prepped a github repo here with the basic tests: https://github.com/john-parton/django-image-field-extra-read
Conditions for behavior
You must have an ImageField with the width_field
or height_field
arguments set.
Description of current behavior
When a model is saved, the image file is written out using the Storage API, and then
in order for the width and height fields to be updated, the file is read back out
and then the width and height are extracted from the image.
In the case the storage is local, the performance impact is probably negligible,
unless the application is seriously IO constrained, however if the storage is
remote, the performance impact is significant, and there can be other impacts on
operations.
For instance, if using S3 as backing, more GET operations are performed than
strictly necessary. This could be a few dollars a day of operational costs if your
scale is small, but can be significant if egress charges are high.
As another example, CloudFlare Images rate limits requests. This effectively cuts
the rate limit in half because every save operations requires an additional GET.
Proposed behavior
The proposed behavior is to simple read the image file which is resident in memory
without reading it back out from the storage backend.
Possible breaking issues
The vast majority of storage backends and use cases likely guarantee that if you
write a file into storage, and then retrieve it, you will get the same file back.
However, for some image-specific services, they will compress or crush larger images.
For users who specifically have this use case, they may end up with the width_field
and height_field
not representing the actual size of the image in the store, but
rather the size of the image at time of upload.
Explanation of current behavior
It looks like when a model's save() method is called, the field's pre_save() method
is called which results in the descriptor for the field having its get method
called and then immediately having its set method called with a similar value.
The effect is to coerce the value of the field to ImageFieldFile
which is a subclass
of ImageFile
. The ImageFieldFile
instance is assigned a property of file
which
is the wrapped original value.
The image field then saves and persists the data using the storage API, and then the
wrapped file isn't referred to later to get the width and height. When the width and
height are requested, the file is read back out of storage.
Proposed fix
No specific fix at this time.
Mitigating breaking issues
Considering how unusual this use case is, it may be sufficient to document the change
in behavior and provide a code snippet to wire up a signal handler to do the
additional read for those users who have the unusual storage backends and actually
care about the width/height being what is on disk. This would also allow users to
customize the behavior. For instance, maybe if the image is under a certain resolution,
the storage provider guarantees they don't mangle the image. A user could enshrine
that logic in the signal handler, so they could still get the performance uplift where
appropriate.
Summary
It seems pretty unlikely that this is the intended behavior, and seems to largely be a byproduct of heavy descriptor use and "magic" in the code. I've tried mitigating this in my application code, but it requires me to monkey patch some private methods, and I'm not even sure I got it right. Any attempt to "fix" this at a level higher than Django's internals will just result in an unmaintainable mess. Back when Django was new, I'm not sure I would make this argument, but now storage is usually backed by some API like S3, I think it makes sense to make avoiding an extra network request the "sane default."
Change History (6)
comment:1 by , 10 months ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
comment:2 by , 10 months ago
Wow, okay. I did a search and thought it was thorough. Didn't realize it was a Pre- 1.0 existing issue. I'll update it there.
comment:3 by , 6 months ago
Resolution: | duplicate |
---|---|
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
After looking into the other ticket, I can see that one was a duplicate of #35139 which has been fixed. This described a separate concern.
comment:4 by , 6 months ago
Has patch: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:5 by , 6 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
Looks like a duplicate of #8307. Can you add your analysis as a comment in the original ticket?