Opened 9 years ago

Last modified 3 years ago

#8307 new Cleanup/optimization

ImageFile use of width_field and height_field is slow with remote storage backends

Reported by: sebastian.serrano@… Owned by: Jacob
Component: File uploads/storage Version: master
Severity: Normal Keywords: imagefile height width
Cc: Alan Justino da Silva, wiktor@…, walter+django@… Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


im attaching a patch that could make it work

Attachments (3) (801 bytes) - added by anonymous 9 years ago.
images.1.diff (995 bytes) - added by sebastian.serrano@… 9 years ago. (750 bytes) - added by Alan Justino da Silva 5 years ago.
Let the storage backend disable touching the image file to get height/width

Download all attachments as: .zip

Change History (17)

Changed 9 years ago by anonymous

Attachment: added

comment:1 Changed 9 years ago by anonymous

milestone: 1.0

comment:2 Changed 9 years ago by oyvind

Has patch: set
Keywords: imagefile height width added
Triage Stage: UnreviewedDesign decision needed

Seems like a bug to me

comment:3 Changed 9 years ago by Eric Holscher

Is this the same bug as #8208 ?

comment:4 Changed 9 years ago by anonymous

no, is not the same bug. this one is related with the use of the width_field and height_field

comment:5 Changed 9 years ago by Jacob

Owner: changed from nobody to Jacob
Status: newassigned
Triage Stage: Design decision neededAccepted

comment:6 Changed 9 years ago by Marty Alchin

Needs documentation: set

This is a tough one. On one hand, I can appreciate the performance gain of using the cache fields, since they're already available, and they *should* always be in sync with what's on the file. But I can't help thinking that I like the idea of ImageFile.width and ImageFile.height always returning the actual dimensions of the file, no matter what.

If they used the cache fields and those ever get out of sync with the file itself (which is possible any number of ways, depending on your application), the only way to get the true dimensions would be to use django.core.files.images.get_image_dimensions manually. And, after all, isn't the point of having properties for width and height to avoid having to call that manually? Plus, using get_image_dimensions directly means you'll have to cache those values manually if you expect to need them more than once.

I suppose I'm -0 on this ticket; someone with a decent argument might be able to sway me. Either way, though, I think the docs could be a bit more explicit about where the width and height values come from.

comment:7 Changed 9 years ago by sebastian.serrano@…

About your worries, the width and height values are updated every time the imagefield save method is fired, they could only be out of sync from a bug or a wrong use of the FileStorage mechanism (like setting the ._name field attr manually).
Also I have been using the patch I submit and has been working fine for me.
Without this patch the people that use remote filestorage backends (like S3, nirvanix or mogilefs), suffer a big performance hit for having to load the entire file every time a Model with an imagefield gets loaded. That means in this scenario a lot of more traffic, less performance and also a monetary cost.

English is not my primary language, sorry if something is not clear.
PD: im attaching another patch that could be more safe.

Changed 9 years ago by sebastian.serrano@…

Attachment: images.1.diff added

comment:8 Changed 9 years ago by Marty Alchin

For the record, here are a few of the ways I know of that the two could get out of sync:

  • A bug in Django (as you mention) which shouldn't happen, but is still possible
  • Improperly using Django's API (as you mention) to cause a mismatch
  • An admin or staff member could manually update a file directly, without using Django
  • A cron job could rename, resize or otherwise alter files
  • An admin or staff member could manually update the database directly, without using the model's save() method
  • A monkeypatch (mistakenly or not) altered or removed the code to update width_field and height_field
  • A custom ImageFile subclass (mistakenly or not) sets the width_field and height_field to the wrong values
  • ... and the list goes on

As for the performance (and potentially monetary) hit when accessing files, the width and height shouldn't be loaded when retrieving the file; they only get populated when actually accessing them. If you find they are in fact getting accessed every time regardless of whether they're being used, that's a legitimate bug that should be filed separately.

I have no doubt that the attached patch works correctly for the behavior you want. I'm just not sure it's the behavior Django should use.

comment:9 Changed 9 years ago by Jacob

Resolution: wontfix
Status: assignedclosed

Yeah, I like the access methods always getting the *real* file sizes.

Think of it this way: Model.image_width is a number the model knows about, but to get the real story, check Model.image.width -- all sorts of things can happen on the disk behind your back!

comment:10 Changed 6 years ago by Jacob

milestone: 1.0

Milestone 1.0 deleted

comment:11 Changed 5 years ago by Alan Justino da Silva

Cc: Alan Justino da Silva added
Easy pickings: unset
Resolution: wontfix
Severity: Normal
Status: closedreopened
Summary: ImageFile doesn't use the width_field and height_field for cacheImageFile use of width_field and height_field is slow with remote storage backends
Type: Cleanup/optimization
UI/UX: unset

What about letting the storage to choose if is worth to touch the file just to get its height/width?

This solves the problem for me. And it is a huge problem: My EC2 instance with S3 backend wastes about 4sec more each pageview just to know the width/height of 8 photos. And looks like the absent of this info harms not my app.

Is this an acceptable solution?

def _get_image_dimensions(self):
    if not hasattr(self, '_dimensions_cache'):
        if getattr(, 'IGNORE_IMAGE_DIMENSIONS', False):
            self._dimensions_cache = (0, 0) # Too costly to hit the file?
            close = self.closed
            self._dimensions_cache = get_image_dimensions(self, close=close)
    return self._dimensions_cache

Changed 5 years ago by Alan Justino da Silva

Attachment: added

Let the storage backend disable touching the image file to get height/width

comment:12 Changed 4 years ago by Aymeric Augustin

Status: reopenednew

comment:13 Changed 3 years ago by Wiktor

Cc: wiktor@… added

comment:14 Changed 3 years ago by Walter Doekes

Cc: walter+django@… added
Note: See TracTickets for help on using tickets.
Back to Top