Code

Opened 6 years ago

Last modified 5 months ago

#8307 new Cleanup/optimization

ImageFile use of width_field and height_field is slow with remote storage backends

Reported by: sebastian.serrano@… Owned by: jacob
Component: File uploads/storage Version: master
Severity: Normal Keywords: imagefile height width
Cc: alanjds, wiktor@…, walter+django@… Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

im attaching a patch that could make it work

Attachments (3)

images.py.diff (801 bytes) - added by anonymous 6 years ago.
images.1.diff (995 bytes) - added by sebastian.serrano@… 6 years ago.
images.py.2.diff (750 bytes) - added by alanjds 2 years ago.
Let the storage backend disable touching the image file to get height/width

Download all attachments as: .zip

Change History (17)

Changed 6 years ago by anonymous

comment:1 Changed 6 years ago by anonymous

  • milestone set to 1.0
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 6 years ago by oyvind

  • Has patch set
  • Keywords imagefile height width added
  • Triage Stage changed from Unreviewed to Design decision needed

Seems like a bug to me

comment:3 Changed 6 years ago by ericholscher

Is this the same bug as #8208 ?

comment:4 Changed 6 years ago by anonymous

no, is not the same bug. this one is related with the use of the width_field and height_field

comment:5 Changed 6 years ago by jacob

  • Owner changed from nobody to jacob
  • Status changed from new to assigned
  • Triage Stage changed from Design decision needed to Accepted

comment:6 Changed 6 years ago by Gulopine

  • Needs documentation set

This is a tough one. On one hand, I can appreciate the performance gain of using the cache fields, since they're already available, and they *should* always be in sync with what's on the file. But I can't help thinking that I like the idea of ImageFile.width and ImageFile.height always returning the actual dimensions of the file, no matter what.

If they used the cache fields and those ever get out of sync with the file itself (which is possible any number of ways, depending on your application), the only way to get the true dimensions would be to use django.core.files.images.get_image_dimensions manually. And, after all, isn't the point of having properties for width and height to avoid having to call that manually? Plus, using get_image_dimensions directly means you'll have to cache those values manually if you expect to need them more than once.

I suppose I'm -0 on this ticket; someone with a decent argument might be able to sway me. Either way, though, I think the docs could be a bit more explicit about where the width and height values come from.

comment:7 Changed 6 years ago by sebastian.serrano@…

About your worries, the width and height values are updated every time the imagefield save method is fired, they could only be out of sync from a bug or a wrong use of the FileStorage mechanism (like setting the ._name field attr manually).
Also I have been using the patch I submit and has been working fine for me.
Without this patch the people that use remote filestorage backends (like S3, nirvanix or mogilefs), suffer a big performance hit for having to load the entire file every time a Model with an imagefield gets loaded. That means in this scenario a lot of more traffic, less performance and also a monetary cost.

English is not my primary language, sorry if something is not clear.
PD: im attaching another patch that could be more safe.

Changed 6 years ago by sebastian.serrano@…

comment:8 Changed 6 years ago by Gulopine

For the record, here are a few of the ways I know of that the two could get out of sync:

  • A bug in Django (as you mention) which shouldn't happen, but is still possible
  • Improperly using Django's API (as you mention) to cause a mismatch
  • An admin or staff member could manually update a file directly, without using Django
  • A cron job could rename, resize or otherwise alter files
  • An admin or staff member could manually update the database directly, without using the model's save() method
  • A monkeypatch (mistakenly or not) altered or removed the code to update width_field and height_field
  • A custom ImageFile subclass (mistakenly or not) sets the width_field and height_field to the wrong values
  • ... and the list goes on

As for the performance (and potentially monetary) hit when accessing files, the width and height shouldn't be loaded when retrieving the file; they only get populated when actually accessing them. If you find they are in fact getting accessed every time regardless of whether they're being used, that's a legitimate bug that should be filed separately.

I have no doubt that the attached patch works correctly for the behavior you want. I'm just not sure it's the behavior Django should use.

comment:9 Changed 6 years ago by jacob

  • Resolution set to wontfix
  • Status changed from assigned to closed

Yeah, I like the access methods always getting the *real* file sizes.

Think of it this way: Model.image_width is a number the model knows about, but to get the real story, check Model.image.width -- all sorts of things can happen on the disk behind your back!

comment:10 Changed 3 years ago by jacob

  • milestone 1.0 deleted

Milestone 1.0 deleted

comment:11 Changed 2 years ago by alanjds

  • Cc alanjds added
  • Easy pickings unset
  • Resolution wontfix deleted
  • Severity set to Normal
  • Status changed from closed to reopened
  • Summary changed from ImageFile doesn't use the width_field and height_field for cache to ImageFile use of width_field and height_field is slow with remote storage backends
  • Type set to Cleanup/optimization
  • UI/UX unset

What about letting the storage to choose if is worth to touch the file just to get its height/width?

This solves the problem for me. And it is a huge problem: My EC2 instance with S3 backend wastes about 4sec more each pageview just to know the width/height of 8 photos. And looks like the absent of this info harms not my app.

Is this an acceptable solution?

def _get_image_dimensions(self):
    if not hasattr(self, '_dimensions_cache'):
        if getattr(self.storage, 'IGNORE_IMAGE_DIMENSIONS', False):
            self._dimensions_cache = (0, 0) # Too costly to hit the file?
        else:
            close = self.closed
            self.open()
            self._dimensions_cache = get_image_dimensions(self, close=close)
    return self._dimensions_cache

Changed 2 years ago by alanjds

Let the storage backend disable touching the image file to get height/width

comment:12 Changed 16 months ago by aaugustin

  • Status changed from reopened to new

comment:13 Changed 7 months ago by viciu

  • Cc wiktor@… added

comment:14 Changed 5 months ago by wdoekes

  • Cc walter+django@… added

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as new
The owner will be changed from jacob to anonymous. Next status will be 'assigned'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.