Opened 15 years ago

Closed 14 years ago

Last modified 12 years ago

#11158 closed (fixed)

get_image_dimensions very slow after 1 call

Reported by: kua Owned by: SAn
Component: Core (Other) Version: dev
Severity: Keywords: get_image_dimensions, field, save, slow
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

get_image_dimensions in django.core.files.images is very slow after 1 call

This has the effect of slowing down calls to save on model image fields, the whole thing shudders to a halt while we spin around in PIL.Image.feed

The issue is that the read pointer is not being reset after the call to get_image_dimensions. Subsequent calls to file.read

To reproduce:

from django.core.files.images import get_image_dimensions
from django.core.files.uploadedfile import File

f = File(open('/tmp/some_multi_megabyte_file.jpg')
get_image_dimensions(f)
get_image_dimensions(f)

I suggest this function should save the read pointer pos, reset it to 0, then do all the PIL fancyness, then reset the pointer when finished. I've tried to attach a patch that reflects this

Imagine this scenario: Someone mucks with the read pointer on a hundred meg image, say reads 2 kilobytes. Then this function is called and starts reading data in 1K chunks looking for a header (which has already been skipped).

So 100 Megabytes - 2 kilobytes / 1024 bytes per read means this function will be called over 100,000 times

Attachments (6)

patch (1.7 KB ) - added by kua 15 years ago.
patch.diff (938 bytes ) - added by kua 15 years ago.
updated patch for this bug
svn_patch.diff (1.2 KB ) - added by kua 15 years ago.
a patch for the trunk
11158.diff (1.3 KB ) - added by Chris Beaven 14 years ago.
11158_test.diff (1.1 KB ) - added by SAn 14 years ago.
test case
11158-combined.diff (2.4 KB ) - added by jkocherhans 14 years ago.
Merged SmilyChris's patch and SAn's tests into a single patch.

Download all attachments as: .zip

Change History (28)

comment:1 by kua, 15 years ago

Component: UncategorizedCore framework

by kua, 15 years ago

Attachment: patch added

comment:2 by kua, 15 years ago

In [110]: f.seek(0)

In [111]: get_image_dimensions(f)
#very quick/snappy
Out[111]: (4000, 3000)

In [112]: f.seek(1)

In [113]: get_image_dimensions(f)
#very slow ~30 seconds, and returns none
In [114]:

in reply to:  2 comment:3 by kua, 15 years ago

Also, a work around for the 1.0x, current, code base: fd.seek(0) anytime before you save that model which has an ImageField

comment:4 by Chris Beaven, 15 years ago

Patch needs improvement: set
Triage Stage: UnreviewedAccepted

Patch is messy, but the ticket seems valid.

by kua, 15 years ago

Attachment: patch.diff added

updated patch for this bug

comment:5 by kua, 15 years ago

I added a new patch for this ticket.

I'm not sure what's going on in the svn trunk r10737, it appears that some underlying code in File etc has changed. But this issue depends purely on PIL so we would need a patch for the svn trunk as well..

by kua, 15 years ago

Attachment: svn_patch.diff added

a patch for the trunk

comment:6 by kua, 15 years ago

Version: 1.0SVN

comment:7 by kua, 15 years ago

I changed the Version to SVN because I just tested and this bug is present in the trunk.

I've attached a patch against trunk using the same solution in the previous patch

comment:8 by Roland van Laar, 15 years ago

milestone: 1.1

I wrote a testcase for it, but I don't know how to generate a image with PIL.
Putting a 40 MB file in the svn wouldn't be a good a idea.

from time import time
from django.core.files.images import get_image_dimensions
from django.core.files.uploadedfile import File

f = File(open('/tmp/some_multi_megabyte_file.jpg')
t1 = time()
get_image_dimensions(f)
t2 = time()
run1_time = t2 - t1
t3 = time()
get_image_dimensions(f)
t4 = time()
run2_time = t4 - t3

# The second run time should be in the same order of the first run.
# So we check if the second runtime is not more then one order of 
# magnitude off.

assert(run2_time < 10 * run1_time)

A quick solution is using:

get_image_dimensions(f.name)

A better way to fix this is having an exposed classfunction
to call get_image_dimensions, since django.core.files.image.ImageFile
already has a self._dimensions_cache.

comment:9 by Roland van Laar, 15 years ago

milestone: 1.1

Should be fixed for 1.1.1

comment:10 by Roland van Laar, 15 years ago

milestone: 1.2

Can wait for 1.2.

Use the ImageFile.width and ImageFile.height.

from django.core.files.images import ImageFile
im = ImageFile(open('/tmp/some_multi_megabyte_file.jpg')
im.width
im.height

by Chris Beaven, 14 years ago

Attachment: 11158.diff added

comment:11 by Chris Beaven, 14 years ago

Simpler patch, same effect.

comment:12 by SAn, 14 years ago

Needs tests: set
Owner: changed from nobody to SAn

by SAn, 14 years ago

Attachment: 11158_test.diff added

test case

by jkocherhans, 14 years ago

Attachment: 11158-combined.diff added

Merged SmilyChris's patch and SAn's tests into a single patch.

comment:13 by jkocherhans, 14 years ago

Needs tests: unset

comment:14 by SAn, 14 years ago

I think that the patch is ok and marker patch_need_improvement = 1 is outdated.

comment:15 by Russell Keith-Magee, 14 years ago

Patch needs improvement: unset

comment:16 by Russell Keith-Magee, 14 years ago

Triage Stage: AcceptedReady for checkin

comment:17 by Russell Keith-Magee, 14 years ago

milestone: 1.21.3

Ready for checkin, but not critical for 1.2.

in reply to:  17 comment:18 by SAn, 14 years ago

Replying to russellm:

Ready for checkin, but not critical for 1.2.

rusellm, the problem here is not only that it is slow to get the image dimensions, the problem is that it returns None after one call. I am sure it's not critical but if you expect (x, y) and you get None you will be bitten.

comment:19 by Russell Keith-Magee, 14 years ago

@SAn: I'm not denying that it is a problem. I'm just denying that it is a problem that is a blocker for 1.2. This isn't a regression of behaviour in 1.2, and it doesn't cause catastrophic data loss; as such, it doesn't meet the criteria for a release critical bug.

comment:20 by Luke Plant, 14 years ago

Fixed in [13715]. Don't know why it wasn't automatically closed.

comment:21 by Luke Plant, 14 years ago

Resolution: fixed
Status: newclosed

(In [13716]) [1.2.X] Fixed #11158 - get_image_dimensions very slow/incorrect after 1 call

Thanks to kua for the report, and to kua, SmileyChris and SAn for the patch

Backport of [13715] from trunk

comment:22 by Jacob, 12 years ago

milestone: 1.3

Milestone 1.3 deleted

Note: See TracTickets for help on using tickets.
Back to Top