Code

Opened 5 years ago

Closed 4 years ago

Last modified 3 years ago

#11158 closed (fixed)

get_image_dimensions very slow after 1 call

Reported by: kua Owned by: SAn
Component: Core (Other) Version: master
Severity: Keywords: get_image_dimensions, field, save, slow
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

get_image_dimensions in django.core.files.images is very slow after 1 call

This has the effect of slowing down calls to save on model image fields, the whole thing shudders to a halt while we spin around in PIL.Image.feed

The issue is that the read pointer is not being reset after the call to get_image_dimensions. Subsequent calls to file.read

To reproduce:

from django.core.files.images import get_image_dimensions
from django.core.files.uploadedfile import File

f = File(open('/tmp/some_multi_megabyte_file.jpg')
get_image_dimensions(f)
get_image_dimensions(f)

I suggest this function should save the read pointer pos, reset it to 0, then do all the PIL fancyness, then reset the pointer when finished. I've tried to attach a patch that reflects this

Imagine this scenario: Someone mucks with the read pointer on a hundred meg image, say reads 2 kilobytes. Then this function is called and starts reading data in 1K chunks looking for a header (which has already been skipped).

So 100 Megabytes - 2 kilobytes / 1024 bytes per read means this function will be called over 100,000 times

Attachments (6)

patch (1.7 KB) - added by kua 5 years ago.
patch.diff (938 bytes) - added by kua 5 years ago.
updated patch for this bug
svn_patch.diff (1.2 KB) - added by kua 5 years ago.
a patch for the trunk
11158.diff (1.3 KB) - added by SmileyChris 4 years ago.
11158_test.diff (1.1 KB) - added by SAn 4 years ago.
test case
11158-combined.diff (2.4 KB) - added by jkocherhans 4 years ago.
Merged SmilyChris's patch and SAn's tests into a single patch.

Download all attachments as: .zip

Change History (28)

comment:1 Changed 5 years ago by kua

  • Component changed from Uncategorized to Core framework
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Changed 5 years ago by kua

comment:2 follow-up: Changed 5 years ago by kua

In [110]: f.seek(0)

In [111]: get_image_dimensions(f)
#very quick/snappy
Out[111]: (4000, 3000)

In [112]: f.seek(1)

In [113]: get_image_dimensions(f)
#very slow ~30 seconds, and returns none
In [114]:

comment:3 in reply to: ↑ 2 Changed 5 years ago by kua

Also, a work around for the 1.0x, current, code base: fd.seek(0) anytime before you save that model which has an ImageField

comment:4 Changed 5 years ago by SmileyChris

  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted

Patch is messy, but the ticket seems valid.

Changed 5 years ago by kua

updated patch for this bug

comment:5 Changed 5 years ago by kua

I added a new patch for this ticket.

I'm not sure what's going on in the svn trunk r10737, it appears that some underlying code in File etc has changed. But this issue depends purely on PIL so we would need a patch for the svn trunk as well..

Changed 5 years ago by kua

a patch for the trunk

comment:6 Changed 5 years ago by kua

  • Version changed from 1.0 to SVN

comment:7 Changed 5 years ago by kua

I changed the Version to SVN because I just tested and this bug is present in the trunk.

I've attached a patch against trunk using the same solution in the previous patch

comment:8 Changed 5 years ago by rvanlaar

  • milestone set to 1.1

I wrote a testcase for it, but I don't know how to generate a image with PIL.
Putting a 40 MB file in the svn wouldn't be a good a idea.

from time import time
from django.core.files.images import get_image_dimensions
from django.core.files.uploadedfile import File

f = File(open('/tmp/some_multi_megabyte_file.jpg')
t1 = time()
get_image_dimensions(f)
t2 = time()
run1_time = t2 - t1
t3 = time()
get_image_dimensions(f)
t4 = time()
run2_time = t4 - t3

# The second run time should be in the same order of the first run.
# So we check if the second runtime is not more then one order of 
# magnitude off.

assert(run2_time < 10 * run1_time)

A quick solution is using:

get_image_dimensions(f.name)

A better way to fix this is having an exposed classfunction
to call get_image_dimensions, since django.core.files.image.ImageFile
already has a self._dimensions_cache.

comment:9 Changed 5 years ago by rvanlaar

  • milestone 1.1 deleted

Should be fixed for 1.1.1

comment:10 Changed 5 years ago by rvanlaar

  • milestone set to 1.2

Can wait for 1.2.

Use the ImageFile.width and ImageFile.height.

from django.core.files.images import ImageFile
im = ImageFile(open('/tmp/some_multi_megabyte_file.jpg')
im.width
im.height

Changed 4 years ago by SmileyChris

comment:11 Changed 4 years ago by SmileyChris

Simpler patch, same effect.

comment:12 Changed 4 years ago by SAn

  • Needs tests set
  • Owner changed from nobody to SAn

Changed 4 years ago by SAn

test case

Changed 4 years ago by jkocherhans

Merged SmilyChris's patch and SAn's tests into a single patch.

comment:13 Changed 4 years ago by jkocherhans

  • Needs tests unset

comment:14 Changed 4 years ago by SAn

I think that the patch is ok and marker patch_need_improvement = 1 is outdated.

comment:15 Changed 4 years ago by russellm

  • Patch needs improvement unset

comment:16 Changed 4 years ago by russellm

  • Triage Stage changed from Accepted to Ready for checkin

comment:17 follow-up: Changed 4 years ago by russellm

  • milestone changed from 1.2 to 1.3

Ready for checkin, but not critical for 1.2.

comment:18 in reply to: ↑ 17 Changed 4 years ago by SAn

Replying to russellm:

Ready for checkin, but not critical for 1.2.

rusellm, the problem here is not only that it is slow to get the image dimensions, the problem is that it returns None after one call. I am sure it's not critical but if you expect (x, y) and you get None you will be bitten.

comment:19 Changed 4 years ago by russellm

@SAn: I'm not denying that it is a problem. I'm just denying that it is a problem that is a blocker for 1.2. This isn't a regression of behaviour in 1.2, and it doesn't cause catastrophic data loss; as such, it doesn't meet the criteria for a release critical bug.

comment:20 Changed 4 years ago by lukeplant

Fixed in [13715]. Don't know why it wasn't automatically closed.

comment:21 Changed 4 years ago by lukeplant

  • Resolution set to fixed
  • Status changed from new to closed

(In [13716]) [1.2.X] Fixed #11158 - get_image_dimensions very slow/incorrect after 1 call

Thanks to kua for the report, and to kua, SmileyChris and SAn for the patch

Backport of [13715] from trunk

comment:22 Changed 3 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.