
Opened 5 years ago

Last modified 3 years ago

#12157 new Cleanup/optimization

FileSystemStorage does file I/O inefficiently, despite providing options to permit larger blocksizes

Reported by: alecmuffett Owned by: nobody
Component: File uploads/storage Version: 1.1
Severity: Normal Keywords: io, FileSystemStorage, buffering, performance
Cc: alec.muffett@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

FileSystemStorage contains the following:

    def _open(self, name, mode='rb'):
        return File(open(self.path(name), mode))

...which is used to open files that are stored as FileFields in Django models.

If the programmer decides to hack through the file using, for instance, the django.core.files.base.File.chunks() method:

    def chunks(self, chunk_size=None):
        """
        Read the file and yield chunks of ``chunk_size`` bytes (defaults to
        ``UploadedFile.DEFAULT_CHUNK_SIZE``).
        """

        if not chunk_size:
            chunk_size = self.__class__.DEFAULT_CHUNK_SIZE

        if hasattr(self, 'seek'):
            self.seek(0)
        # Assume the pointer is at zero...
        counter = self.size

        while counter > 0:
            yield self.read(chunk_size)
            counter -= chunk_size

...the programmer would expect self.read(), which drops through to django.core.files.base.File.read(), to honour its arguments, so that the I/O occurs in DEFAULT_CHUNK_SIZE blocks (currently 64 KB). However, DTrace shows otherwise:

29830/0xaf465d0:  open_nocancel("file.jpg\0", 0x0, 0x1B6)              = 5 0
29830/0xaf465d0:  fstat(0x5, 0xB007DB60, 0x1B6)          = 0 
29830/0xaf465d0:  fstat64(0x5, 0xB007E1E4, 0x1B6)                = 0 0
29830/0xaf465d0:  lseek(0x5, 0x0, 0x1)           = 0 0
29830/0xaf465d0:  lseek(0x5, 0x0, 0x0)           = 0 0
29830/0xaf465d0:  stat("file.jpg\0", 0xB007DF7C, 0x0)          = 0 0
29830/0xaf465d0:  write_nocancel(0x1, "65536 113762\n\0", 0xD)           = 13 0
29830/0xaf465d0:  mmap(0x0, 0x11000, 0x3, 0x1002, 0x3000000, 0x0)                = 0x7C5000 0
29830/0xaf465d0:  read_nocancel(0x5, "\377\330\377\340\0", 0x1000)               = 4096 0
29830/0xaf465d0:  read_nocancel(0x5, "\333\035eS[\026+\360\215Q\361'I\304c`\352\v4M\272C\201\273\261\377\0", 0x1000)             = 4096 0
...
...(many more 4 KB reads elided)...
...
29830/0xaf465d0:  sendto(0x4, 0x7C5014, 0x10000)                 = 65536 0

...that is, the file is read in 4 KB chunks (on OS X) and written in 64 KB blocks.

This occurs because the file is opened with "open(self.path(name), mode)", which uses libc stdio buffering with a buffer much smaller than the 64 KB requested by the programmer.

This can be kludged around by hacking the open() statement:

    def _open(self, name, mode='rb'):
        return File(open(self.path(name), mode, 65536)) # use a larger buffer

...or by avoiding the stdio-based file()/open() calls and using os.open() instead.
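To illustrate the os.open() alternative, here is a minimal sketch of a chunked reader that bypasses userspace buffering altogether. The function name read_chunks_unbuffered and its chunk_size parameter are hypothetical and are not part of Django. Each os.read() call requests up to chunk_size bytes from the OS in one go:

```python
import os


def read_chunks_unbuffered(path, chunk_size=64 * 1024):
    """Yield the file's contents in chunks of at most chunk_size bytes.

    Hypothetical helper, not Django API: it uses os.open()/os.read()
    so that no stdio-style buffer sits between the caller and the OS.
    """
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # an empty result means end of file
                break
            yield chunk
    finally:
        os.close(fd)
```

Note that os.read() may return fewer bytes than requested, so callers should not assume every chunk is exactly chunk_size long.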

In the meantime this means that Django is not handling FileSystemStorage reads efficiently.

It is not easy to determine whether this general stdio-buffer issue impacts other parts of Django's performance.
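Where DTrace is not available, one rough way to look for this kind of mismatch is to count the low-level reads from inside Python. The sketch below assumes Python 3's io module, which replaces the stdio buffering described in this ticket, so it will not necessarily reproduce the 4 KB reads seen above. CountingFileIO and observed_read_sizes are hypothetical names, and the sizes recorded depend on the Python version and platform:

```python
import io


class CountingFileIO(io.FileIO):
    """Raw file object that records how many bytes each raw read returns."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.read_sizes = []

    def readinto(self, buffer):
        # The buffered layer in CPython fills its buffer through readinto().
        n = super().readinto(buffer)
        if n:
            self.read_sizes.append(n)
        return n


def observed_read_sizes(path, chunk_size=64 * 1024,
                        buffer_size=io.DEFAULT_BUFFER_SIZE):
    """Read the whole file in chunk_size pieces and return the raw read sizes."""
    raw = CountingFileIO(path, "rb")
    with io.BufferedReader(raw, buffer_size=buffer_size) as f:
        while f.read(chunk_size):
            pass
    return raw.read_sizes
```

Comparing the returned sizes against chunk_size shows whether the buffering layer is splitting the requested reads into smaller ones.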

Attachments (0)

Change History (6)

comment:1 Changed 4 years ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 4 years ago by adamnelson

#9632 has a patch which may or may not affect this issue.

comment:3 Changed 3 years ago by mattmcc

  • Severity set to Normal
  • Type set to Cleanup/optimization

comment:4 Changed 3 years ago by julien

  • Has patch unset

comment:5 Changed 2 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:6 Changed 2 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.
