Opened 15 years ago
Last modified 14 years ago
#12157 new Cleanup/optimization
FileSystemStorage does file I/O inefficiently, despite providing options to permit larger blocksizes
| Reported by: | alecmuffett | Owned by: | nobody |
|---|---|---|---|
| Component: | File uploads/storage | Version: | 1.1 |
| Severity: | Normal | Keywords: | io, FileSystemStorage, buffering, performance |
| Cc: | alec.muffett@… | Triage Stage: | Accepted |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
FileSystemStorage contains the following:
```python
def _open(self, name, mode='rb'):
    return File(open(self.path(name), mode))
```
...which is used to open files that are stored as FileFields in Django models.
If the programmer decides to hack through the file by using (for instance) the django.core.files.base.File.chunks() method:
```python
def chunks(self, chunk_size=None):
    """
    Read the file and yield chucks of ``chunk_size`` bytes (defaults to
    ``UploadedFile.DEFAULT_CHUNK_SIZE``).
    """
    if not chunk_size:
        chunk_size = self.__class__.DEFAULT_CHUNK_SIZE

    if hasattr(self, 'seek'):
        self.seek(0)
    # Assume the pointer is at zero...
    counter = self.size

    while counter > 0:
        yield self.read(chunk_size)
        counter -= chunk_size
```
...the programmer would expect self.read() - which drops through to django.core.files.base.File.read() - to honour its argument, with the I/O occurring in DEFAULT_CHUNK_SIZE blocks (currently 64 KB); however, DTrace shows otherwise:
```
29830/0xaf465d0: open_nocancel("file.jpg\0", 0x0, 0x1B6) = 5 0
29830/0xaf465d0: fstat(0x5, 0xB007DB60, 0x1B6) = 0
29830/0xaf465d0: fstat64(0x5, 0xB007E1E4, 0x1B6) = 0 0
29830/0xaf465d0: lseek(0x5, 0x0, 0x1) = 0 0
29830/0xaf465d0: lseek(0x5, 0x0, 0x0) = 0 0
29830/0xaf465d0: stat("file.jpg\0", 0xB007DF7C, 0x0) = 0 0
29830/0xaf465d0: write_nocancel(0x1, "65536 113762\n\0", 0xD) = 13 0
29830/0xaf465d0: mmap(0x0, 0x11000, 0x3, 0x1002, 0x3000000, 0x0) = 0x7C5000 0
29830/0xaf465d0: read_nocancel(0x5, "\377\330\377\340\0", 0x1000) = 4096 0
29830/0xaf465d0: read_nocancel(0x5, "\333\035eS[\026+\360\215Q\361'I\304c`\352\v4M\272C\201\273\261\377\0", 0x1000) = 4096 0
...
...(many more 4kb reads elided)...
...
29830/0xaf465d0: sendto(0x4, 0x7C5014, 0x10000) = 65536 0
```
...the file is read in 4 KB chunks (on OS X) yet written out in 64 KB blocks.
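The trace above can be reproduced with a few lines of Python run under a syscall tracer such as dtruss on OS X. The following is a minimal sketch, not from the ticket: the storage location and the filename are illustrative, and it assumes Django settings are already configured.

```python
# Reproduction sketch (not from the ticket): '/srv/media' and 'file.jpg'
# are illustrative, and a configured Django settings module is assumed.
from django.core.files.storage import FileSystemStorage

storage = FileSystemStorage(location='/srv/media')  # hypothetical media root
f = storage.open('file.jpg')        # goes through the stock _open() shown above
try:
    for chunk in f.chunks(65536):   # 64 KB chunks are requested here...
        pass                        # ...yet the underlying reads arrive 4 KB at a time
finally:
    f.close()
```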
This occurs because "open(self.path(name), mode)" is used to open the file, which falls back to libc's default stdio buffering, a buffer much smaller than the 64 KB the programmer requested.
This can be kludged around by hacking the open() call to request a larger buffer:
```python
def _open(self, name, mode='rb'):
    return File(open(self.path(name), mode, 65536))  # use a larger buffer
```
...or by avoiding the stdio file()/open() calls altogether and using os.open() instead.
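A sketch of that os.open() approach, restricted to read-only modes; the subclass name and the buffer_size attribute are illustrative, not an existing or proposed Django API.

```python
import os

from django.core.files import File
from django.core.files.storage import FileSystemStorage


class LargeBufferStorage(FileSystemStorage):
    """Illustrative subclass: avoid the stdio default buffer when reading."""
    buffer_size = 64 * 1024

    def _open(self, name, mode='rb'):
        # os.open() returns a raw file descriptor, untouched by stdio;
        # os.fdopen() then wraps it in a file object with our buffer size.
        # Only read modes are handled here (this is a sketch, not a patch).
        fd = os.open(self.path(name), os.O_RDONLY)
        return File(os.fdopen(fd, mode, self.buffer_size))
```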
In the meantime, this means that Django does not handle FileSystemStorage reads efficiently.
It is not easy to determine whether this general stdio-buffer issue impacts other parts of Django's performance.
Change History (6)
comment:1 by , 15 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|---|
comment:2 by , 15 years ago
comment:3 by , 14 years ago
| Severity: | → Normal |
|---|---|
| Type: | → Cleanup/optimization |
comment:4 by , 14 years ago
| Has patch: | unset |
|---|---|
#9632 has a patch which may or may not affect this issue.