Opened 8 months ago
Last modified 27 hours ago
#35572 assigned Cleanup/optimization
Improve performance replacing os.listdir() with os.scandir()
Reported by: | Paolo Melchiorre | Owned by: | Marcus Vinicius Araujo |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Normal | Keywords: | scandir listdir python os |
Cc: | Paolo Melchiorre, Marcus Vinicius Araujo | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Use os.scandir()
instead of os.listdir()
in the remaining occurrences in the code:
https://github.com/search?q=repo%3Adjango%2Fdjango+os.listdir&type=code
Based on the Python documentation
Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory.
Change History (10)
comment:1 by , 8 months ago
Triage Stage: | Unreviewed → Accepted |
---|
follow-up: 7 comment:2 by , 8 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
follow-up: 4 comment:3 by , 8 months ago
Component: | Uncategorized → Core (Other) |
---|
The description makes it sound like this is a simple find and replace all, however, do all usages "also need file type or file attribute information"?
comment:4 by , 8 months ago
Replying to Tim Graham:
The description makes it sound like this is a simple find and replace all, however, do all usages "also need file type or file attribute information"?
Good point! Except this case: https://github.com/django/django/blob/aa74c4083e047473ac385753e047e075e8f04890/scripts/manage_translations.py#L42
I didn't find any other cases where file attributes (is_dir, etc) are needed, and only their names or the number of list_dir output are needed. The only edge that "scandir" may still have is its less memory consumption when it comes to large folders (which I suspect is the case in any of these usages)
comment:5 by , 9 days ago
Cc: | added |
---|---|
Owner: | changed from | to
comment:6 by , 9 days ago
Has patch: | set |
---|
follow-up: 8 comment:7 by , 9 days ago
Replying to Amir Karimi:
Amir, my bad. I'm new to contributing to this project, and I didn't notice that this ticket had already been assigned to you.
Anyway, I submitted a patch.
follow-up: 9 comment:8 by , 8 days ago
Replying to Marcus Vinicius Araujo:
Replying to Amir Karimi:
Amir, my bad. I'm new to contributing to this project, and I didn't notice that this ticket had already been assigned to you.
Anyway, I submitted a patch.
That's fine. I forgot I had such a ticket. Next time, you can first ask what's going on with the task and if you don't hear back, you can assign it to yourself.
comment:9 by , 6 days ago
Replying to Amir Karimi:
Replying to Marcus Vinicius Araujo:
Replying to Amir Karimi:
Amir, my bad. I'm new to contributing to this project, and I didn't notice that this ticket had already been assigned to you.
Anyway, I submitted a patch.
That's fine. I forgot I had such a ticket. Next time, you can first ask what's going on with the task and if you don't hear back, you can assign it to yourself.
Will do! Thanks.
comment:10 by , 27 hours ago
After reviewing this ticket, the relevant documentation, and the PR, I remain unconvinced of the benefits of this change. Given Tim's question and answer (specifically, that only manage_translations.py
requires file type or file attribute information) the proposed change seems to add an extra level of indentation to the code without a clear benefit.
Basic performance tests show only minimal improvements. In accordance with our current docs for requesting performance optimizations I think we need to mark this as wontfix
until a clear and substantial performance gain is demonstrated. The risk of breaking systems that rely on os.listdir
(for example, via monkeypatching) outweighs the potential unproven benefits.
Similar to #29689 accepting, thank you
Note that additional benchmarks in django-asv are always welcome 👍