Opened 8 months ago

Last modified 27 hours ago

#35572 assigned Cleanup/optimization

Improve performance replacing os.listdir() with os.scandir()

Reported by: Paolo Melchiorre Owned by: Marcus Vinicius Araujo
Component: Core (Other) Version: dev
Severity: Normal Keywords: scandir listdir python os
Cc: Paolo Melchiorre, Marcus Vinicius Araujo Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Use os.scandir() instead of os.listdir() in the remaining occurrences in the code:
https://github.com/search?q=repo%3Adjango%2Fdjango+os.listdir&type=code

Based on the Python documentation

Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory.

Change History (10)

comment:1 by Sarah Boyce, 8 months ago

Triage Stage: UnreviewedAccepted

Similar to #29689 accepting, thank you
Note that additional benchmarks in django-asv are always welcome 👍

comment:2 by Amir Karimi, 8 months ago

Owner: set to Amir Karimi
Status: newassigned

comment:3 by Tim Graham, 8 months ago

Component: UncategorizedCore (Other)

The description makes it sound like this is a simple find and replace all, however, do all usages "also need file type or file attribute information"?

in reply to:  3 comment:4 by Amir Karimi, 8 months ago

Replying to Tim Graham:

The description makes it sound like this is a simple find and replace all, however, do all usages "also need file type or file attribute information"?

Good point! Except this case: https://github.com/django/django/blob/aa74c4083e047473ac385753e047e075e8f04890/scripts/manage_translations.py#L42
I didn't find any other cases where file attributes (is_dir, etc) are needed, and only their names or the number of list_dir output are needed. The only edge that "scandir" may still have is its less memory consumption when it comes to large folders (which I suspect is the case in any of these usages)

comment:5 by Marcus Vinicius Araujo, 9 days ago

Cc: Marcus Vinicius Araujo added
Owner: changed from Amir Karimi to Marcus Vinicius Araujo

comment:6 by Marcus Vinicius Araujo, 9 days ago

Has patch: set

in reply to:  2 ; comment:7 by Marcus Vinicius Araujo, 9 days ago

Replying to Amir Karimi:

Amir, my bad. I'm new to contributing to this project, and I didn't notice that this ticket had already been assigned to you.

Anyway, I submitted a patch.

in reply to:  7 ; comment:8 by Amir Karimi, 8 days ago

Replying to Marcus Vinicius Araujo:

Replying to Amir Karimi:

Amir, my bad. I'm new to contributing to this project, and I didn't notice that this ticket had already been assigned to you.

Anyway, I submitted a patch.

That's fine. I forgot I had such a ticket. Next time, you can first ask what's going on with the task and if you don't hear back, you can assign it to yourself.

in reply to:  8 comment:9 by Marcus Vinicius Araujo, 6 days ago

Replying to Amir Karimi:

Replying to Marcus Vinicius Araujo:

Replying to Amir Karimi:

Amir, my bad. I'm new to contributing to this project, and I didn't notice that this ticket had already been assigned to you.

Anyway, I submitted a patch.

That's fine. I forgot I had such a ticket. Next time, you can first ask what's going on with the task and if you don't hear back, you can assign it to yourself.

Will do! Thanks.

comment:10 by Natalia Bidart, 27 hours ago

After reviewing this ticket, the relevant documentation, and the PR, I remain unconvinced of the benefits of this change. Given Tim's question and answer (specifically, that only manage_translations.py requires file type or file attribute information) the proposed change seems to add an extra level of indentation to the code without a clear benefit.

Basic performance tests show only minimal improvements. In accordance with our current docs for requesting performance optimizations I think we need to mark this as wontfix until a clear and substantial performance gain is demonstrated. The risk of breaking systems that rely on os.listdir (for example, via monkeypatching) outweighs the potential unproven benefits.

Note: See TracTickets for help on using tickets.
Back to Top