Opened 3 months ago
Closed 5 weeks ago
#36439 closed Cleanup/optimization (fixed)
Auth hashing blocks event loop if using asyncio
Reported by: | Robert Aistleitner | Owned by: | Roelzkie |
---|---|---|---|
Component: | contrib.auth | Version: | 5.2 |
Severity: | Normal | Keywords: | async, auth, asyncio, performance |
Cc: | Robert Aistleitner, Roelzkie | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
To a large extent, issues with auth in async auth has been fixed in https://github.com/django/django/commit/50f89ae850f6b4e35819fe725a08c7e579bfd099, I guess this is the last part of proper async implementation of authentication.
There exist custom implementations for asyncio regarding checking a password using the configured hashers which can be found in https://github.com/django/django/blob/68c9f7e0b79168007e6ba0139fd315d7c44ca8c9/django/contrib/auth/hashers.py#L86-L91.
This implementation has a flaw because the CPU heavy calculation of the hash is blocking the event loop of asyncio, causing the whole server to stall and queueing up all the following authentications that may rush in in case of heavy load.
My proposal is to use a ThreadPoolExecutor here to be able to unload the work from the event loop:
CHECK_PASSWORD_THREAD_POOL_EXECUTOR = ThreadPoolExecutor(16) async def acheck_password(password, encoded, setter=None, preferred="default"): """See check_password().""" # verify_password is cpu heavy and needs to be executed in a separate thread to not block a running asyncio event loop is_correct, must_update = await sync_to_async( verify_password, thread_sensitive=False, executor=CHECK_PASSWORD_THREAD_POOL_EXECUTOR )(password, encoded, preferred=preferred) if setter and is_correct and must_update: await setter(password) return is_correct
The number of available thread could be exposed via setting, or skipped altogether by using a new thread on each verify_password call. What are your thoughts on this?
Change History (18)
comment:1 by , 3 months ago
Version: | 5.1 → 5.2 |
---|
comment:2 by , 3 months ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 3 months ago
Should I provide a patch for this improvement? Not really sure if there's anything I need to provide besides the patch itself, because tests basically already test the function extensively.
follow-up: 5 comment:4 by , 3 months ago
A patch (PR), particularly with benchmarks, sounds great. The sync_to_async
call is probably doing most of the work - it's possible there's no need for a dedicated threadpool just for this hashing. But again, a patch and some benchmark figures will help make that decision.
comment:5 by , 2 months ago
Replying to Jake Howard:
A patch (PR), particularly with benchmarks, sounds great. The
sync_to_async
call is probably doing most of the work - it's possible there's no need for a dedicated threadpool just for this hashing. But again, a patch and some benchmark figures will help make that decision.
Hello, I see the status of this ticket is Accepted without an owner. I wonder if you can allow me to spend time doing a basic profiling for the above-suggested solution, and make a comparison to help with your decision. Thank you.
EDIT: I made some basic profiling using cProfile
for 1000 users using an SQLite database.
- Current: 94770 function calls (94752 primitive calls) in 251.144 seconds. See all logs.
- With sync_to_async: 407770 function calls (405752 primitive calls) in 251.129 seconds. See all logs.
- With ThreadPoolExecutor(4): 390842 function calls (389818 primitive calls) in 254.972 seconds for 1000 users. See all logs.
- With ThreadPoolExecutor(8): 390842 function calls (389818 primitive calls) in 251.255 seconds for 1000 users. See all logs.
- With ThreadPoolExecutor(12): 390842 function calls (389818 primitive calls) in 254.198 seconds for 1000 users. See all logs.
- ThreadPoolExecutor(16): 390842 function calls (389818 primitive calls) in 254.155 seconds for 1000 users. See all logs.
They're almost has no difference, but I'm not 100% sure if I'm doing it correctly. Below is the python script for profiling:
import os import django os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings") django.setup() # -- import asyncio import cProfile from django.contrib.auth.models import User async def main(): profiler = cProfile.Profile() profiler.enable() async for user in User.objects.all()[:1000]: await user.acheck_password(user.username), user.username profiler.disable() profiler.print_stats(sort='cumulative') if __name__ == "__main__": asyncio.run(main())
System information:
$ sysctl -a | grep machdep.cpu machdep.cpu.cores_per_package: 8 machdep.cpu.core_count: 8 machdep.cpu.logical_per_package: 8 machdep.cpu.thread_count: 8 machdep.cpu.brand_string: Apple M1 $ python --version Python 3.13.2 $ python -c "import django; print(django.get_version());" 5.2.3
follow-up: 7 comment:6 by , 2 months ago
There appears to be a few things wrong with your current benchmark.
First you don't provide which hasher you used and and your setup steps most importantly whether or not you first set a password for the users you are testing against. If you didn't (e.g. you only set a username with an invalid password) then it's highly likely all underlying verify_password
calls never get to the point of hashing user.username
and then performing a constant time compare which are the expensive operations CPU wise that would benefit from executing in a thread pool.
Secondly the way you're iterating over each awaitable serially in a loop prevents any concurrent scheduling execution from taking place thus you most likely wouldn't notice if the event loop was blocked as you're only processing one task at a time. You'd want to buffer up futures and await them all so they can step on each others toes a bit if you hope to catch any interference between them.
comment:7 by , 2 months ago
Replying to Simon Charette:
There appears to be a few things wrong with your current benchmark.
First you don't provide which hasher you used and and your setup steps most importantly whether or not you first set a password for the users you are testing against. If you didn't (e.g. you only set a username with an invalid password) then it's highly likely all underlying
verify_password
calls never get to the point of hashinguser.username
and then performing a constant time compare which are the expensive operations CPU wise that would benefit from executing in a thread pool.
Secondly the way you're iterating over each awaitable serially in a loop prevents any concurrent scheduling execution from taking place thus you most likely wouldn't notice if the event loop was blocked as you're only processing one task at a time. You'd want to buffer up futures and await them all so they can step on each others toes a bit if you hope to catch any interference between them.
Hello, my bad, you are right. I'm executing the code serially with awaitables. I put them all into a task group, and the gap in performance is huge with a ThreadPoolExecutor
.
This is for 100 users only having passwords as their usernames, and I'm using the default PBKDF2PasswordHasher
hasher. I will also check other hashers later on:
- Current: 14138 function calls (14120 primitive calls) in 24.844 seconds
- With sync_to_async: 46823 function calls (45895 primitive calls) in 24.615 seconds
- With ThreadPoolExecutor(8): 45522 function calls (43874 primitive calls) in 5.565 seconds
import os from datetime import datetime import django os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings") django.setup() # -- import asyncio import cProfile from django.contrib.auth.models import User async def main(): profiler = cProfile.Profile() profiler.enable() async with asyncio.TaskGroup() as tg: # These users were created in a different script, with their username as their password. async for user in User.objects.all()[:100]: tg.create_task(user.acheck_password(user.username)) profiler.disable() profiler.print_stats(sort='cumulative') if __name__ == "__main__": asyncio.run(main())
comment:8 by , 2 months ago
More benchmark tests for different hashers with 100 users:
PBKDF2PasswordHasher 100 users
- Current: 25.555 seconds
- With sync_to_async: 25.340 seconds
- With ThreadPoolExecutor(8): 7.141 seconds
PBKDF2SHA1PasswordHasher 100 users
- Current: 26.066 seconds
- With sync_to_async: 25.890 seconds
- With ThreadPoolExecutor(8): 7.436 seconds
Argon2PasswordHasher 100 users
- Current: 3.568 seconds
- With sync_to_async: 3.469 seconds
- With ThreadPoolExecutor(8): 2.683 seconds
BCryptSHA256PasswordHasher 100 users
- Current: 23.532 seconds
- With sync_to_async: 23.347 seconds
- With ThreadPoolExecutor(8): 6.435 seconds
ScryptPasswordHasher 100 users
- Current: 16.721 seconds
- With sync_to_async: 16.282 seconds
- With ThreadPoolExecutor(8): 4.782 seconds
Certainly, having a ThreadPoolExecutor
has a wide performance gap compared to others without it, as theorized by Robert. I'm happy to provide a patch for this if you'll allow me. Not sure if we need to create a test for this but let me know.
comment:9 by , 2 months ago
Cc: | added |
---|---|
Owner: | set to |
Status: | new → assigned |
comment:10 by , 2 months ago
Created a PR. Not 100% sure about the PR, but if you have any thoughts to improve it, I'll be happy to modify the PR. Thank you.
comment:11 by , 2 months ago
Has patch: | set |
---|
comment:12 by , 7 weeks ago
Patch needs improvement: | set |
---|
comment:13 by , 7 weeks ago
I made a new benchmarking with the latest main
. The result is very different from my previous testing, and I'm not so sure if it's due to the rebase from recent main
or due to some changes in my local setup.
These tests were conducted on 100 users with a SQLite database, and the results of using sync_to_async
with or without ThreadPoolExecutor
showed no significant difference. However, it's 5x faster than the current version. For Argon2, the performance shows almost no difference, at least for the SQLite database.
PBKDF2PasswordHasher | PBKDF2SHA1PasswordHasher | Argon2PasswordHasher | BCryptSHA256PasswordHasher | ScryptPasswordHasher | |
---|---|---|---|---|---|
Current | 14.103s | 15.578s | 0.005s | 23.487s | 16.11s |
With sync_to_async | 3.124s | 2.957s | 0.004s | 3.867s | 3.335s |
With sync_to_async and ThreadPoolExecutor | 2.775s | 2.855s | 0.004s | 3.804s | 3.661s |
comment:14 by , 7 weeks ago
Patch needs improvement: | unset |
---|
comment:15 by , 6 weeks ago
Needs tests: | set |
---|
Marking as "needs tests" but I mean that it needs a benchmark script which is then verified by others
comment:16 by , 6 weeks ago
Needs tests: | unset |
---|
Benchmark is added here: https://github.com/django/django-asv/pull/94. Instructions are included. Hope it's clear.
comment:17 by , 5 weeks ago
Triage Stage: | Accepted → Ready for checkin |
---|
Yes, OK... — I'll accept this as certainly in theory. I'd like to see some basic profiling of the change as part of the development, but it's likely correct using any decent hasher.