Opened 20 months ago

Last modified 19 months ago

#32312 new Cleanup/optimization

Performance decrease following implementation of ASGI handler and coroutine-safety.

Reported by: David Smith Owned by: nobody
Component: Core (Other) Version: 3.0
Severity: Normal Keywords: async
Cc: Andrew Godwin, Adam Johnson Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

#30451 added the ASGI handler and coroutine-safety.

I've spotted that this resulted in a number of the djangobench benchmarks taking longer to run, in some cases there are quite large impacts. I'll attach the full log from my run, but I found djangobench does not give repeatable outputs. I found that cpu isolation with pyperf helps to give repeatable outputs, and have been using the asv tool as a runner for this.

I've not re-created all of the benchmarks yet, but I have run a few see here to be more certain about the performance impacts before opening this ticket. Comparing the commit in question to the previous commit, I get the table below. A range of benchmarks see a performance decrease of c20%-50%. url_reverse() seems to be an outlier which is very materially impacted at c140% slower.

I'm not sure if anything can be done about this, but thought it worth flagging as I couldn't see any previous discussion on this topic.

.       before           after         ratio
     [cce47ff6]       [a415ce70]
     <3.0a1~249>       <3.0a1~248>
+         474±7μs      1.15±0.01ms     2.43  bench_urls.URLBenchmarks.time_url_reverse
+         656±3μs      1.01±0.01ms     1.54  forms.FormBenchmarks.time_create_form
+      17.9±0.6ms       27.1±0.2ms     1.52  bench_urls.URLBenchmarks.time_url_resolve
+        464±20μs         698±20μs     1.50  query.QueryDelete.time_query_delete
+      35.5±0.3ms       45.2±0.9ms     1.27  bench_models.ModelBenchmarks.time_save_new
+     1.68±0.02ms      2.09±0.02ms     1.25  bench_models.ModelBenchmarks.time_model_creation
+     4.83±0.06ms       5.92±0.3ms     1.23  query.QueryCount.time_query_count
+     2.20±0.03ms      2.62±0.04ms     1.19  query.QueryGetOrCreate.time_query_get_or_create
+     1.91±0.07ms      2.20±0.03ms     1.15  query.QueryGet.time_query_get
+         192±2ms          218±1ms     1.14  bench_urls.URLBenchmarks.time_url_resolve_flat

Attachments (3)

results.txt (10.0 KB) - added by David Smith 20 months ago.
Djangobench_benchmarks
url_reverse_new.png (97.0 KB) - added by David Smith 20 months ago.
url_reverse profile new
url_reverse_old.png (44.7 KB) - added by David Smith 20 months ago.
url_reverse profile old

Download all attachments as: .zip

Change History (9)

Changed 20 months ago by David Smith

Attachment: results.txt added

Djangobench_benchmarks

comment:1 Changed 20 months ago by Carlton Gibson

Cc: Andrew Godwin added

Hi David. Thanks for your work here (really good effort! 👍)

First of all I will CC Andrew.

Then, let's dig in and have a review, and work out what's going on.

I'd be inclined to accept as an issue if we confirm a significant slow-down, but we need to decide what that means.
Andrew: your input/guidance here much appreciated.

(I'm seeing various slowness reports on Daphne/Channels, so it's already on my list to look at this sort of after we branch 3.2)

Changed 20 months ago by David Smith

Attachment: url_reverse_new.png added

url_reverse profile new

Changed 20 months ago by David Smith

Attachment: url_reverse_old.png added

url_reverse profile old

comment:2 Changed 20 months ago by David Smith

The benchmark for url_reverse is the outlier here and with it being one of the more simple functions it is easier to see what is going on here. I've attached a couple of images generated by profiling this benchmark.

It shows, an increase due to the replacement of threading.local with asgiref.local.Local. In the "old" image the get_urlconf and get_script_prefix are the first two unnamed bars on the 2nd row, the time to execute these two grow significantly in the "new" version. It's a similar story with form_create but it only has one element impacted and so is less impacted overall by this change. I started to look at save() but this function is more complex, although initial viewing suggests some elements are being impacted by this change.

I think this starts to help indicate what the issue could be.

Your note about what a significant slow down means is important, what does significant mean? I think it's much more than a particular function being slower on a single benchmark, for example, I wouldn't suggest that url_reverse is a particularly good indicator of the overall performance of Django in general.

comment:3 Changed 20 months ago by Andrew Godwin

I would not be surprised that the ASGI worked slowed down Django a little bit - that was always going to be the compromise - and I perf-tested it before and after the 3.1 view/request changes to see a negligible (~5%) difference for sync runtimes.

However, I never _did_ performance test this older commit, and I agree that the new Local class might well be to blame - it was built for safety rather than speed (for obvious reasons), and there's likely some potential there for optimisation. Particularly, it could likely do with some quick-shortcuts out of its deeper codepaths if it knows there's nothing async going on.

comment:4 Changed 19 months ago by Carlton Gibson

Component: UncategorizedCore (Other)
Keywords: async added
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

Hey David, sorry for the delay triaging this (3.2... 🙂) — It's a bit unspecific as to what we'll do but let's accept this as a tracking ticket for now, and we can either do sub-tickets or hang things on here.

You already did 622fcb81199864cad1ec9d0b25820a96ef3f586f which is super.

Can I just say, great work you're doing profiling and setting up the asv runs, and the running django bench, and ... 🙂
Please do carry that on as you can, it's phenomenally helpful for you to break the ground there!
🏅

comment:5 Changed 19 months ago by David Smith

No need to apologise, the amount of patches yourself and Mariusz have got in this week is incredible 🤯

I was looking at the new Local class and there is a comment about using contextvars for storage, but this is dependent upon Python 3.7. I was wondering if that could help as contextvars are written in C and so should be faster? But I ran out of talent very quickly here 😃.

The other idea I had is, does Django know if there is any async going on? If we know this can we decide to use threading.local or the asgi version? (It seems logical to me but I'm well out of my depth here).

Finally, Yes, I'm happy to make progress with the benchmarking topic. It's already starting to identify a few items and help prove some other performance enhancements.

comment:6 Changed 19 months ago by Adam Johnson

Cc: Adam Johnson added
Note: See TracTickets for help on using tickets.
Back to Top