#36242 closed Cleanup/optimization (wontfix)
NodeList render overhead with huge templates
Reported by: | Michal Čihař | Owned by: | |
---|---|---|---|
Component: | Uncategorized | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
While debugging rendering of some huge templates, I've noticed that the rendering is slower and needs more memory than necessary because of:
```python
def render(self, context):
    return SafeString("".join([node.render_annotated(context) for node in self]))
```
which unnecessarily builds a list and then passes it to `str.join()`, which could consume an iterable directly.
I will prepare a pull request with a fix.
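For illustration, the proposed change would presumably swap the list comprehension for a generator expression, roughly like this (a hypothetical sketch with stand-in classes, not the actual Django code or the submitted patch):

```python
class FakeNode:
    """Stand-in for a template node (hypothetical, for illustration)."""
    def __init__(self, text):
        self.text = text

    def render_annotated(self, context):
        return self.text


class NodeList(list):
    def render(self, context):
        # Proposed variant: feed str.join() a generator expression
        # instead of building an intermediate list first.
        return "".join(node.render_annotated(context) for node in self)


nodes = NodeList([FakeNode("Hello, "), FakeNode("world"), FakeNode("!")])
print(nodes.render({}))  # → Hello, world!
```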
Change History (4)
comment:1 by , 6 months ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:2 by , 6 months ago
Thanks for sharing this. I saw an improvement with my data, but apparently this depends on the actual content; I should have done more research.
On short strings, using a list clearly wins:
```
(py3.14)$ python -m timeit '"".join([str(n) for n in range(1000)])'
5000 loops, best of 5: 79.8 usec per loop
(py3.14)$ python -m timeit '"".join(str(n) for n in range(1000))'
5000 loops, best of 5: 102 usec per loop
```
On long strings it is the other way around:
```
(py3.14)$ python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 3.27 msec per loop
(py3.14)$ python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'
1000 loops, best of 5: 750 usec per loop
```
But short strings are the more likely case in Django templates.
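The same comparison can also be run from a script via the `timeit` module rather than the CLI (absolute numbers are machine-dependent, so no particular winner is asserted here):

```python
import timeit

# Many short strings: the common case for Django template nodes.
list_time = timeit.timeit('"".join([str(n) for n in range(1000)])', number=2000)
gen_time = timeit.timeit('"".join(str(n) for n in range(1000))', number=2000)

print(f"list comprehension:   {list_time:.3f}s")
print(f"generator expression: {gen_time:.3f}s")
```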
comment:3 by , 6 months ago
I just ran your second benchmark, and for me the list was consistently faster, even for the larger strings:
```
% python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 451 usec per loop
% python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'
1000 loops, best of 5: 456 usec per loop
```
comment:4 by , 6 months ago
This all made me look into the implementation, and the list comprehension seems like the best approach in this case. `str.join()` calls `PySequence_Fast`, which converts the iterable into a list if it is not already a list or a tuple. Creating a list with a comprehension should be faster than creating a generator and then converting it to a list, but the outcome most likely depends on the CPU cache size, which explains the corner case I observed (the strings fill the CPU cache in my case).
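In other words, passing a generator does not avoid the list: `str.join()` materialises any non-list, non-tuple argument first. A rough Python-level model of that behaviour (an illustrative sketch, not CPython's actual C implementation):

```python
def join_like(sep, iterable):
    """Rough model of str.join(): materialise non-sequences first."""
    if not isinstance(iterable, (list, tuple)):
        # This mirrors the PySequence_Fast conversion: a generator is
        # fully consumed into a list before any joining happens.
        iterable = list(iterable)
    if not iterable:
        return ""
    # The real implementation pre-computes the total size and fills a
    # single buffer; simple concatenation is enough for illustration.
    result = iterable[0]
    for item in iterable[1:]:
        result += sep + item
    return result


print(join_like(",", (str(n) for n in range(4))))  # → 0,1,2,3
```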
Additionally, there is a fast path for Unicode strings that all have the same character width (the separator plus all items), so pure ASCII is fast:
```
(py3.14) nijel@lobsang:/tmp$ python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 4.34 msec per loop
(py3.14) nijel@lobsang:/tmp$ python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'
1000 loops, best of 5: 772 usec per loop
```
But once you mix wider Unicode characters into that:
```
(py3.14) nijel@lobsang:/tmp$ python -m timeit -n 1000 '"".join(["š" * 5000 for n in range(1000)])'
1000 loops, best of 5: 10.5 msec per loop
(py3.14) nijel@lobsang:/tmp$ python -m timeit -n 1000 '"".join("š" * 5000 for n in range(1000))'
1000 loops, best of 5: 10.4 msec per loop
```
And now any difference is gone. So, indeed, this is not a way to optimize.
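The "same width" distinction comes from CPython's compact string representation: each string uses 1, 2, or 4 bytes per character depending on its widest code point, which can be observed with `sys.getsizeof` (exact byte counts vary across CPython versions):

```python
import sys

ascii_s = "x" * 100  # 'x' fits in 1 byte per character
wide_s = "š" * 100   # U+0161 needs 2 bytes per character

print(sys.getsizeof(ascii_s))
print(sys.getsizeof(wide_s))
# The wide string is larger; joining mixed-width strings forces
# str.join() off the same-width fast path.
```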
There is nothing to fix here. A list comprehension is preferable, as `str.join()` converts its argument to a list internally anyway, so providing a list up front performs better.