Opened 6 months ago

Closed 6 months ago

Last modified 6 months ago

#36242 closed Cleanup/optimization (wontfix)

NodeList render overhead with huge templates

Reported by: Michal Čihař
Owned by:
Component: Uncategorized
Version: 5.1
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

While debugging rendering of some huge templates, I've noticed that the rendering is slower and needs more memory than necessary because of:

    def render(self, context):
        return SafeString("".join([node.render_annotated(context) for node in self]))

which unnecessarily builds a list and then passes it to join(), even though join() could consume an iterable directly.

I will prepare a pull request with a fix.
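The proposed change can be measured directly. Here is a minimal sketch (not the actual Django code) that times the two call styles from NodeList.render on short strings using the stdlib timeit module:

```python
import timeit

# Sketch: compare a list comprehension vs. a generator expression
# as the argument to str.join(), as in NodeList.render.
list_time = timeit.timeit('"".join([str(n) for n in range(1000)])', number=500)
gen_time = timeit.timeit('"".join(str(n) for n in range(1000))', number=500)

# Both forms produce identical output; only the timing differs.
assert "".join([str(n) for n in range(1000)]) == "".join(str(n) for n in range(1000))
print(f"list comprehension: {list_time:.4f}s, generator: {gen_time:.4f}s")
```

As the discussion below shows, which form wins depends on string sizes and the interpreter internals, so any such benchmark should be run on representative data.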

Change History (4)

comment:1 by Mariusz Felisiak, 6 months ago

Resolution: wontfix
Status: new → closed

There is nothing to fix here. A list comprehension is preferable: str.join() converts its argument to a list internally anyway, so providing a list up front performs better.
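One observable consequence of this internal conversion: str.join() cannot stream its input, because it needs the total length before it can allocate the result, so CPython materializes any non-list/non-tuple iterable first. A small sketch showing that the whole generator is consumed before the join completes:

```python
# Sketch: str.join() fully drains a generator before producing
# any output (CPython materializes it into a list internally).
consumed = []

def tracked():
    for n in range(5):
        consumed.append(n)
        yield str(n)

result = "".join(tracked())
# Every item was pulled from the generator during the join.
assert consumed == [0, 1, 2, 3, 4]
assert result == "01234"
```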

comment:2 by Michal Čihař, 6 months ago

Thanks for sharing this. I saw an improvement with my data, but apparently it depends on the actual content, and I should have done more research.

On short strings, using a list clearly wins:

(py3.14)$ python -m timeit '"".join([str(n) for n in range(1000)])'
5000 loops, best of 5: 79.8 usec per loop
(py3.14)$ python -m timeit '"".join(str(n) for n in range(1000))'
5000 loops, best of 5: 102 usec per loop

On long strings, it is the other way around:

(py3.14)$ python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 3.27 msec per loop
(py3.14)$ python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'
1000 loops, best of 5: 750 usec per loop

But Django templates are more likely to be handling short strings.

comment:3 by Jacob Walls, 6 months ago

I just ran your second benchmark, and for me the list was consistently faster, even for the larger strings:

% python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 451 usec per loop
% python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'  
1000 loops, best of 5: 456 usec per loop

comment:4 by Michal Čihař, 6 months ago

This all made me look into the implementation, and the list comprehension does seem like the best approach here. str.join() calls PySequence_Fast, which converts the iterable into a list unless it is already a list or tuple. Creating the list with a comprehension should therefore be faster than creating a generator and then converting it to a list. The difference most likely depends on CPU cache size, and what I observed is a corner case: in my benchmark the strings fill the CPU cache.

Additionally, there is a fast path for Unicode strings of the same width (the separator plus all items), so with pure ASCII:

(py3.14)$ python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 4.34 msec per loop
(py3.14)$ python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'
1000 loops, best of 5: 772 usec per loop

But once you mix non-ASCII characters into that:

(py3.14)$ python -m timeit -n 1000 '"".join(["š" * 5000 for n in range(1000)])'
1000 loops, best of 5: 10.5 msec per loop
(py3.14)$ python -m timeit -n 1000 '"".join("š" * 5000 for n in range(1000))'
1000 loops, best of 5: 10.4 msec per loop

And now any difference is gone, so this is indeed not a viable optimization.
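The width difference behind this benchmark can be observed directly. Under PEP 393, CPython stores pure-ASCII strings with 1 byte per character, while a character like "š" (U+0161) forces a 2-bytes-per-character representation; strings of mixed widths defeat the same-width join fast path. A minimal sketch:

```python
import sys

# Sketch: PEP 393 compact string storage. "x" * 1000 is stored with
# 1 byte per character; "š" * 1000 needs 2 bytes per character, so
# it occupies roughly twice the memory for the same length.
ascii_str = "x" * 1000
wide_str = "š" * 1000

assert len(ascii_str) == len(wide_str) == 1000
assert sys.getsizeof(wide_str) > sys.getsizeof(ascii_str)
```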
