Opened 6 months ago

Closed 6 months ago

Last modified 6 months ago

#36242 closed Cleanup/optimization (wontfix)

NodeList render overhead with huge templates

Reported by: Michal Čihař
Owned by:
Component: Uncategorized
Version: 5.1
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

While debugging rendering of some huge templates, I've noticed that the rendering is slower and needs more memory than necessary because of:

    def render(self, context):
        return SafeString("".join([node.render_annotated(context) for node in self]))

which unnecessarily builds a list and then passes it to join(), even though join() could consume an iterable directly.

I will prepare a pull request with a fix.
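The proposed change can be measured directly. Here is a minimal sketch (not the actual Django code) that times the two call styles from NodeList.render on short strings using the stdlib timeit module:

```python
import timeit

# Sketch: compare a list comprehension vs. a generator expression
# as the argument to str.join(), as in NodeList.render.
list_time = timeit.timeit('"".join([str(n) for n in range(1000)])', number=500)
gen_time = timeit.timeit('"".join(str(n) for n in range(1000))', number=500)

# Both forms produce identical output; only the timing differs.
assert "".join([str(n) for n in range(1000)]) == "".join(str(n) for n in range(1000))
print(f"list comprehension: {list_time:.4f}s, generator: {gen_time:.4f}s")
```

As the discussion below shows, which form wins depends on string sizes and the interpreter internals, so any such benchmark should be run on representative data.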

Change History (4)

comment:1 by Mariusz Felisiak, 6 months ago

Resolution: wontfix
Status: new → closed

There is nothing to fix here. A list comprehension is preferable: str.join() converts its argument to a list internally anyway, so providing a list up front performs better.
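One observable consequence of this internal conversion: str.join() cannot stream its input, because it needs the total length before it can allocate the result, so CPython materializes any non-list/non-tuple iterable first. A small sketch showing that the whole generator is consumed before the join completes:

```python
# Sketch: str.join() fully drains a generator before producing
# any output (CPython materializes it into a list internally).
consumed = []

def tracked():
    for n in range(5):
        consumed.append(n)
        yield str(n)

result = "".join(tracked())
# Every item was pulled from the generator during the join.
assert consumed == [0, 1, 2, 3, 4]
assert result == "01234"
```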

comment:2 by Michal Čihař, 6 months ago

Thanks for sharing this. I saw an improvement with my data, but apparently it depends on the actual content, and I should have done more research.

On short strings, using a list clearly wins:

(py3.14)$ python -m timeit '"".join([str(n) for n in range(1000)])'
5000 loops, best of 5: 79.8 usec per loop
(py3.14)$ python -m timeit '"".join(str(n) for n in range(1000))'
5000 loops, best of 5: 102 usec per loop

On long strings, it is the other way around:

(py3.14)$ python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 3.27 msec per loop
(py3.14)$ python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'
1000 loops, best of 5: 750 usec per loop

But Django templates are more likely to be handling short strings.

comment:3 by Jacob Walls, 6 months ago

I just ran your second benchmark, and for me the list was consistently faster, even for the larger strings:

% python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 451 usec per loop
% python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'  
1000 loops, best of 5: 456 usec per loop

comment:4 by Michal Čihař, 6 months ago

This all made me look into the implementation, and the list comprehension does seem like the best approach here. str.join() calls PySequence_Fast, which converts the iterable into a list unless it is already a list or tuple. Creating the list with a comprehension should therefore be faster than creating a generator and then converting it to a list. The difference most likely depends on CPU cache size, and what I observed is a corner case: in my benchmark the strings fill the CPU cache.

Additionally, there is a fast path for Unicode strings of the same width (the separator plus all items), so with pure ASCII:

(py3.14)$ python -m timeit -n 1000 '"".join(["x" * 5000 for n in range(1000)])'
1000 loops, best of 5: 4.34 msec per loop
(py3.14)$ python -m timeit -n 1000 '"".join("x" * 5000 for n in range(1000))'
1000 loops, best of 5: 772 usec per loop

But once you mix non-ASCII characters into that:

(py3.14)$ python -m timeit -n 1000 '"".join(["š" * 5000 for n in range(1000)])'
1000 loops, best of 5: 10.5 msec per loop
(py3.14)$ python -m timeit -n 1000 '"".join("š" * 5000 for n in range(1000))'
1000 loops, best of 5: 10.4 msec per loop

And now any difference is gone, so this is indeed not a viable optimization.
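The width difference behind this benchmark can be observed directly. Under PEP 393, CPython stores pure-ASCII strings with 1 byte per character, while a character like "š" (U+0161) forces a 2-bytes-per-character representation; strings of mixed widths defeat the same-width join fast path. A minimal sketch:

```python
import sys

# Sketch: PEP 393 compact string storage. "x" * 1000 is stored with
# 1 byte per character; "š" * 1000 needs 2 bytes per character, so
# it occupies roughly twice the memory for the same length.
ascii_str = "x" * 1000
wide_str = "š" * 1000

assert len(ascii_str) == len(wide_str) == 1000
assert sys.getsizeof(wide_str) > sys.getsizeof(ascii_str)
```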
