id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
32824	Potential micro-optimisations for NodeList.render	Keryn Knight	Keryn Knight	"The current implementation of `NodeList.render` looks like:
{{{
def render(self, context):
    bits = []
    for node in self:
        if isinstance(node, Node):
            bit = node.render_annotated(context)
        else:
            bit = node
        bits.append(str(bit))
    return mark_safe(''.join(bits))
}}}

which, given a single template render (template + code to be attached, but is basically the same fakery as in #32814) is rendered 41 times (cProfile output, relevant portions to be discussed in this ticket only):
{{{
   41/1    0.000    0.000    0.007    0.007 base.py:956(render)
    591    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
    191    0.000    0.000    0.000    0.000 {built-in method builtins.callable}
115/113    0.000    0.000    0.000    0.000 safestring.py:53(mark_safe)
}}}

Given the numbers are too miniscule to register, for the purposes of surfacing the most time consuming parts, and pushing things like importlib machinery etc to the bottom of the profile output, rendering the same template 1000 times gives us:
{{{
4897753 function calls (4539046 primitive calls) in 2.832 seconds

 41000/1000    0.193    0.000    2.671    0.003 base.py:956(render)
70045/68045    0.042    0.000    0.095    0.000 safestring.py:50(mark_safe)
     128045    0.053    0.000    0.053    0.000 safestring.py:26(__init__)
     159432    0.023    0.000    0.023    0.000 {built-in method builtins.hasattr}
     114077    0.012    0.000    0.012    0.000 {built-in method builtins.callable}
     686582    0.076    0.000    0.076    0.000 {built-in method builtins.isinstance}
     318184    0.036    0.000    0.036    0.000 {method 'append' of 'list' objects}
}}}
(to ensure `SafeString.__init__` calls are registered, I manually created the method, so timings for it aren't correct but counts will be).

==== Proposed change

The primary change I'm suggesting is deviating from using `mark_safe` to directly returning a new `SafeString` - as far as I know, the result of `''.join(...)` is ''always'' of type `str` regardless of if any of the underlying bits are themselves `SafeString` etc, so for each call to `NodeList.render` we'd be saving 1 `hasattr(...)` test and 1 `callable(...)` test, which would change the 1000 runs to:
{{{
4774747 function calls (4416040 primitive calls) in 2.710 seconds

 41000/1000    0.194    0.000    2.669    0.003 base.py:956(render)
29045/27045    0.019    0.000    0.063    0.000 safestring.py:50(mark_safe)
     128045    0.048    0.000    0.048    0.000 safestring.py:26(__init__)
     118432    0.019    0.000    0.019    0.000 {built-in method builtins.hasattr}
      73077    0.008    0.000    0.008    0.000 {built-in method builtins.callable}
     686582    0.073    0.000    0.073    0.000 {built-in method builtins.isinstance}
     318184    0.038    0.000    0.038    0.000 {method 'append' of 'list' objects}
}}}

You can see that whilst the timings don't change ''substantially'', it is a whole bunch less calls.

==== Worth investigating?

But ''possibly'' more interesting than that is the rest of the method. As far as I can tell from running the test suite (which, tbf, does include a boatload of skips being output), the line `if isinstance(node, Node):` is **always** true -- that is, there's no test anywhere for what happens if you somehow, somewhere include a non `Node` instance into a `NodeList`? If that's the case (which again, it may not be, given skips) then there's 2 possiblities:

1. add in a test demonstrating what should happen when a `non-Node` appears in a `NodeList` and canonically accept it as part of DTL
2. take the stance that NodeLists are inherently composed of only nodes (or at least things with `render_annotated(context)`) and sweep away the isinstance check.

If option 2 were taken, the final method could be (without any intermediate deprecation stuffs if necessary):
{{{
def render(self, context):
    bits = []
    append = bits.append
    for node in self:
        append(str(node.render_annotated(context)))
    return SafeString(''.join(bits))
}}}

which, compounded with the change from `mark_safe` to `SafeString` proposed above would give (again, 1000 renders):

{{{
4575748 function calls (4217041 primitive calls) in 2.685 seconds

 41000/1000    0.152    0.000    2.567    0.003 base.py:956(render)
29045/27045    0.018    0.000    0.062    0.000 safestring.py:53(mark_safe
     128045    0.048    0.000    0.048    0.000 safestring.py:26(__init__))
     118432    0.019    0.000    0.019    0.000 {built-in method builtins.hasattr}
      73077    0.008    0.000    0.008    0.000 {built-in method builtins.callable}
     487582    0.058    0.000    0.058    0.000 {built-in method builtins.isinstance}
     318184    0.039    0.000    0.039    0.000 {method 'append' of 'list' objects}
}}}
for a total saving of 199,000 `isinstance` checks, 41,000 `hasattr` checks and 41,000 `callable` checks (from a starting point of `4897753` function and `4539046` primitive calls down to `4575748` and `4217041` respectively)

Were that final method implementation the target/goal, it ''may'' be OK to move from stacking up a intermediate list (the `nodes` variable) to a generator expression into the `''.join(...)` but briefly using `memory_profiler` to `@profile` the method shows its all 0 allocations anyway (presumably references & gc detection?)

==== Additional notes

Yappi is slower than cProfile but suggests the same small wins (amplified in both cases by the profiling machinery slowing everything down).

The difference (when measured via yappi, and reported via pycharm's handy profile visualiser) between the **original method** and the **ideal replacement** would be a decrease from `7.6%` of time being spent (directly) on this method to `5.7%`

The difference by just removing the `mark_safe` call and the `SafeString` suggestion is obviously less pronounced, but yappi suggests that the time spent in `mark_safe` goes from `1.9%` of time to `0.9%`

Those proportions are ''vaguely'' the same for cProfile, give or take."	Cleanup/optimization	closed	Template system	dev	Normal	fixed			Ready for checkin	1	0	0	0	0	0