#36896 new Cleanup/optimization

Optimize TruncateCharsHTMLParser.process() to avoid redundant sum() calculation

Reported by: Tarek Nakkouch
Owned by:
Component: Utilities
Version: 6.0
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

The TruncateCharsHTMLParser.process() method in django/utils/text.py recalculates sum(len(p) for p in self.output) each time it processes a text chunk. For HTML with many text nodes, this needlessly re-iterates over the growing output list.

def process(self, data):
    self.processed_chars += len(data)
    if (self.processed_chars == self.length) and (
        sum(len(p) for p in self.output) + len(data) == len(self.rawdata)
    ):
        self.output.append(data)
        raise self.TruncationCompleted
    output = escape("".join(data[: self.remaining]))
    return data, output

Suggested optimization

Cache the output length as self.output_len and increment it when appending to self.output:

  • Initialize self.output_len = 0 in TruncateHTMLParser.__init__()
  • Increment it by the length of each appended string in handle_starttag(), handle_endtag(), handle_data(), feed(), and process()
  • Replace sum(len(p) for p in self.output) with self.output_len

This eliminates redundant iteration over already-processed output.
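
The running-counter idea can be sketched in isolation as follows. This is a minimal illustration, not Django's code: the class name LengthTrackingOutput and its attributes parts and total_len are hypothetical, with total_len standing in for the proposed self.output_len.

```python
class LengthTrackingOutput:
    """Hypothetical helper: a list of string fragments plus a cached total length."""

    def __init__(self):
        self.parts = []
        # Stands in for the proposed self.output_len (an assumption, not Django's API).
        self.total_len = 0

    def append(self, fragment):
        # Update the cached length at the same time as the list,
        # so the two can never drift apart.
        self.parts.append(fragment)
        self.total_len += len(fragment)


out = LengthTrackingOutput()
for fragment in ["<p>", "Hello", "</p>"]:
    out.append(fragment)

# The cached counter agrees with the recomputed sum, without iterating again.
assert out.total_len == sum(len(p) for p in out.parts)
```

Routing every append through one method is a design choice made for this sketch: it keeps the counter update in a single place instead of repeating it at each call site.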
