Opened 3 weeks ago

Closed 3 weeks ago

Last modified 5 days ago

#36705 closed Cleanup/optimization (fixed)

Repeated string concatenation is slow on some python implementations, in particular PyPy

Reported by: Jacob Walls Owned by: Varun Kasyap Pentamaraju
Component: Utilities Version: dev
Severity: Normal Keywords: PyPy
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There is a known limitation in PyPy that repeated string concatenation is slow. The discussion there observes that CPython has an optimization, but it's fragile and best not relied on.

Seokchan Yoon (thanks for the report!) raised this with the security team, and we decided any hardening here could be done in public.

To close this ticket, we should:

  • implement targeted optimizations in django.utils.numberformat.format() to avoid repeated string concatenation
  • audit for other similar cases
  • Add a disclaimer to our documentation around PyPy, for example, wording like:

That said, a lot of a web framework's work is done by concatenating strings, and PyPy has an issue with that (see https://pypy.org/posts/2023/01/string-concatenation-quadratic.html). This may cause performance issues, depending on your use.

Change History (13)

comment:1 by Natalia Bidart, 3 weeks ago

Triage Stage: UnreviewedAccepted

Thank you Jacob!

comment:2 by Varun Kasyap Pentamaraju, 3 weeks ago

Owner: set to Varun Kasyap Pentamaraju
Status: newassigned

I am willing to contribute

comment:3 by Varun Kasyap Pentamaraju, 3 weeks ago

Has patch: set

comment:4 by Jacob Walls, 3 weeks ago

Patch needs improvement: set

comment:5 by Varun Kasyap Pentamaraju, 3 weeks ago

Patch needs improvement: unset

comment:6 by Jacob Walls, 3 weeks ago

Patch needs improvement: set

comment:7 by Varun Kasyap Pentamaraju, 3 weeks ago

Patch needs improvement: unset

comment:8 by Jacob Walls, 3 weeks ago

Patch needs improvement: set

comment:9 by Jacob Walls, 3 weeks ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:10 by Jacob Walls <jacobtylerwalls@…>, 3 weeks ago

Resolution: fixed
Status: assignedclosed

In 1c7db70:

Fixed #36705 -- Avoided string concatenation in utils.

Repeated string concatenation performs poorly on PyPy.
Thanks Seokchan Yoon for the report.

comment:11 by Adam Johnson, 7 days ago

I'm a little bit skeptical of this change's necessity, and I think it may have introduced an issue: https://github.com/django/django/pull/20045/files#r2544009071 . PyPy currently only supports Python 3.11, while Django's main branch targets Python 3.12+, so it's not clear when anyone could use the change. Also, I've never heard of a Django project using PyPy—it seems impractical ecosystemwise.

comment:12 by Jacob Walls, 6 days ago

Thanks for spotting the logic change. I replied that there's a chance the logic change is more correct: we should add a test to be sure.

I heard that Django has never supported PyPy as a deployment target. Despite whispers of PyPy winding down, it seems to be slowly keeping up in its own time.

For me, this change wasn't just about PyPy. I was persuaded by the linked blog post's explanation that even in CPython adequate performance on catastrophic string concatenation depends on a fragile optimization.

comment:13 by Jacob Walls <jacobtylerwalls@…>, 5 days ago

In 3f15935:

Refs #36705 -- Added coverage for multiple types of enclosing punctuation in urlize().

This case was inadvertently fixed in ad94446fcc5b50401dd0c48718502d5d1b92df58.

Note: See TracTickets for help on using tickets.
Back to Top