Opened 2 years ago

Closed 2 years ago

#33465 closed Cleanup/optimization (fixed)

Introduce empty __slots__ protocol for SafeString & SafeData

Reported by: Keryn Knight Owned by: Keryn Knight
Component: Utilities Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Keryn Knight)

This is a case-by-case proposal ultimately referencing #12826

Because SafeString is used a lot and is otherwise supposed to be treatable as a untainted str we should be able to (AFAIK) update it + it's inheritance chain to use __slots__ = () whilst still allowing custom subclasses of either to add additional attributes. By defining __slots__ as empty on SafeString (and SafeData) we'd avoid creation of a __dict__ on the instance, which mirrors the str() behaviour.

According to pympler, currently in Python 3.10 using the following back of the napkins strings:

In [4]: s = "test"  # this might be interned, as a short string?

In [5]: s2 = "test" * 100

In [6]: s3 = SafeString("test")

In [7]: s4 = SafeString("test" * 100)

we get:

In [8]: asizeof(s)  # str
Out[8]: 56

In [9]: asizeof(s2)  # str
Out[9]: 456

In [10]: asizeof(s3)  # SafeString
Out[10]: 208

In [11]: asizeof(s4)  # SafeString
Out[11]: 608

But if we swap out the implementation to be slots'd, it looks more like:

In [8]: asizeof(s)  # str
Out[8]: 56

In [9]: asizeof(s2)  # str
Out[9]: 456

In [10]: asizeof(s3)  # SafeString
Out[10]: 104

In [11]: asizeof(s4)  # SafeString
Out[11]: 504

So we're "saving" 104 bytes per SafeString created, by the look of it. I presume it to be some fun implementation detail of something somewhere that it is allegedly accounting for more than 64 bytes, which is the asizeof({})

A quick and dirty check over the test suite suggests that for me locally, running 14951 tests in 512.912s accounted for 949.0 MB of SafeStrings, checked by just incrementing a global integer of bytes (using SafeString.__new__ and --parallel=1) and piping that to filesizeformat, so y'know, room for error.
After the patch, the same tests accounted for 779.4 MB of SafeString, "saving" 170 MB overall.

The only functionality this would preclude -- as far as I know -- is no longer being able to bind arbitrary values to an instance like so:

s = SafeString('test')
s.test = 1

which would raise AttributeError if __slots__ were added, just like trying to assign attributes to str() directly does.

I don't believe this will have any marked performance change, as neither SafeString nor SafeData actually have any extra attributes, only methods.

I have a branch which implements this, and tests pass for me locally.

Change History (5)

comment:1 by Keryn Knight, 2 years ago

Description: modified (diff)

comment:2 by Claude Paroz, 2 years ago

I think this has a good potential for reducing memory footprint, +1.

comment:3 by Mariusz Felisiak, 2 years ago

Has patch: set
Triage Stage: UnreviewedAccepted

Sounds reasonable.

PR

comment:4 by Mariusz Felisiak, 2 years ago

Triage Stage: AcceptedReady for checkin

comment:5 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

Resolution: fixed
Status: assignedclosed

In 55022f75:

Fixed #33465 -- Added empty slots to SafeString and SafeData.

Despite inheriting from the str type, every SafeString instance gains
an empty dict due to the normal, expected behaviour of type
subclassing in Python.

Adding slots to SafeData is necessary, because otherwise inheriting
from that (as SafeString does) will give it a dict and negate the
benefit added by modifying SafeString.

Note: See TracTickets for help on using tickets.
Back to Top