Opened 3 years ago
Last modified 3 years ago
#33465 closed Cleanup/optimization
Introduce empty __slots__ protocol for SafeString & SafeData — at Initial Version
Reported by: | Keryn Knight | Owned by: | Keryn Knight |
---|---|---|---|
Component: | Utilities | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This is a case-by-case proposal ultimately referencing #12826
Because SafeString
is used a lot and is otherwise supposed to be treatable as a untainted str
we should be able to (AFAIK) update it + it's inheritance chain to use __slots__ = ()
whilst still allowing custom subclasses of either to add additional attributes. By defining __slots__
as empty on SafeString
(and SafeData
) we'd avoid creation of a __dict__
on the instance, which mirrors the str()
behaviour.
According to pympler, currently in Python 3.10
using the following back of the napkins strings:
In [4]: s = "test" # this might be interned, as a short string? In [5]: s2 = "test" * 100 In [6]: s3 = SafeString("test") In [7]: s4 = SafeString("test" * 100)
we get:
In [8]: asizeof(s) # str Out[8]: 56 In [9]: asizeof(s2) # str Out[9]: 456 In [10]: asizeof(s3) # SafeString Out[10]: 208 In [11]: asizeof(s4) # SafeString Out[11]: 608
But if we swap out the implementation to be slots'd, it looks more like:
In [8]: asizeof(s) # str Out[8]: 56 In [9]: asizeof(s2) # str Out[9]: 456 In [10]: asizeof(s3) # SafeString Out[10]: 104 In [11]: asizeof(s4) # SafeString Out[11]: 504
So we're "saving" 104 bytes
per SafeString
created, by the look of it. I presume it to be some fun implementation detail of something somewhere that it is allegedly accounting for more than 56
bytes, which is the asizeof({})
A quick and dirty check over the test suite suggests that for me locally, running 14951 tests in 512.912s
accounted for 949.0 MB
of SafeStrings, checked by just incrementing a global integer of bytes (using SafeString.__new__
and --parallel=1
) and piping that to filesizeformat
, so y'know, room for error.
After the patch, the same tests accounted for 779.4 MB
of SafeString
, "saving" 170 MB
overall.
The only functionality this would preclude -- as far as I know -- is no longer being able to bind arbitrary values to an instance like so:
s = SafeString('test') s.test = 1
which would raise AttributeError
if __slots__
were added, just like trying to assign attributes to str()
directly does.
I don't believe this will have any marked performance change, as neither SafeString
nor SafeData
actually have any extra attributes, only methods.
I have a branch which implements this, and tests pass for me locally.