Opened 4 years ago
Closed 4 years ago
#33465 closed Cleanup/optimization (fixed)
Introduce empty __slots__ protocol for SafeString & SafeData
| Reported by: | Keryn Knight | Owned by: | Keryn Knight |
|---|---|---|---|
| Component: | Utilities | Version: | dev |
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Ready for checkin | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
This is a case-by-case proposal ultimately referencing #12826
Because SafeString is used a lot and is otherwise supposed to be treatable as a untainted str we should be able to (AFAIK) update it + it's inheritance chain to use __slots__ = () whilst still allowing custom subclasses of either to add additional attributes. By defining __slots__ as empty on SafeString (and SafeData) we'd avoid creation of a __dict__ on the instance, which mirrors the str() behaviour.
According to pympler, currently in Python 3.10 using the following back of the napkins strings:
In [4]: s = "test" # this might be interned, as a short string?
In [5]: s2 = "test" * 100
In [6]: s3 = SafeString("test")
In [7]: s4 = SafeString("test" * 100)
we get:
In [8]: asizeof(s) # str Out[8]: 56 In [9]: asizeof(s2) # str Out[9]: 456 In [10]: asizeof(s3) # SafeString Out[10]: 208 In [11]: asizeof(s4) # SafeString Out[11]: 608
But if we swap out the implementation to be slots'd, it looks more like:
In [8]: asizeof(s) # str Out[8]: 56 In [9]: asizeof(s2) # str Out[9]: 456 In [10]: asizeof(s3) # SafeString Out[10]: 104 In [11]: asizeof(s4) # SafeString Out[11]: 504
So we're "saving" 104 bytes per SafeString created, by the look of it. I presume it to be some fun implementation detail of something somewhere that it is allegedly accounting for more than 64 bytes, which is the asizeof({})
A quick and dirty check over the test suite suggests that for me locally, running 14951 tests in 512.912s accounted for 949.0 MB of SafeStrings, checked by just incrementing a global integer of bytes (using SafeString.__new__ and --parallel=1) and piping that to filesizeformat, so y'know, room for error.
After the patch, the same tests accounted for 779.4 MB of SafeString, "saving" 170 MB overall.
The only functionality this would preclude -- as far as I know -- is no longer being able to bind arbitrary values to an instance like so:
s = SafeString('test')
s.test = 1
which would raise AttributeError if __slots__ were added, just like trying to assign attributes to str() directly does.
I don't believe this will have any marked performance change, as neither SafeString nor SafeData actually have any extra attributes, only methods.
I have a branch which implements this, and tests pass for me locally.
Change History (5)
comment:1 by , 4 years ago
| Description: | modified (diff) |
|---|
comment:2 by , 4 years ago
comment:4 by , 4 years ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
I think this has a good potential for reducing memory footprint, +1.