#37125 closed Cleanup/optimization (invalid)
Use __new__ to sanitize TaskResult instead of __post_init__ to half memory usage
| Reported by: | Johannes Maron | Owned by: | zky |
|---|---|---|---|
| Component: | Tasks | Version: | dev |
| Severity: | Normal | Keywords: | |
| Cc: | Johannes Maron | Triage Stage: | Unreviewed |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | yes | UI/UX: | no |
Description
One discovery was the memory overhead created by __post_init__ which doubles the working memory when compared to a factory method.
A __new__ method is what I used in the benchmark and what I would consider the most Pythonic way while also maintaining full compatibility.
Change History (12)
comment:1 by , 3 weeks ago
comment:2 by , 3 weeks ago
Sure, where are my manners? Here you go. It's slop, but I tweaked it to minimize side effects:
import timeit
import tracemalloc
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from django.utils.json import normalize_json
from django.tasks.base import Task, TaskResult, TaskResultStatus
# --- Real Task instance ---
def my_func():
pass
real_task = Task.__new__(Task)
object.__setattr__(real_task, "func", my_func)
object.__setattr__(real_task, "priority", 0)
object.__setattr__(real_task, "backend", "default")
object.__setattr__(real_task, "queue_name", "default")
object.__setattr__(real_task, "run_after", None)
object.__setattr__(real_task, "takes_context", False)
# --- Shared kwargs ---
now = datetime.now()
common_kwargs = dict(
task=real_task,
id="abc123",
status=TaskResultStatus.SUCCESSFUL,
enqueued_at=now,
started_at=now,
finished_at=now,
last_attempted_at=now,
args=[],
kwargs={},
backend="default",
errors=[],
worker_ids=["worker-1"],
)
# --- Variant 1: __post_init__ (same as original TaskResult) ---
@dataclass(frozen=True, slots=True, kw_only=True)
class TaskResultPostInit:
task: Any
id: str
status: Any
enqueued_at: datetime | None
started_at: datetime | None
finished_at: datetime | None
last_attempted_at: datetime | None
args: list[Any]
kwargs: dict[str, Any]
backend: str
errors: list
worker_ids: list[str]
_return_value: Any | None = field(init=False, default=None)
def __post_init__(self):
object.__setattr__(self, "args", normalize_json(self.args))
object.__setattr__(self, "kwargs", normalize_json(self.kwargs))
# --- Variant 2: classmethod factory, no __post_init__ ---
@dataclass(frozen=True, slots=True, kw_only=True)
class TaskResultNew:
task: Any
id: str
status: Any
enqueued_at: datetime | None
started_at: datetime | None
finished_at: datetime | None
last_attempted_at: datetime | None
args: list[Any]
kwargs: dict[str, Any]
backend: str
errors: list
worker_ids: list[str]
_return_value: Any | None = field(init=False, default=None)
def __new__(cls, *args, **kwargs):
kwargs["args"] = normalize_json(kwargs["args"])
kwargs["kwargs"] = normalize_json(kwargs["kwargs"])
return super().__new__(cls)
# --- Benchmark ---
N = 1_000_000
def run_bench(label, fn):
t = timeit.timeit(fn, number=N)
tracemalloc.start()
instances = [fn() for _ in range(N)]
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
del instances
print(f"{label}")
print(f" Time: {t:.3f}s ({t/N*1e6:.2f} µs/call)")
print(f" Peak mem: {peak / 1024 / 1024:.2f} MB (for {N:,} instances)")
print()
run_bench(
"TaskResult with __post_init__",
lambda: TaskResultPostInit(**common_kwargs),
)
run_bench(
"TaskResult with classmethod factory",
lambda: TaskResultNew(**common_kwargs),
)
comment:3 by , 3 weeks ago
… and the results:
TaskResult with __post_init__ Time: 1.126s (1.13 µs/call) Peak mem: 252.19 MB (for 1,000,000 instances) TaskResult with classmethod factory Time: 1.326s (1.33 µs/call) Peak mem: 137.76 MB (for 1,000,000 instances)
comment:4 by , 3 weeks ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
| Triage Stage: | Unreviewed → Accepted |
Hi Johannes, since I just assigned #37126 to myself, I'd like to handle this memory optimization alongside it, because they both modify the TaskResult model. I'm assigning this to myself as well so I can bundle them in the same PR and avoid conflicts.
comment:5 by , 3 weeks ago
@zky go for it! I think those changes are fairly conflict-free, though. IMHO, the real work will be writing good tests to prevent performance regression. The actual patch is probably just those four lines:
def __new__(cls, *args, **kwargs):
kwargs["args"] = normalize_json(kwargs["args"])
kwargs["kwargs"] = normalize_json(kwargs["kwargs"])
return super().__new__(cls)
follow-up: 10 comment:6 by , 3 weeks ago
| Resolution: | → needsinfo |
|---|---|
| Status: | assigned → closed |
| Triage Stage: | Accepted → Unreviewed |
I think the bench incorrectly implements __new__(). It looks like the kwargs are mutated in place, but they're actually re-bound and discarded:
>>> def a(**kwargs): return kwargs ... >>> kw = {} >>> inner = a(**kw) >>> inner is kw False
comment:7 by , 3 weeks ago
Why using __new__ which is used for class initialization while args and kwargs should be instance-level attributes? https://docs.python.org/3/reference/datamodel.html#object.__new
comment:8 by , 3 weeks ago
comment:9 by , 3 weeks ago
| Resolution: | needsinfo → invalid |
|---|
comment:10 by , 3 weeks ago
Thanks for the great catch, Jacob! Indeed, after running a quick test locally, I confirmed exactly what you pointed out—modifying kwargs inside new doesn't actually take effect for the initialization.
Given this, would introducing a classmethod factory be a viable alternative to solve the original memory issue?Replying to Jacob Walls:
I think the bench incorrectly implements
__new__(). It looks like thekwargsare mutated in place, but they're actually re-bound and discarded:
>>> def a(**kwargs): return kwargs ... >>> kw = {} >>> inner = a(**kw) >>> inner is kw False
comment:11 by , 3 weeks ago
Thanks for confirming. At this point we would need a fresh bench confirming that there is a memory issue at all.
comment:12 by , 3 weeks ago
I wrote a benchmark with a proper factory method, and I can confirm that the __post_init__ does NOT add any memory overhead. Hence I marked it as invalid.
I'll keep looking, but I am out of ideas at this point and happy to see that this is already very efficient.

Can you share the benchmark script? I'd like to just eyeball the setup before drawing conclusions.