Opened 3 weeks ago

Closed 3 weeks ago

Last modified 3 weeks ago

#37125 closed Cleanup/optimization (invalid)

Use __new__ to sanitize TaskResult instead of __post_init__ to half memory usage

Reported by: Johannes Maron Owned by: zky
Component: Tasks Version: dev
Severity: Normal Keywords:
Cc: Johannes Maron Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

Jake and I have been discussing changes to the Task framework for 6.2 with a focus on performance and versatility.

One discovery was the memory overhead created by __post_init__ which doubles the working memory when compared to a factory method.

A __new__ method is what I used in the benchmark and what I would consider the most Pythonic way while also maintaining full compatibility.

Change History (12)

comment:1 by Jacob Walls, 3 weeks ago

Can you share the benchmark script? I'd like to just eyeball the setup before drawing conclusions.

comment:2 by Johannes Maron, 3 weeks ago

Sure, where are my manners? Here you go. It's slop, but I tweaked it to minimize side effects:

import timeit
import tracemalloc
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

from django.utils.json import normalize_json
from django.tasks.base import Task, TaskResult, TaskResultStatus

# --- Real Task instance ---

def my_func():
    pass

real_task = Task.__new__(Task)
object.__setattr__(real_task, "func", my_func)
object.__setattr__(real_task, "priority", 0)
object.__setattr__(real_task, "backend", "default")
object.__setattr__(real_task, "queue_name", "default")
object.__setattr__(real_task, "run_after", None)
object.__setattr__(real_task, "takes_context", False)

# --- Shared kwargs ---

now = datetime.now()

common_kwargs = dict(
    task=real_task,
    id="abc123",
    status=TaskResultStatus.SUCCESSFUL,
    enqueued_at=now,
    started_at=now,
    finished_at=now,
    last_attempted_at=now,
    args=[],
    kwargs={},
    backend="default",
    errors=[],
    worker_ids=["worker-1"],
)

# --- Variant 1: __post_init__ (same as original TaskResult) ---

@dataclass(frozen=True, slots=True, kw_only=True)
class TaskResultPostInit:
    task: Any
    id: str
    status: Any
    enqueued_at: datetime | None
    started_at: datetime | None
    finished_at: datetime | None
    last_attempted_at: datetime | None
    args: list[Any]
    kwargs: dict[str, Any]
    backend: str
    errors: list
    worker_ids: list[str]
    _return_value: Any | None = field(init=False, default=None)

    def __post_init__(self):
        object.__setattr__(self, "args", normalize_json(self.args))
        object.__setattr__(self, "kwargs", normalize_json(self.kwargs))


# --- Variant 2: classmethod factory, no __post_init__ ---

@dataclass(frozen=True, slots=True, kw_only=True)
class TaskResultNew:
    task: Any
    id: str
    status: Any
    enqueued_at: datetime | None
    started_at: datetime | None
    finished_at: datetime | None
    last_attempted_at: datetime | None
    args: list[Any]
    kwargs: dict[str, Any]
    backend: str
    errors: list
    worker_ids: list[str]
    _return_value: Any | None = field(init=False, default=None)

    def __new__(cls, *args, **kwargs):
        kwargs["args"] = normalize_json(kwargs["args"])
        kwargs["kwargs"] = normalize_json(kwargs["kwargs"])
        return super().__new__(cls)


# --- Benchmark ---

N = 1_000_000

def run_bench(label, fn):
    t = timeit.timeit(fn, number=N)

    tracemalloc.start()
    instances = [fn() for _ in range(N)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del instances

    print(f"{label}")
    print(f"  Time:      {t:.3f}s  ({t/N*1e6:.2f} µs/call)")
    print(f"  Peak mem:  {peak / 1024 / 1024:.2f} MB (for {N:,} instances)")
    print()

run_bench(
    "TaskResult with __post_init__",
    lambda: TaskResultPostInit(**common_kwargs),
)

run_bench(
    "TaskResult with classmethod factory",
    lambda: TaskResultNew(**common_kwargs),
)

comment:3 by Johannes Maron, 3 weeks ago

… and the results:

TaskResult with __post_init__
  Time:      1.126s  (1.13 µs/call)
  Peak mem:  252.19 MB (for 1,000,000 instances)

TaskResult with classmethod factory
  Time:      1.326s  (1.33 µs/call)
  Peak mem:  137.76 MB (for 1,000,000 instances)

comment:4 by zky, 3 weeks ago

Owner: set to zky
Status: newassigned
Triage Stage: UnreviewedAccepted

Hi Johannes, since I just assigned #37126 to myself, I'd like to handle this memory optimization alongside it, because they both modify the TaskResult model. I'm assigning this to myself as well so I can bundle them in the same PR and avoid conflicts.

comment:5 by Johannes Maron, 3 weeks ago

@zky go for it! I think those changes are fairly conflict-free, though. IMHO, the real work will be writing good tests to prevent performance regression. The actual patch is probably just those four lines:

    def __new__(cls, *args, **kwargs):
        kwargs["args"] = normalize_json(kwargs["args"])
        kwargs["kwargs"] = normalize_json(kwargs["kwargs"])
        return super().__new__(cls)

comment:6 by Jacob Walls, 3 weeks ago

Resolution: needsinfo
Status: assignedclosed
Triage Stage: AcceptedUnreviewed

I think the bench incorrectly implements __new__(). It looks like the kwargs are mutated in place, but they're actually re-bound and discarded:

>>> def a(**kwargs): return kwargs
... 
>>> kw = {}
>>> inner = a(**kw)
>>> inner is kw
False

comment:7 by David, 3 weeks ago

Why using __new__ which is used for class initialization while args and kwargs should be instance-level attributes? https://docs.python.org/3/reference/datamodel.html#object.__new

comment:8 by Johannes Maron, 3 weeks ago

https://media0.giphy.com/media/v1.Y2lkPTc5MGI3NjExdzZ1Y3U3YzJpeXpjbnNyNXhqOWVtazdqZXo4bTZnMjVoem1veTI3ZSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/Ra1bmpxpsppNC/giphy.gif

Y'all are absolutely right. It was too good to be true. I was spending too much time with Claude. It has successfully dumbed me down.

comment:9 by Johannes Maron, 3 weeks ago

Resolution: needsinfoinvalid

in reply to:  6 comment:10 by zky, 3 weeks ago

Thanks for the great catch, Jacob! Indeed, after running a quick test locally, I confirmed exactly what you pointed out—modifying kwargs inside new doesn't actually take effect for the initialization.

Given this, would introducing a classmethod factory be a viable alternative to solve the original memory issue?Replying to Jacob Walls:

I think the bench incorrectly implements __new__(). It looks like the kwargs are mutated in place, but they're actually re-bound and discarded:

>>> def a(**kwargs): return kwargs
... 
>>> kw = {}
>>> inner = a(**kw)
>>> inner is kw
False
Version 0, edited 3 weeks ago by zky (next)

comment:11 by Jacob Walls, 3 weeks ago

Thanks for confirming. At this point we would need a fresh bench confirming that there is a memory issue at all.

comment:12 by Johannes Maron, 3 weeks ago

I wrote a benchmark with a proper factory method, and I can confirm that the __post_init__ does NOT add any memory overhead. Hence I marked it as invalid.

I'll keep looking, but I am out of ideas at this point and happy to see that this is already very efficient.

Note: See TracTickets for help on using tickets.
Back to Top