Opened 3 weeks ago

Last modified 5 days ago

#37126 assigned New feature

Make Task and TaskResult comparable

Reported by: Johannes Maron Owned by: zky
Component: Tasks Version: dev
Severity: Normal Keywords:
Cc: Johannes Maron Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

Jake and I have been discussing changes to the Task framework for 6.2 with a focus on performance and versatility.

Python's queue.PriorityQueue implementation requires objects to be comparable.

Since the base implementation of a Task implements a priority, it only makes sense to provide basic comparability based on the priority and date.

Dataclasses have a neat order=True attribute to make this stupidly simple.

Change History (18)

comment:1 by Johannes Maron, 3 weeks ago

I would also suggest adding a custom eq method. We only need to include id and a simple string comparison is much quicker. Benchmarks N = 100_000_000:

Original == (equal)             18.614s  (186.1 ns/call)
Original == (not equal)         7.452s  (74.5 ns/call)
Id-only == (equal)              4.434s  (44.3 ns/call)
Id-only == (not equal)          4.423s  (44.2 ns/call)

comment:2 by Johannes Maron, 3 weeks ago

Missing benchmark sources for my previous comment:

import timeit
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

from django.utils.json import normalize_json
from django.tasks.base import Task, TaskResult, TaskResultStatus

# --- Real Task instance ---

def my_func():
    pass

real_task = Task.__new__(Task)
object.__setattr__(real_task, "func", my_func)
object.__setattr__(real_task, "priority", 0)
object.__setattr__(real_task, "backend", "default")
object.__setattr__(real_task, "queue_name", "default")
object.__setattr__(real_task, "run_after", None)
object.__setattr__(real_task, "takes_context", False)

# --- Shared kwargs ---

now = datetime.now()

common_kwargs = dict(
    task=real_task,
    id="abc123",
    status=TaskResultStatus.SUCCESSFUL,
    enqueued_at=now,
    started_at=now,
    finished_at=now,
    last_attempted_at=now,
    args=[],
    kwargs={},
    backend="default",
    errors=[],
    worker_ids=["worker-1"],
)

# --- id-only equality variant ---

@dataclass(frozen=True, slots=True, kw_only=True)
class TaskResultIdEq:
    task: Any = field(compare=False)
    id: str
    status: Any = field(compare=False)
    enqueued_at: datetime | None = field(compare=False)
    started_at: datetime | None = field(compare=False)
    finished_at: datetime | None = field(compare=False)
    last_attempted_at: datetime | None = field(compare=False)
    args: list[Any] = field(compare=False)
    kwargs: dict[str, Any] = field(compare=False)
    backend: str = field(compare=False)
    errors: list = field(compare=False)
    worker_ids: list[str] = field(compare=False)
    _return_value: Any | None = field(init=False, default=None, compare=False)

    def __post_init__(self):
        object.__setattr__(self, "args", normalize_json(self.args))
        object.__setattr__(self, "kwargs", normalize_json(self.kwargs))

# --- Instances to compare ---

a_full = TaskResult(**common_kwargs)
b_full = TaskResult(**common_kwargs)                                      # equal
c_full = TaskResult(**{**common_kwargs, "id": "other"})                  # not equal

a_id = TaskResultIdEq(**common_kwargs)
b_id = TaskResultIdEq(**common_kwargs)                                    # equal
c_id = TaskResultIdEq(**{**common_kwargs, "id": "other"})                # not equal

N = 100_000_000

results = {
    "Original == (equal)":     timeit.timeit(lambda: a_full == b_full, number=N),
    "Original == (not equal)": timeit.timeit(lambda: a_full == c_full, number=N),
    "Id-only == (equal)":      timeit.timeit(lambda: a_id == b_id, number=N),
    "Id-only == (not equal)":  timeit.timeit(lambda: a_id == c_id, number=N),
}

for label, t in results.items():
    print(f"{label:<30}  {t:.3f}s  ({t/N*1e9:.1f} ns/call)")

comment:3 by zky, 3 weeks ago

Hi Johannes, thanks for providing such clear benchmarks! The performance gains for the task queue are very significant. I'm quite interested in this and would love to help take the implementation work off your hands.

comment:4 by zky, 3 weeks ago

Owner: set to zky
Status: newassigned
Triage Stage: UnreviewedAccepted

comment:5 by zky, 3 weeks ago

Hi Johannes,

Regarding the sorting implementation, while @dataclass(order=True) is indeed stupidly simple, I noticed two potential issues.

First, it makes the sorting logic strictly dependent on the physical order of the class attributes, which could cause regressions if someone accidentally reorders them in the future.

More importantly, since run_after is typed as datetime | None, using order=True will crash the queue with a TypeError (comparing NoneType and datetime) whenever a task scheduled to run immediately (run_after=None) is compared against a scheduled task.

To prevent the queue from crashing and to make the codebase more defensive, would it be safer to explicitly implement lt and eq? We can safely handle the None fallback inside lt by comparing tuples like (self.run_after is not None, self.run_after).

Let me know your thoughts!

comment:6 by Johannes Maron, 2 weeks ago

Hi there,

Good call. The dataclasses factory utilities are mainly supposed to reduce boilerplate. They just create classes with the very same methods for you. If it gets in the way, as it seems to do here, it is an excellent choice to do things “manually.”

Idioms and tools should never get in your way. That sentiment speaks much to the core of Django and its success. In other words, I believe you are on the right path :)

Cheers!
Joe

in reply to:  6 comment:7 by zky, 2 weeks ago

Replying to Johannes Maron:

Hi there,

Good call. The dataclasses factory utilities are mainly supposed to reduce boilerplate. They just create classes with the very same methods for you. If it gets in the way, as it seems to do here, it is an excellent choice to do things “manually.”

Idioms and tools should never get in your way. That sentiment speaks much to the core of Django and its success. In other words, I believe you are on the right path :)

Cheers!
Joe

Thank you for the encouraging words! I completely agree—pragmatism over strict adherence to tools is exactly what makes Django's design philosophy so great to work with. I've already submitted the PR:https://github.com/django/django/pull/21383

comment:9 by Johannes Maron, 2 weeks ago

Needs documentation: set
Patch needs improvement: set

comment:10 by zky, 2 weeks ago

Patch needs improvement: unset

comment:11 by Johannes Maron, 2 weeks ago

Needs tests: set
Patch needs improvement: set

comment:12 by zky, 2 weeks ago

Patch needs improvement: unset

comment:13 by Johannes Maron, 13 days ago

Needs tests: unset
Patch needs improvement: set

comment:14 by Johannes Maron, 9 days ago

Needs documentation: unset

comment:15 by Johannes Maron, 8 days ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:16 by Johannes Maron, 8 days ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

comment:17 by zky, 7 days ago

Patch needs improvement: unset

comment:18 by Johannes Maron, 5 days ago

Triage Stage: AcceptedReady for checkin
Note: See TracTickets for help on using tickets.
Back to Top