Opened 3 weeks ago
Last modified 5 days ago
#37126 assigned New feature
Make Task and TaskResult comparable
| Reported by: | Johannes Maron | Owned by: | zky |
|---|---|---|---|
| Component: | Tasks | Version: | dev |
| Severity: | Normal | Keywords: | |
| Cc: | Johannes Maron | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | yes | UI/UX: | no |
Description
Python's queue.PriorityQueue implementation requires objects to be comparable.
Since the base implementation of a Task implements a priority, it only makes sense to provide basic comparability based on the priority and date.
Dataclasses have a neat order=True attribute to make this stupidly simple.
Change History (18)
comment:1 by , 3 weeks ago
comment:2 by , 3 weeks ago
Missing benchmark sources for my previous comment:
import timeit
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from django.utils.json import normalize_json
from django.tasks.base import Task, TaskResult, TaskResultStatus
# --- Real Task instance ---
def my_func():
pass
real_task = Task.__new__(Task)
object.__setattr__(real_task, "func", my_func)
object.__setattr__(real_task, "priority", 0)
object.__setattr__(real_task, "backend", "default")
object.__setattr__(real_task, "queue_name", "default")
object.__setattr__(real_task, "run_after", None)
object.__setattr__(real_task, "takes_context", False)
# --- Shared kwargs ---
now = datetime.now()
common_kwargs = dict(
task=real_task,
id="abc123",
status=TaskResultStatus.SUCCESSFUL,
enqueued_at=now,
started_at=now,
finished_at=now,
last_attempted_at=now,
args=[],
kwargs={},
backend="default",
errors=[],
worker_ids=["worker-1"],
)
# --- id-only equality variant ---
@dataclass(frozen=True, slots=True, kw_only=True)
class TaskResultIdEq:
task: Any = field(compare=False)
id: str
status: Any = field(compare=False)
enqueued_at: datetime | None = field(compare=False)
started_at: datetime | None = field(compare=False)
finished_at: datetime | None = field(compare=False)
last_attempted_at: datetime | None = field(compare=False)
args: list[Any] = field(compare=False)
kwargs: dict[str, Any] = field(compare=False)
backend: str = field(compare=False)
errors: list = field(compare=False)
worker_ids: list[str] = field(compare=False)
_return_value: Any | None = field(init=False, default=None, compare=False)
def __post_init__(self):
object.__setattr__(self, "args", normalize_json(self.args))
object.__setattr__(self, "kwargs", normalize_json(self.kwargs))
# --- Instances to compare ---
a_full = TaskResult(**common_kwargs)
b_full = TaskResult(**common_kwargs) # equal
c_full = TaskResult(**{**common_kwargs, "id": "other"}) # not equal
a_id = TaskResultIdEq(**common_kwargs)
b_id = TaskResultIdEq(**common_kwargs) # equal
c_id = TaskResultIdEq(**{**common_kwargs, "id": "other"}) # not equal
N = 100_000_000
results = {
"Original == (equal)": timeit.timeit(lambda: a_full == b_full, number=N),
"Original == (not equal)": timeit.timeit(lambda: a_full == c_full, number=N),
"Id-only == (equal)": timeit.timeit(lambda: a_id == b_id, number=N),
"Id-only == (not equal)": timeit.timeit(lambda: a_id == c_id, number=N),
}
for label, t in results.items():
print(f"{label:<30} {t:.3f}s ({t/N*1e9:.1f} ns/call)")
comment:3 by , 3 weeks ago
Hi Johannes, thanks for providing such clear benchmarks! The performance gains for the task queue are very significant. I'm quite interested in this and would love to help take the implementation work off your hands.
comment:4 by , 3 weeks ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
| Triage Stage: | Unreviewed → Accepted |
comment:5 by , 3 weeks ago
Hi Johannes,
Regarding the sorting implementation, while @dataclass(order=True) is indeed stupidly simple, I noticed two potential issues.
First, it makes the sorting logic strictly dependent on the physical order of the class attributes, which could cause regressions if someone accidentally reorders them in the future.
More importantly, since run_after is typed as datetime | None, using order=True will crash the queue with a TypeError (comparing NoneType and datetime) whenever a task scheduled to run immediately (run_after=None) is compared against a scheduled task.
To prevent the queue from crashing and to make the codebase more defensive, would it be safer to explicitly implement lt and eq? We can safely handle the None fallback inside lt by comparing tuples like (self.run_after is not None, self.run_after).
Let me know your thoughts!
follow-up: 7 comment:6 by , 2 weeks ago
Hi there,
Good call. The dataclasses factory utilities are mainly supposed to reduce boilerplate. They just create classes with the very same methods for you. If it gets in the way, as it seems to do here, it is an excellent choice to do things “manually.”
Idioms and tools should never get in your way. That sentiment speaks much to the core of Django and its success. In other words, I believe you are on the right path :)
Cheers!
Joe
comment:7 by , 2 weeks ago
Replying to Johannes Maron:
Hi there,
Good call. The dataclasses factory utilities are mainly supposed to reduce boilerplate. They just create classes with the very same methods for you. If it gets in the way, as it seems to do here, it is an excellent choice to do things “manually.”
Idioms and tools should never get in your way. That sentiment speaks much to the core of Django and its success. In other words, I believe you are on the right path :)
Cheers!
Joe
Thank you for the encouraging words! I completely agree—pragmatism over strict adherence to tools is exactly what makes Django's design philosophy so great to work with. I've already submitted the PR:https://github.com/django/django/pull/21383
comment:9 by , 2 weeks ago
| Needs documentation: | set |
|---|---|
| Patch needs improvement: | set |
comment:10 by , 2 weeks ago
| Patch needs improvement: | unset |
|---|
comment:11 by , 2 weeks ago
| Needs tests: | set |
|---|---|
| Patch needs improvement: | set |
comment:12 by , 2 weeks ago
| Patch needs improvement: | unset |
|---|
comment:13 by , 13 days ago
| Needs tests: | unset |
|---|---|
| Patch needs improvement: | set |
comment:14 by , 9 days ago
| Needs documentation: | unset |
|---|
comment:15 by , 8 days ago
| Patch needs improvement: | unset |
|---|---|
| Triage Stage: | Accepted → Ready for checkin |
comment:16 by , 8 days ago
| Patch needs improvement: | set |
|---|---|
| Triage Stage: | Ready for checkin → Accepted |
comment:17 by , 7 days ago
| Patch needs improvement: | unset |
|---|
comment:18 by , 5 days ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
I would also suggest adding a custom eq method. We only need to include
idand a simple string comparison is much quicker. Benchmarks N = 100_000_000: