Opened 7 weeks ago

Last modified 3 weeks ago

#36919 assigned Cleanup/optimization

Allow `TaskResult` (and `Task`) to be pickled

Reported by: Jake Howard Owned by: Varun Kasyap Pentamaraju
Component: Tasks Version: 6.0
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

As part of implementing some task backends, it can be useful to pickle a TaskResult to pass it around implementation.

Since it's a dataclass most of the implementation is already there - the main issue is that Task itself can't be pickled as it references a function. Replacing that with a string reference during pickling will likely resolve the current issues.

The implementation isn't especially complex, so I'm not opposed to this being closed and left to backend implementers to deal with instead.

Change History (11)

comment:1 by Vidhi Singh, 7 weeks ago

I’d like to investigate this further.

I’ll start by reproducing the pickling failure for Task and TaskResult on the current main branch to better understand the exact limitations.

Before exploring an implementation, I’d appreciate clarification on whether this is something Django core should support directly, or if the expectation is that task backends handle serialization themselves.

I’ll report back with findings shortly.

comment:2 by Natalia Bidart, 6 weeks ago

Triage Stage: UnreviewedAccepted

Thank you Jake, I think the request makes sense. I think we should borrow/reuse from https://github.com/django/django/blob/main/django/db/migrations/serializer.py (see Serializer class).

comment:3 by Jake Howard, 6 weeks ago

I'm not sure I understand where DB serialization comes into this? I don't think Task or TaskResult will ever need to be serialized in a migration (and it's probably an antipattern if they do). It did remind me of django.core.serializers, but that seems focused on the ORM, which I also don't think is relevant.

I have a working implementation which could do with some polishing. If someone doesn't beat me to getting a PR up, I'll push something up soon (intentionally leaving unassigned).

comment:4 by Varun Kasyap Pentamaraju, 6 weeks ago

Owner: set to Varun Kasyap Pentamaraju
Status: newassigned

want to contribute

in reply to:  3 ; comment:5 by Varun Kasyap Pentamaraju, 6 weeks ago

Replying to Jake Howard:

Hi Jake, I guess Natalia is just suggesting reusing (or taking inspiration from) some logic inside Serializer, which already converts a function to its dotted import path and includes validation for non-importable cases.

I'm not sure I understand where DB serialization comes into this? I don't think Task or TaskResult will ever need to be serialized in a migration (and it's probably an antipattern if they do). It did remind me of django.core.serializers, but that seems focused on the ORM, which I also don't think is relevant.

in reply to:  5 comment:6 by Natalia Bidart, 6 weeks ago

Replying to Varun Kasyap Pentamaraju:

Replying to Jake Howard:

Hi Jake, I guess Natalia is just suggesting reusing (or taking inspiration from) some logic inside Serializer, which already converts a function to its dotted import path and includes validation for non-importable cases.

Yes, thank you! That was my point, we already have robust logic for serializing methods and functions, so we shouldn't (ideally, when possible) re-invent the wheel.

comment:7 by Varun Kasyap Pentamaraju, 6 weeks ago

Has patch: set

comment:8 by matiasb, 3 weeks ago

Hi!

Somehow related to this. While exploring an implementation of a Celery backend for Django’s task framework, I ran into a serialization issue related to takes_context=True. When set, the task callable receives a TaskContext instance containing execution metadata, including the current TaskResult.

Celery serializes task arguments before sending them to workers, and the default serializer is JSON. TaskContext (and specifically the embedded TaskResult) is not JSON serializable, which in my case leads to errors like:

kombu.exceptions.EncodeError: Object of type TaskContext is not JSON serializable

It would be great if it could be possible not only make this pickable but also JSON-serializable?

comment:9 by Jake Howard, 3 weeks ago

It would be great if it could be possible not only make this pickable but also JSON-serializable?

JSON serialization of custom objects requires a custom encoder, rather than changes to the object directly. This should be doable as part of your library rather than Django itself (since it wouldn't be clear how to then decode the TaskContext (or other django.task types) without some side-channel metadata stating what the data is.

comment:10 by matiasb, 3 weeks ago

When you say custom objects, are you referring to TaskContext? As far as I understand, TaskContext is provided by the tasks framework and it is not customizable, right?

The only piece that seems outside the control of the tasks framework is TaskResult, which can contain an arbitrary task return value. However, according to the docs (https://docs.djangoproject.com/en/6.0/topics/tasks/#enqueueing-tasks), "Because both Task arguments and return values are serialized to JSON, they must be JSON-serializable". That’s what led me to wonder about this.

In any case, I see your point and it makes sense. Still, in my particular use case (and I guess it may also help some other possible backends for which context needs to be passed to the worker through some channel?) it could simplify several things if the context were JSON-serializable :-)

comment:11 by Jake Howard, 3 weeks ago

When you say custom objects, are you referring to TaskContext?

No. If you want to make any object JSON serializable, it needs a custom json.Encoder. Unlike pickling, JSON encodability isn't a property of the object itself. For example, for Django to have custom implemented support for serializing datetime, it's done using a custom encoder, rather than modifying the object itself. The concept you're after of "making an object JSON serializable" simply doesn't exist.

Since JSON and pickle support are entirely different, implementing any kind of support for it should be its own ticket anyway (not that it's possible anyway).

Note: See TracTickets for help on using tickets.
Back to Top