Opened 6 weeks ago

Closed 5 weeks ago

Last modified 4 days ago

#36769 closed Cleanup/optimization (fixed)

Limit recursive extraction of field values in XML deserializer

Reported by: Jacob Walls Owned by: Pravin
Component: Core (Serialization) Version: dev
Severity: Normal Keywords: xml deserializer
Cc: Pravin, Shai Berger Triage Stage: Ready for checkin
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

While investigating CVE-2025-64460 (mitigated in 50efb718b31333051bc2dcb06911b8fa1358c98c), we noticed that the private helper getInnerText supports extracting arbitrarily nested text, however its only use in Django is to extract text at most one level deep, under a <natural> tag, like this fixture example.

We opted not to change this semantic in a patch release, but it occurred to me that we could only extract text at the exact expected depth (e.g. 0 if under <field> and 1 if under <field><natural>) and completely sidestep potential performance issues from invalid input making use of nested elements, see recent python CVE-2025-12084 we also mitigated yesterday.

I would appreciate any arguments I might be missing, for example, if there are legitimate use cases for ingesting text from nested tags e.g. from unescaped markup that this proposal would break.

Change History (13)

comment:1 by Pravin, 6 weeks ago

Cc: Pravin added

comment:2 by Natalia Bidart, 6 weeks ago

Cc: Shai Berger added
Keywords: xml deserializer added
Triage Stage: UnreviewedAccepted

Thank you!

comment:3 by Pravin, 6 weeks ago

Either we could change the helper function to only accept the types that Django actually uses: regular text and the one-level-deep <natural> tag. This would stop the recursion immediately for anything else, or we could explicitly tell the function how deep it should go.

comment:4 by Pravin, 6 weeks ago

Owner: set to Pravin
Status: newassigned

comment:5 by Jacob Walls, 6 weeks ago

Needs tests: set

comment:6 by Jacob Walls, 6 weeks ago

Patch needs improvement: set

comment:7 by Jacob Walls, 5 weeks ago

Needs tests: unset

comment:8 by Jacob Walls, 5 weeks ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:9 by Jacob Walls, 5 weeks ago

One advantage of merging this cleanup is that it hastens the day we can remove the workarounds and the test from 1dbd07a608e495a0c229edaaf84d58d8976313b5, which we merged knowing might be flaky (and indeed has been).

I'm subscribed to the Python 3.12 backport PR with the mitigation for the minidom issue. Waiting for that to land and for a 3.12 patch release after.

comment:10 by Jacob Walls <jacobtylerwalls@…>, 5 weeks ago

Resolution: fixed
Status: assignedclosed

In dae08cf5:

Fixed #36769 -- Avoided visiting deeply nested nodes in XML deserializer.

Only children at one level of depth need to be visited.

Co-authored-by: Jacob Walls <jacobtylerwalls@…>

comment:11 by Jacob Walls <jacobtylerwalls@…>, 4 days ago

In 1a70889:

Refs #36769 -- Corrected invalid XML fixtures.

fixture9.xml was likely wrong since its introduction in
35cc439228cd32dfa7a3ec919db01a8a5cd17d33.

The relevant part of the Visa model is:

class Visa(models.Model):

person = models.ForeignKey(Person, models.CASCADE)

The Visa.person <field>s needed to be declared as relations, and
the Person <field>s didn't need their values wrapped in <natural>,
since they weren't relations.

comment:12 by Jacob Walls <jacobtylerwalls@…>, 4 days ago

In a25158f5:

Refs #36769 -- Avoided visiting grandchild nodes in XML Deserializer.

The only use case for visiting grandchild nodes turned out to be to
support an unintentionally invalid fixture in the test suite.

The invalid fixture added in #36969 was modeled on fixture9.xml in
dae08cf55b83caef5e8ee39b16417692e8565278, so that is corrected as well
in this commit, where the test will still pass.

comment:13 by Jacob Walls <jacobtylerwalls@…>, 4 days ago

In 73c5e94:

Refs #36769 -- Raised SuspiciousOperation for unexpected nested tags in XML Deserializer.

Thanks Shai Berger and Natalia Bidart for reviews.

Note: See TracTickets for help on using tickets.
Back to Top