Opened 2 months ago

Closed 3 weeks ago

Last modified 3 weeks ago

#36913 closed Cleanup/optimization (fixed)

Optimise ChoiceField / MultipleChoiceField handling of duplicate submissions

Reported by: Jake Howard Owned by: Afenomamy
Component: Forms Version: 6.0
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Jake Howard)

When a ChoiceField / MultipleChoiceField has 5 possible choices, but the form submits 25 values, the choices values are compared once per submitted value. If the submitted values are duplicates, the validation doesn't terminate early, but can still spend a lot of time unnecessarily validating values. This can be very slow when large (~30k) numbers of values are submitted.

A suggested fix is to only validate the unique submitted values (for example for val in set(value)).

This issue was reported to the Security Team, but deemed not a security issue due to the minimal impact when given reasonable input (in the bounds of the security policy).

Change History (13)

comment:1 by Jake Howard, 2 months ago

Note that validation that submissions (and choices) are unique is being handled in a separate feature request: https://github.com/django/new-features/issues/121

comment:2 by Jacob Walls, 2 months ago

Triage Stage: UnreviewedAccepted

comment:3 by Abhimanyu Singh Negi, 2 months ago

I’d like to work on this issue.

From what I understand the slowdown happens because duplicate submitted values are validated again and again. I’m thinking of validating only the unique values internally while keeping the original list unchanged so behaviour stays the same. I’ll add some tests too to make sure duplicates and invalid values are still handled correctly.

If this sounds reasonable and no one is working on this, i would like to assign the issue to myself.

Last edited 2 months ago by Abhimanyu Singh Negi (previous) (diff)

comment:4 by Abhimanyu Singh Negi, 2 months ago

Last edited 2 months ago by Abhimanyu Singh Negi (previous) (diff)

in reply to:  3 comment:5 by Natalia Bidart, 2 months ago

Replying to Abhimanyu Singh Negi:

I’d like to work on this issue.

From what I understand the slowdown happens because duplicate submitted values are validated again and again. I’m thinking of validating only the unique values internally while keeping the original list unchanged so behaviour stays the same. I’ll add some tests too to make sure duplicates and invalid values are still handled correctly.

If this sounds reasonable and no one is working on this, i would like to assign the issue to myself.

This ticket has been assigned already and has been reserved to evaluate under the Djangonaut Space umbrella. Sorry!

comment:6 by Afenomamy, 5 weeks ago

Hello, I am Aina (Team Saturn) from Djangonaut Space. I will be looking at this ticket

comment:7 by Afenomamy, 5 weeks ago

Owner: changed from Sarah Boyce to Afenomamy

comment:8 by Afenomamy, 3 weeks ago

Has patch: set

comment:9 by Afenomamy, 3 weeks ago

Description: modified (diff)

comment:10 by Jake Howard, 3 weeks ago

Description: modified (diff)

comment:11 by Jake Howard, 3 weeks ago

Triage Stage: AcceptedReady for checkin

comment:12 by Jacob Walls <jacobtylerwalls@…>, 3 weeks ago

Resolution: fixed
Status: assignedclosed

In 26f8929:

Fixed #36913 -- Optimized MultipleChoiceField.validate().

comment:13 by Jacob Walls <jacobtylerwalls@…>, 3 weeks ago

In 8b7ea2bc:

Refs #36913 -- Maintained error message determinism in MultipleChoiceField.validate().

Used Django's OrderedSet datastructure instead of set() in MultipleChoiceField.validate()
to prevent submission ordering from being discarded during validation.

Thanks to Jacob Walls, JaeHyuck Sa, Jake Howard and Simon Charette for
the reviews.

Note: See TracTickets for help on using tickets.
Back to Top