Opened 4 years ago

Last modified 2 years ago

#32244 closed Cleanup/optimization

ORM inefficiency: ModelFormSet executes a single-object SELECT query per formset instance when saving/validating — at Version 2

Reported by: Lushen Wu Owned by: nobody
Component: Database layer (models, ORM) Version: 3.1
Severity: Normal Keywords: formsets
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Lushen Wu)

Conceptual summary of the issue:
Let's say we have a Django app with Author and Book models, and use a BookFormSet to add / modify / delete books that belong to a given Author. The problem is when the BookFormSet is validated, ModelChoiceField.to_python() ends up calling self.queryset.get(id=123) which results in a single-object SELECT query for each book in the formset. That means if I want to update 15 books, Django performs 15 separate SELECT queries, which seems incredibly inefficient. (Our actual app is an editor that can update any number of objects in a single formset, e.g. 50+).

My failed attempts to solve this:

  1. First I tried passing a queryset to the BookFormSet, i.e. formset = BookFormSet(data=request.POST, queryset=Book.objects.filter(author=1)), but the ModelChoiceField still does its single-object SELECT queries.
  2. Then I tried to see where the ModelChoiceField defines its queryset, which seems to be in BaseModelFormSet.add_fields(). I tried initiating the ModelChoiceField with the same queryset that I passed to the formset, e.g. Book.objects.filter(author=1) instead of the original code which would be Book._default_manager.get_queryset(). But this doesn't help because I guess the new queryset I defined isn't actually linked to what was passed to the formset (and we don't have a cache running). So the multiple SELECT queries still happen. (Note: I realize _default_manager.get_queryset() might be necessary in cases where the formset can be used to switch one Model instance to another instance which might not be in the original queryset passed to the BaseModelFormset, but this is not our use case)
  3. I noticed that BaseFormSet._existing_object() actually provides a way to check whether an object exists in the queryset that was giving to the formset constructor, which means that queryset is evaluated at most once and the results stored in BaseFormSet._object_dict. I thought there might be some way to have ModelChoiceField.to_python() do something similar before calling self.queryset.get(id=123), but I don't think ModelChoiceField is aware of BaseFormSet, and it would seem an anti-pattern to reach up the hierarchy like this.

The easiest solution seems to me to pass BaseFormSet._object_dict in some way to each ModelForm that's created, and then allow the ModelChoiceField to check this _object_dict before making another SELECT query.

Change History (2)

comment:1 by Lushen Wu, 4 years ago

Description: modified (diff)

comment:2 by Lushen Wu, 4 years ago

Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top