Opened 13 months ago

Closed 3 months ago

Last modified 3 months ago

#28312 closed Bug (fixed)

ModelChoiceIterator uses cached length of queryset.

Reported by: Sjoerd Job Postmus Owned by: François Freitag
Component: Forms Version: 1.8
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In Django 1.8 (but also master), ModelChoiceIterator has the following implementation:

class ModelChoiceIterator:
    def __init__(self, field):
        self.field = field
        self.queryset = field.queryset

    def __iter__(self):
        if self.field.empty_label is not None:
            yield ("", self.field.empty_label)
        queryset = self.queryset.all()
        # Can't use iterator() when queryset uses prefetch_related()
        if not queryset._prefetch_related_lookups:
            queryset = queryset.iterator()
        for obj in queryset:
            yield self.choice(obj)

    def __len__(self):
        return (len(self.queryset) + (1 if self.field.empty_label is not None else 0))

    def choice(self, obj):
        return (self.field.prepare_value(obj), self.field.label_from_instance(obj))

As can be seen, __iter__ actually does a .all() to make sure it gets fresh results. However, __len__ does not. Also: it currently caches all objects inside self.queryset, only releasing them again on shutdown which might be problematic if there are a lot of results (or even a few "large" results).

Suggested implementation

def __len__(self):
    return self.queryset.count() + (1 if self.field.empty_label is not None else 0))

Change History (11)

comment:1 Changed 13 months ago by Tim Graham

Component: UncategorizedForms

I don't understand the "bug" aspect to the report. Can you give a test case that demonstrates incorrect behavior?

I understand the point about memory consumption. I guess switching to count() might be better, although it adds a query and could require some adapting of assertNumQueries() tests (there is one test in Django's model_forms tests that breaks in this way).

comment:2 Changed 13 months ago by Sjoerd Job Postmus

The following two tests that show how I would expect the ModelMultipleChoiceField to work. The first tests __iter__ (and works), the second tests __len__ (and fails: AssertionError: 3 != 4).

diff --git a/tests/model_forms/tests.py b/tests/model_forms/tests.py
index c85eb2a..3c6241d 100644
--- a/tests/model_forms/tests.py
+++ b/tests/model_forms/tests.py
@@ -1992,6 +1992,27 @@ class ModelMultipleChoiceFieldTests(TestCase):
         form = ArticleCategoriesForm(instance=article)
         self.assertCountEqual(form['categories'].value(), [self.c2.slug, self.c3.slug])

+    def test_added_fields_gets_rendered(self):
+        f = forms.ModelMultipleChoiceField(Category.objects.all())
+        self.assertEqual(list(f.choices), [
+            (self.c1.pk, 'Entertainment'),
+            (self.c2.pk, "It's a test"),
+            (self.c3.pk, 'Third')])
+        c4 = Category.objects.create(
+            name="Now you see me", slug="now", url="added")
+        self.assertEqual(list(f.choices), [
+            (self.c1.pk, 'Entertainment'),
+            (self.c2.pk, "It's a test"),
+            (self.c3.pk, 'Third'),
+            (c4.pk, 'Now you see me')])
+
+    def test_added_fields_gets_counted(self):
+        f = forms.ModelMultipleChoiceField(Category.objects.all())
+        self.assertEqual(len(f.choices), 3)
+        c4 = Category.objects.create(
+            name="Now you see me", slug="now", url="added")
+        self.assertEqual(len(f.choices), 4)
+

 class ModelOneToOneFieldTests(TestCase):
     def test_modelform_onetoonefield(self):

This means that the "invariant" that len(f.choices) == len(list(f.choices)) does not hold.

In particular: note that (currently) the value is calculated just once: the first time __len__ is called. This caching is not limited to 'per request', but is cached for the lifetime of the process handling requests.

comment:3 Changed 13 months ago by Sjoerd Job Postmus

Perhaps this can be closed, as (our specific case) is resolved by #27563. My apologies.

Extra context:

We ran into this problem as our form contained logic as follows:

if len(self.fields['status'].choices) <= 1:
    self.fields.pop('status')

I tested this: count = 1: no field. Add a Status, reload page: still no field. Turns out the count was being "cached". Since it is a Django 1.8 project, above fix is not taken into account, and the cached value did indeed live longer than the current request. Looking at the changes, I expect the behaviour to be correct in Django 2.0.

Even so: the ModelChoiceIterator uses .all() or .iterator() in the __iter__ method, and thus forces obtaining a fresh value. But __len__ does not do so, and that is an inconsistency that might still qualify as a bug.

Are there any plans to backport #27563 to Django 1.8? I expect not, as it's past the "End of mainstream support".

comment:4 Changed 13 months ago by Tim Graham

Has patch: set
Triage Stage: UnreviewedAccepted

Yes, I think the current behavior of __len__() seems inappropriate. PR

You are correct that 1.8 only receives security/data loss fixes at this time. This fix does not qualify for a backport to any stable branch based on our supported versions policy.

comment:5 Changed 13 months ago by Tim Graham

Patch needs improvement: set

I've closed my PR, someone else is welcome to continue this.

comment:6 Changed 11 months ago by Srinivas Reddy Thatiparthy

Owner: changed from nobody to Srinivas Reddy Thatiparthy
Status: newassigned

comment:7 Changed 3 months ago by François Freitag

Owner: changed from Srinivas Reddy Thatiparthy to François Freitag
Patch needs improvement: unset
Last edited 3 months ago by François Freitag (previous) (diff)

comment:8 Changed 3 months ago by Carlton Gibson

Has patch: unset

Related discussion on mailing list

comment:9 Changed 3 months ago by François Freitag

Has patch: set

comment:10 Changed 3 months ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In 3fca95e:

Fixed #28312 -- Made ModelChoiceIterator.len() more memory-efficient.

Instead of loading all QuerySet results in memory, count the number of
entries. This adds an extra query when list() or tuple() is called on the
choices (because both call len() then iter()) but uses less
memory since the QuerySet results won't be cached. In most cases, the
choices will only be iterated on, meaning that len() won't be called
and only one query will be executed.

comment:11 Changed 3 months ago by Tim Graham <timograham@…>

In d1413c5:

Refs #28312 -- Added an optimized bool() to ModelChoiceIterator.

COUNT is more expensive than EXISTS; use the latter when possible.

Note: See TracTickets for help on using tickets.
Back to Top