Opened 7 years ago

Closed 6 years ago

Last modified 6 years ago

#28312 closed Bug (fixed)

ModelChoiceIterator uses cached length of queryset.

Reported by: Sjoerd Job Postmus Owned by: François Freitag
Component: Forms Version: 1.8
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In Django 1.8 (but also master), ModelChoiceIterator has the following implementation:

class ModelChoiceIterator:
    def __init__(self, field):
        self.field = field
        self.queryset = field.queryset

    def __iter__(self):
        if self.field.empty_label is not None:
            yield ("", self.field.empty_label)
        queryset = self.queryset.all()
        # Can't use iterator() when queryset uses prefetch_related()
        if not queryset._prefetch_related_lookups:
            queryset = queryset.iterator()
        for obj in queryset:
            yield self.choice(obj)

    def __len__(self):
        return (len(self.queryset) + (1 if self.field.empty_label is not None else 0))

    def choice(self, obj):
        return (self.field.prepare_value(obj), self.field.label_from_instance(obj))

As can be seen, __iter__ actually does a .all() to make sure it gets fresh results. However, __len__ does not. Also: it currently caches all objects inside self.queryset, only releasing them again on shutdown which might be problematic if there are a lot of results (or even a few "large" results).

Suggested implementation

def __len__(self):
    return self.queryset.count() + (1 if self.field.empty_label is not None else 0))

Change History (11)

comment:1 by Tim Graham, 7 years ago

Component: UncategorizedForms

I don't understand the "bug" aspect to the report. Can you give a test case that demonstrates incorrect behavior?

I understand the point about memory consumption. I guess switching to count() might be better, although it adds a query and could require some adapting of assertNumQueries() tests (there is one test in Django's model_forms tests that breaks in this way).

comment:2 by Sjoerd Job Postmus, 7 years ago

The following two tests that show how I would expect the ModelMultipleChoiceField to work. The first tests __iter__ (and works), the second tests __len__ (and fails: AssertionError: 3 != 4).

diff --git a/tests/model_forms/tests.py b/tests/model_forms/tests.py
index c85eb2a..3c6241d 100644
--- a/tests/model_forms/tests.py
+++ b/tests/model_forms/tests.py
@@ -1992,6 +1992,27 @@ class ModelMultipleChoiceFieldTests(TestCase):
         form = ArticleCategoriesForm(instance=article)
         self.assertCountEqual(form['categories'].value(), [self.c2.slug, self.c3.slug])

+    def test_added_fields_gets_rendered(self):
+        f = forms.ModelMultipleChoiceField(Category.objects.all())
+        self.assertEqual(list(f.choices), [
+            (self.c1.pk, 'Entertainment'),
+            (self.c2.pk, "It's a test"),
+            (self.c3.pk, 'Third')])
+        c4 = Category.objects.create(
+            name="Now you see me", slug="now", url="added")
+        self.assertEqual(list(f.choices), [
+            (self.c1.pk, 'Entertainment'),
+            (self.c2.pk, "It's a test"),
+            (self.c3.pk, 'Third'),
+            (c4.pk, 'Now you see me')])
+
+    def test_added_fields_gets_counted(self):
+        f = forms.ModelMultipleChoiceField(Category.objects.all())
+        self.assertEqual(len(f.choices), 3)
+        c4 = Category.objects.create(
+            name="Now you see me", slug="now", url="added")
+        self.assertEqual(len(f.choices), 4)
+

 class ModelOneToOneFieldTests(TestCase):
     def test_modelform_onetoonefield(self):

This means that the "invariant" that len(f.choices) == len(list(f.choices)) does not hold.

In particular: note that (currently) the value is calculated just once: the first time __len__ is called. This caching is not limited to 'per request', but is cached for the lifetime of the process handling requests.

comment:3 by Sjoerd Job Postmus, 7 years ago

Perhaps this can be closed, as (our specific case) is resolved by #27563. My apologies.

Extra context:

We ran into this problem as our form contained logic as follows:

if len(self.fields['status'].choices) <= 1:
    self.fields.pop('status')

I tested this: count = 1: no field. Add a Status, reload page: still no field. Turns out the count was being "cached". Since it is a Django 1.8 project, above fix is not taken into account, and the cached value did indeed live longer than the current request. Looking at the changes, I expect the behaviour to be correct in Django 2.0.

Even so: the ModelChoiceIterator uses .all() or .iterator() in the __iter__ method, and thus forces obtaining a fresh value. But __len__ does not do so, and that is an inconsistency that might still qualify as a bug.

Are there any plans to backport #27563 to Django 1.8? I expect not, as it's past the "End of mainstream support".

comment:4 by Tim Graham, 7 years ago

Has patch: set
Triage Stage: UnreviewedAccepted

Yes, I think the current behavior of __len__() seems inappropriate. PR

You are correct that 1.8 only receives security/data loss fixes at this time. This fix does not qualify for a backport to any stable branch based on our supported versions policy.

comment:5 by Tim Graham, 7 years ago

Patch needs improvement: set

I've closed my PR, someone else is welcome to continue this.

comment:6 by Srinivas Reddy Thatiparthy, 7 years ago

Owner: changed from nobody to Srinivas Reddy Thatiparthy
Status: newassigned

comment:7 by François Freitag, 6 years ago

Owner: changed from Srinivas Reddy Thatiparthy to François Freitag
Patch needs improvement: unset
Version 0, edited 6 years ago by François Freitag (next)

comment:8 by Carlton Gibson, 6 years ago

Has patch: unset

Related discussion on mailing list

comment:9 by François Freitag, 6 years ago

Has patch: set

comment:10 by Tim Graham <timograham@…>, 6 years ago

Resolution: fixed
Status: assignedclosed

In 3fca95e:

Fixed #28312 -- Made ModelChoiceIterator.len() more memory-efficient.

Instead of loading all QuerySet results in memory, count the number of
entries. This adds an extra query when list() or tuple() is called on the
choices (because both call len() then iter()) but uses less
memory since the QuerySet results won't be cached. In most cases, the
choices will only be iterated on, meaning that len() won't be called
and only one query will be executed.

comment:11 by Tim Graham <timograham@…>, 6 years ago

In d1413c5:

Refs #28312 -- Added an optimized bool() to ModelChoiceIterator.

COUNT is more expensive than EXISTS; use the latter when possible.

Note: See TracTickets for help on using tickets.
Back to Top