Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#20242 closed Bug (fixed)

Django queryset 'in' operator fails on first call

Reported by: anonymous Owned by: svisser
Component: Database layer (models, ORM) Version: 1.5
Severity: Normal Keywords:
Cc: bmispelon@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

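When using the 'in' operator on a queryset built with prefetch_related(), the first membership test returns False even though the object is in the queryset; repeating the same test returns True. If prefetch_related('projects') is replaced with all(), the problem goes away.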

from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=100)


class Project(models.Model):
    categories = models.ManyToManyField(Category, related_name='projects')


category_list = Category.objects.prefetch_related('projects')


print(category_list)  # [<Category: Category object>, <Category: Category object>]
print(category_list[0] in category_list)  # False (first call: wrong result)
print(category_list[0] in category_list)  # True (second call)

Change History (6)

comment:1 Changed 2 years ago by bmispelon

  • Cc bmispelon@… added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Thanks for the detailed report.

I can reproduce the issue. As you implied, it seems to be related to prefetch_related, since the problem does not appear when you use all() instead of prefetch_related('projects').

comment:2 Changed 2 years ago by svisser

  • Has patch set

The reason it happens is that, when a QuerySet's self._result_cache is None, its __contains__ method executes it = iter(self) to start filling the result cache.

The remaining code in __contains__ then assumes that self._result_cache is an empty list and that self._iter will supply the rows still to be checked. That assumption no longer holds with prefetching: __iter__ calls len(self) so that all results are available for the prefetch, and this populates the entire result cache. Hence self._iter is None while self._result_cache already contains values, and the remaining logic in __contains__ does not take that case into account. It returns False as soon as it sees that self._iter is None, even though the items in self._result_cache have not been checked yet.

This can be fixed by checking whether self._iter is None only when len(self._result_cache) <= pos holds. If that condition does not hold, there are still unchecked items in self._result_cache, and those must be checked before giving up.
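To make the mechanism and the proposed fix concrete, here is a minimal, self-contained sketch. ToyQuerySet, FixedToyQuerySet and the _scan helper are hypothetical names, not Django code: they only mimic the Django 1.5 caching described in this ticket (a lazily filled _result_cache, an _iter that becomes None once exhausted, and an __iter__ that calls len(self) when prefetching). Passing prefetch=False stands in for all(). The fixed loop follows the approach described above; it is not the literal diff from the pull request.

```python
class ToyQuerySet:
    # Hypothetical stand-in for a Django 1.5 QuerySet (not Django code).
    # rows plays the database; prefetch=True mimics prefetch_related(...),
    # prefetch=False mimics all().

    def __init__(self, rows, prefetch=False):
        self._rows = rows
        self._prefetch = prefetch
        self._result_cache = None  # None until evaluation starts
        self._iter = None          # lazy row source; None once exhausted

    def _fill_cache(self, num=1):
        # Move up to num rows from the lazy source into the cache.
        if self._iter is None:
            return
        try:
            for _ in range(num):
                self._result_cache.append(next(self._iter))
        except StopIteration:
            self._iter = None

    def _result_iter(self):
        # Yield cached rows, pulling more from the source as needed.
        pos = 0
        while True:
            if pos >= len(self._result_cache):
                if self._iter is None:
                    return
                self._fill_cache()
                continue
            yield self._result_cache[pos]
            pos += 1

    def __len__(self):
        # len() loads every remaining row at once.
        if self._result_cache is None:
            self._result_cache = []
            self._iter = iter(self._rows)
        if self._iter is not None:
            self._result_cache.extend(self._iter)
            self._iter = None
        return len(self._result_cache)

    def __iter__(self):
        if self._prefetch:
            len(self)  # prefetching needs all rows, so this fills the cache
        if self._result_cache is None:
            self._result_cache = []
            self._iter = iter(self._rows)
        if self._iter is not None:
            return self._result_iter()
        return iter(self._result_cache)

    def __contains__(self, val):
        # Simplified from the 1.5-era QuerySet.__contains__.
        pos = 0
        if self._result_cache is not None:
            if val in self._result_cache:
                return True
            elif self._iter is None:
                return False  # fully cached and no match
            pos = len(self._result_cache)
        else:
            iter(self)  # assumed to leave an empty cache and a live _iter
        return self._scan(val, pos)

    def _scan(self, val, pos):
        # Buggy loop: gives up whenever _iter is None, even if rows
        # from pos onward are already cached and unchecked.
        while True:
            if len(self._result_cache) <= pos:
                self._fill_cache()
            if self._iter is None:
                return False
            if self._result_cache[pos] == val:
                return True
            pos += 1


class FixedToyQuerySet(ToyQuerySet):
    def _scan(self, val, pos):
        # Fixed loop: only consult _iter once the cached rows are used up.
        while True:
            if len(self._result_cache) <= pos:
                self._fill_cache()
                if self._iter is None:
                    return False  # nothing left to check
            if self._result_cache[pos] == val:
                return True
            pos += 1


qs = ToyQuerySet(["a", "b"], prefetch=True)
print("a" in qs)  # False (first call: the bug)
print("a" in qs)  # True
print("a" in FixedToyQuerySet(["a", "b"], prefetch=True))  # True
```

With prefetch=False the buggy class behaves correctly, because the cache really is empty after iter(self) and rows arrive one at a time, which matches the observation that all() does not trigger the problem.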

Pull request: https://github.com/django/django/pull/1060

In fairness, I have added myself to AUTHORS as well in this commit as it took me a while to familiarize myself with the code.

comment:3 Changed 2 years ago by svisser

  • Owner changed from nobody to svisser
  • Status changed from new to assigned

comment:4 Changed 2 years ago by timo

It looks like this has been fixed with the removal of chunked reads from QuerySet iteration (the __contains__ method no longer exists) [70679243d1786e03557c28929f9762a119e3ac14]. I'll commit the regression test though.
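Once QuerySet no longer defines __contains__, Python evaluates "x in qs" by iterating over __iter__, and without chunked reads that iteration loads every row into the result cache first. The class below is a rough, hypothetical sketch of that behaviour, not Django's code; the _fetch_all helper and the queries counter are illustrative names standing in for the real evaluation step and for database hits.

```python
class ToyQuerySet:
    # Hypothetical sketch of a QuerySet without chunked reads (not Django code).

    def __init__(self, rows):
        self._rows = rows
        self._result_cache = None
        self.queries = 0  # counts simulated database hits

    def _fetch_all(self):
        # Load every row in one go (roughly where Django would also prefetch).
        if self._result_cache is None:
            self.queries += 1
            self._result_cache = list(self._rows)

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    # No __contains__: Python evaluates `x in qs` by iterating via __iter__.


qs = ToyQuerySet(["a", "b"])
print("a" in qs)   # True
print("a" in qs)   # True
print(qs.queries)  # 1
```

Because the cache is always either empty (not yet evaluated) or complete, there is no partially filled state for a membership test to mishandle.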

comment:5 Changed 2 years ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In a2967d5204729771e3716c431ea98f8ee8562d3e:

Fixed #20242 - Added a regression test for prefetch_related.

Issue was fixed by removal of chunked reads from
QuerySet iteration in 70679243d1786e03557c28929f9762a119e3ac14.

Thanks Simeon Visser for the patch.

comment:6 Changed 2 years ago by svisser

Thanks Tim.
