Opened 10 years ago

Last modified 4 years ago

#24141 closed New feature

contains() method for QuerySets — at Version 8

Reported by: gormster Owned by:
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Johan Schiff Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Johan Schiff)

Tests for object membership in a QuerySet. Trying to alleviate some of the newbies using in to test for membership and accidentally selecting the entire table.

Discussion in django-developers:
https://groups.google.com/forum/#!msg/django-developers/NZaMq9BALrs/YC-Loq5ZAwAJ

Pull request:
https://github.com/django/django/pull/13038

Change History (9)

comment:1 by gormster, 10 years ago

Owner: changed from nobody to gormster
Status: newassigned

by gormster, 10 years ago

Attachment: queryset-contains.diff added

db/models/query.py diff

comment:2 by gormster, 10 years ago

Owner: gormster removed
Status: assignednew

comment:3 by Tim Graham, 10 years ago

Resolution: wontfix
Status: newclosed

Closing per discussion on the pull request.

comment:4 by Carl Meyer, 10 years ago

Tim, I wouldn't say the discussion on that PR settled the question of whether QuerySet.contains() is a useful feature or not.

I think the original PR (to implement QuerySet.__contains__() as .filter(pk=obj.pk).exists() under the hood) is a non-starter, but adding Queryset.contains() seems reasonable to me. Gormster is right that there are a number of questions on SO with people confused about how to efficiently check for object containment in a queryset, and that M2M relationships present a not-uncommon use case for it. My inclination would be slightly in favor of adding it, if gormster updated the patch with tests and docs. But I'm interested in what others think.

comment:5 by Tim Graham, 10 years ago

Needs documentation: set
Needs tests: set
Resolution: wontfix
Status: closednew

Oops. Sorry, Carl. I saw your comment "The main argument against contains(), I think, would just be that it's rarely if ever needed." and not the one before it "Adding this as a .contains() method to QuerySet is a reasonable feature request that's consistent with the current design philosophy".

I am not sure if this proposal will lead to better code for anyone who doesn't read the docs or not. Since it hides the underlying fact of whether or not the QuerySet is cached, it seems to me that it won't necessarily be more intuitive to write code that's more efficient. For example:

queryset = Foo.objects.all()
for obj in range objs:
    if queryset.contains(obj):
       print("Found %s" % obj)

In the new example, my understanding is we'll get len(obj) queries, where as if we use if obj in queryset, we'll get 1 query. Please correct me if I'm wrong.

comment:6 by Carl Meyer, 10 years ago

You're right, but note that this is exactly parallel to the way QuerySet.count() and QuerySet.exists() currently work - they use the cache if it is populated, otherwise they execute a query. Of course, there's no reason to ever call QuerySet.count() or QuerySet.exists() multiple times in a loop, so maybe it will be more of a trap in this case. But it's already true that if you do .count() on a queryset that you are later populating fully, you've done an unnecessary extra query.

For your demonstration case, the cutoff between when it's more efficient to do more queries vs more efficient to pull in the entire QuerySet is not a priori clear; it would depend very much on the size of the whole queryset and the number of objs you are checking for containment. Probably for many such check-containment-of-multiple-objects cases the most efficient approach will be to do a values_list query and check obj IDs for membership in that set of IDs.

It's already true and will always be true, I think, that in order to maximize query efficiency using the Django ORM, you have to understand how it works. That includes things like knowing when to use select_related and prefetch_related, and it also includes understanding the result cache and what triggers populating it. In the end I don't think adding .contains(obj) really changes much, it's just a nicer spelling for .filter(pk=obj.pk).exists(). The question is really whether good use cases for the latter are common enough to warrant adding a method. And I'm really fine with either answer to that question, just didn't want the feature request to be dismissed without good rationale.

comment:7 by Tim Graham, 10 years ago

Resolution: needsinfo
Status: newclosed

I guess a discussion on the DevelopersMailingList needs to happen in order to move this forward. Closing until that discussion happens and there's agreement to add this.

comment:8 by Johan Schiff, 4 years ago

Cc: Johan Schiff added
Description: modified (diff)
Needs tests: unset
Resolution: needsinfo
Status: closednew

Reopening, based on discussion in django-developers.

Note: See TracTickets for help on using tickets.
Back to Top