Opened 6 years ago

Last modified 6 weeks ago

#24141 assigned New feature

contains() method for QuerySets

Reported by: gormster Owned by: Johan Schiff
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: Johan Schiff Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Johan Schiff)

Tests for object membership in a QuerySet. Trying to alleviate some of the newbies using in to test for membership and accidentally selecting the entire table.

django-developers
PR

Attachments (1)

queryset-contains.diff (676 bytes) - added by gormster 6 years ago.
db/models/query.py diff

Download all attachments as: .zip

Change History (12)

comment:1 Changed 6 years ago by gormster

Owner: changed from nobody to gormster
Status: newassigned

Changed 6 years ago by gormster

Attachment: queryset-contains.diff added

db/models/query.py diff

comment:2 Changed 6 years ago by gormster

Owner: gormster deleted
Status: assignednew

comment:3 Changed 6 years ago by Tim Graham

Resolution: wontfix
Status: newclosed

Closing per discussion on the pull request.

comment:4 Changed 6 years ago by Carl Meyer

Tim, I wouldn't say the discussion on that PR settled the question of whether QuerySet.contains() is a useful feature or not.

I think the original PR (to implement QuerySet.__contains__() as .filter(pk=obj.pk).exists() under the hood) is a non-starter, but adding Queryset.contains() seems reasonable to me. Gormster is right that there are a number of questions on SO with people confused about how to efficiently check for object containment in a queryset, and that M2M relationships present a not-uncommon use case for it. My inclination would be slightly in favor of adding it, if gormster updated the patch with tests and docs. But I'm interested in what others think.

comment:5 Changed 6 years ago by Tim Graham

Needs documentation: set
Needs tests: set
Resolution: wontfix
Status: closednew

Oops. Sorry, Carl. I saw your comment "The main argument against contains(), I think, would just be that it's rarely if ever needed." and not the one before it "Adding this as a .contains() method to QuerySet is a reasonable feature request that's consistent with the current design philosophy".

I am not sure if this proposal will lead to better code for anyone who doesn't read the docs or not. Since it hides the underlying fact of whether or not the QuerySet is cached, it seems to me that it won't necessarily be more intuitive to write code that's more efficient. For example:

queryset = Foo.objects.all()
for obj in range objs:
    if queryset.contains(obj):
       print("Found %s" % obj)

In the new example, my understanding is we'll get len(obj) queries, where as if we use if obj in queryset, we'll get 1 query. Please correct me if I'm wrong.

comment:6 Changed 6 years ago by Carl Meyer

You're right, but note that this is exactly parallel to the way QuerySet.count() and QuerySet.exists() currently work - they use the cache if it is populated, otherwise they execute a query. Of course, there's no reason to ever call QuerySet.count() or QuerySet.exists() multiple times in a loop, so maybe it will be more of a trap in this case. But it's already true that if you do .count() on a queryset that you are later populating fully, you've done an unnecessary extra query.

For your demonstration case, the cutoff between when it's more efficient to do more queries vs more efficient to pull in the entire QuerySet is not a priori clear; it would depend very much on the size of the whole queryset and the number of objs you are checking for containment. Probably for many such check-containment-of-multiple-objects cases the most efficient approach will be to do a values_list query and check obj IDs for membership in that set of IDs.

It's already true and will always be true, I think, that in order to maximize query efficiency using the Django ORM, you have to understand how it works. That includes things like knowing when to use select_related and prefetch_related, and it also includes understanding the result cache and what triggers populating it. In the end I don't think adding .contains(obj) really changes much, it's just a nicer spelling for .filter(pk=obj.pk).exists(). The question is really whether good use cases for the latter are common enough to warrant adding a method. And I'm really fine with either answer to that question, just didn't want the feature request to be dismissed without good rationale.

comment:7 Changed 6 years ago by Tim Graham

Resolution: needsinfo
Status: newclosed

I guess a discussion on the DevelopersMailingList needs to happen in order to move this forward. Closing until that discussion happens and there's agreement to add this.

comment:8 Changed 2 months ago by Johan Schiff

Cc: Johan Schiff added
Description: modified (diff)
Needs tests: unset
Resolution: needsinfo
Status: closednew

Reopening, based on discussion in django-developers.

comment:9 Changed 2 months ago by felixxm

Triage Stage: UnreviewedAccepted

comment:10 Changed 8 weeks ago by Johan Schiff

Description: modified (diff)
Needs documentation: unset

comment:11 Changed 6 weeks ago by felixxm

Needs documentation: set
Owner: set to Johan Schiff
Patch needs improvement: set
Status: newassigned
Note: See TracTickets for help on using tickets.
Back to Top