Opened 10 years ago
Closed 4 years ago
#24141 closed New feature (fixed)
contains() method for QuerySets
Reported by: | gormster | Owned by: | Johan Schiff |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Johan Schiff | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Tests for object membership in a QuerySet. Trying to alleviate some of the newbies using in
to test for membership and accidentally selecting the entire table.
Attachments (1)
Change History (18)
comment:1 by , 10 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
by , 10 years ago
Attachment: | queryset-contains.diff added |
---|
comment:2 by , 10 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:3 by , 10 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Closing per discussion on the pull request.
comment:4 by , 10 years ago
Tim, I wouldn't say the discussion on that PR settled the question of whether QuerySet.contains()
is a useful feature or not.
I think the original PR (to implement QuerySet.__contains__()
as .filter(pk=obj.pk).exists()
under the hood) is a non-starter, but adding Queryset.contains()
seems reasonable to me. Gormster is right that there are a number of questions on SO with people confused about how to efficiently check for object containment in a queryset, and that M2M relationships present a not-uncommon use case for it. My inclination would be slightly in favor of adding it, if gormster updated the patch with tests and docs. But I'm interested in what others think.
comment:5 by , 10 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
Resolution: | wontfix |
Status: | closed → new |
Oops. Sorry, Carl. I saw your comment "The main argument against contains()
, I think, would just be that it's rarely if ever needed." and not the one before it "Adding this as a .contains() method to QuerySet is a reasonable feature request that's consistent with the current design philosophy".
I am not sure if this proposal will lead to better code for anyone who doesn't read the docs or not. Since it hides the underlying fact of whether or not the QuerySet is cached, it seems to me that it won't necessarily be more intuitive to write code that's more efficient. For example:
queryset = Foo.objects.all() for obj in range objs: if queryset.contains(obj): print("Found %s" % obj)
In the new example, my understanding is we'll get len(obj)
queries, where as if we use if obj in queryset
, we'll get 1 query. Please correct me if I'm wrong.
comment:6 by , 10 years ago
You're right, but note that this is exactly parallel to the way QuerySet.count()
and QuerySet.exists()
currently work - they use the cache if it is populated, otherwise they execute a query. Of course, there's no reason to ever call QuerySet.count()
or QuerySet.exists()
multiple times in a loop, so maybe it will be more of a trap in this case. But it's already true that if you do .count()
on a queryset that you are later populating fully, you've done an unnecessary extra query.
For your demonstration case, the cutoff between when it's more efficient to do more queries vs more efficient to pull in the entire QuerySet is not a priori clear; it would depend very much on the size of the whole queryset and the number of objs you are checking for containment. Probably for many such check-containment-of-multiple-objects cases the most efficient approach will be to do a values_list
query and check obj IDs for membership in that set of IDs.
It's already true and will always be true, I think, that in order to maximize query efficiency using the Django ORM, you have to understand how it works. That includes things like knowing when to use select_related
and prefetch_related
, and it also includes understanding the result cache and what triggers populating it. In the end I don't think adding .contains(obj)
really changes much, it's just a nicer spelling for .filter(pk=obj.pk).exists()
. The question is really whether good use cases for the latter are common enough to warrant adding a method. And I'm really fine with either answer to that question, just didn't want the feature request to be dismissed without good rationale.
comment:7 by , 10 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
I guess a discussion on the DevelopersMailingList needs to happen in order to move this forward. Closing until that discussion happens and there's agreement to add this.
comment:8 by , 4 years ago
Cc: | added |
---|---|
Description: | modified (diff) |
Needs tests: | unset |
Resolution: | needsinfo |
Status: | closed → new |
Reopening, based on discussion in django-developers.
comment:9 by , 4 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:10 by , 4 years ago
Description: | modified (diff) |
---|---|
Needs documentation: | unset |
comment:11 by , 4 years ago
Needs documentation: | set |
---|---|
Owner: | set to |
Patch needs improvement: | set |
Status: | new → assigned |
comment:13 by , 4 years ago
Looks like the patch still needs improvements regarding invalid model type handling and the fact the _result_cache
scanning is O(n).
comment:14 by , 4 years ago
Patch needs improvement: | unset |
---|
Author appears to have resolved Simon's comments. (Tip: unchecking Needs Improvements gets your patch back in the review queue.)
comment:15 by , 4 years ago
Patch needs improvement: | set |
---|
comment:16 by , 4 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
db/models/query.py diff