Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#25787 closed New feature (wontfix)

Add existence checks for related database sets on managers

Reported by: Justus Adam Owned by: Anderson Resende
Component: Database layer (models, ORM) Version: 1.8
Severity: Normal Keywords: ManyToMany, contains, in
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

I have recently had to check for presence of a particular object in a set of related objects (either ManyToMany or ForeignKey) and I found it unnecessary cumbersome not to be able to use the
in operator.

I propose to add an implementation of the __contains__ method to the ManyRelatedManager object which executes something like the following query:

def __contains__(self, obj):
    return self.filter(id=obj.id).exists()

which would enable the following sample behaviour:

my_student = request.user.student
my_course = Course.objects.get(id=course_id)
is_teacher = my_student in my_course.teachers

Change History (9)

comment:1 by Anderson Resende, 8 years ago

Owner: changed from nobody to Anderson Resende
Status: newassigned

comment:2 by Tim Graham, 8 years ago

Resolution: wontfix
Status: assignedclosed
Summary: Existence checks for related database setsAdd existence checks for related database sets on managers

Doesn't my_student in my_course.teachers.all() work? I don't think a __contains__ shortcut belongs on the manager -- otherwise users might expect the rest of QuerySet comparisons like __eq__ to work as well.

comment:3 by Justus Adam, 8 years ago

If that executes lazily then I am ok with this, thank you.

However looking at the code for QuerySet it defines an __iter__ method but no __contains__ method and __iter__ executes _fetch_all, which means instead of doing an .exists() query using in on a query set fetches a large number of objects and iterates over the result objects rather than using the database aided .exists().
This is very inefficient.

Last edited 8 years ago by Justus Adam (previous) (diff)

comment:4 by Tim Graham, 8 years ago

True, hopefully the documentation added in #17156 addresses that.

comment:5 by Justus Adam, 8 years ago

Why can't the QuerySet itself make the decision whether to use __iter__ or self.filter(..).exists()? It should have the necessary knowledge. If it is itself unevaluated using .exists() is the more efficient way in any situation and if it is evaluated then the choice should depend on the size of the data.

comment:6 by Tim Graham, 8 years ago

One example I can think of is that the QuerySet doesn't know if it's used later where it would need to be evaluated anyway. It seems simpler to me if a developer can rely on a particular operation always working in a certain way.

comment:7 by Justus Adam, 8 years ago

I understand but I think this should be explained in the documentation rather than using the inefficient way straight away.

Any (normal) user, that does not dig too deep into the workings of QuerySets will, use in instead of filter().exists() because that's the python way of doing it. I don't think it is a good idea to condemn them to potentially very inefficient queries for an operation that could be much cheaper.

The actual behaviour of __contains__ could be explained in the documentation which someone very worried about efficiency can read and then use list() to force evaluation if they need a reproducible query.

filter().exists() poses very little overhead because it queries on an indexed field that I don't think we'd have to worry about the overhead.

It is not like the current implementation doesn't have its pitfalls. Using list() on an evaluated QuerySet won't execute queries unlike on en unevaluated QuerySet. What I am trying to say is that operations on QuerySet's already do not work the same way every time, that's simply their nature. As I understand the mission of django it is to abstract away complexity and deliver efficiency, which I believe this change would bring.

Theoretically even using in on an evaluated query set is potentially very inefficient if the set is large, since it is stored in a list rather than a set.

comment:8 by Tim Graham, 8 years ago

The idea has been proposed before (actually more than once). You can raise your proposal on the DevelopersMailingList if you want to get another opinion.

The documentation added in the related ticket seems okay to me, but feel free to submit a patch with any clarifications you'd like to see.

comment:9 by Justus Adam, 8 years ago

Nah, that's fine. If there has been a deliberate decision made against it I will refrain from pushing this further. After all this is an opinionated change and if the django development team feels differently about this than I do that's fine.

Note: See TracTickets for help on using tickets.
Back to Top