Opened 10 years ago
Last modified 3 days ago
#26379 assigned Bug
Document that first filter() chained to a RelatedManager is sticky
| Reported by: | Balázs Oroszi | Owned by: | Annabelle Wiegart |
|---|---|---|---|
| Component: | Documentation | Version: | 1.9 |
| Severity: | Normal | Keywords: | |
| Cc: | yaiche.amin@… | Triage Stage: | Accepted |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Given the model in a "testapp" app:
############################
from django.db import models
class Blog(models.Model):
title = models.CharField(max_length=100)
# subscribers - related from Person
# subscriptions - related from Subscription
def __str__(self):
return self.title
class Person(models.Model):
name = models.CharField(max_length=100)
subscribed_blogs = models.ManyToManyField(Blog, related_name="subscribers", through="Subscription")
# subscriptions - related from Subscription
def __str__(self):
return self.name
class Subscription(models.Model):
person = models.ForeignKey(Person, related_name="subscriptions")
blog = models.ForeignKey(Blog, related_name="subscriptions")
subscribed_date = models.DateField()
def __str__(self):
return ''.join([self.person.name, " - ", self.blog.title])
############################
When I filter "subscribers" of a Blog instance, the results are not consistent.
Here is the code demonstrating the effect:
############################
from testapp.models import *
from datetime import datetime
adam = Person.objects.create(name="Adam")
blog_1 = Blog.objects.create(title="Blog 1")
blog_2 = Blog.objects.create(title="Blog 2")
Subscription.objects.create(person=adam, blog=blog_1, subscribed_date=datetime(2016,1,10))
Subscription.objects.create(person=adam, blog=blog_2, subscribed_date=datetime(2016,1,20))
queryparams = {"subscriptions__subscribed_date__gt": datetime(2016,1,15)}
q1 = blog_1.subscribers.filter(**queryparams)
q2 = blog_1.subscribers.all().filter(**queryparams)
q3 = blog_1.subscribers.get_queryset().filter(**queryparams)
print(q1.query)
print(q1)
print(q2.query)
print(q2)
print(q3.query)
print(q3)
print("--------------------------")
q1 = blog_1.subscribers.filter().filter(**queryparams)
q2 = blog_1.subscribers.all().all().filter(**queryparams)
q3 = blog_1.subscribers.get_queryset().all().filter(**queryparams)
print(q1.query)
print(q1)
print(q2.query)
print(q2)
print(q3.query)
print(q3)
############################
The output is:
SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" INNER JOIN "testapp_subscription" ON ("testapp_person"."id" = "testapp_subscription"."person_id") WHERE ("testapp_subscription"."blog_id" = 1 AND "testapp_subscription"."subscribed_date" > 2016-01-15)
[]
SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" INNER JOIN "testapp_subscription" ON ("testapp_person"."id" = "testapp_subscription"."person_id") WHERE ("testapp_subscription"."blog_id" = 1 AND "testapp_subscription"."subscribed_date" > 2016-01-15)
[]
SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" INNER JOIN "testapp_subscription" ON ("testapp_person"."id" = "testapp_subscription"."person_id") WHERE ("testapp_subscription"."blog_id" = 1 AND "testapp_subscription"."subscribed_date" > 2016-01-15)
[]
--------------------------
SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" INNER JOIN "testapp_subscription" ON ("testapp_person"."id" = "testapp_subscription"."person_id") INNER JOIN "testapp_subscription" T4 ON ("testapp_person"."id" = T4."person_id") WHERE ("testapp_subscription"."blog_id" = 1 AND T4."subscribed_date" > 2016-01-15)
[<Person: Adam>]
SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" INNER JOIN "testapp_subscription" ON ("testapp_person"."id" = "testapp_subscription"."person_id") INNER JOIN "testapp_subscription" T4 ON ("testapp_person"."id" = T4."person_id") WHERE ("testapp_subscription"."blog_id" = 1 AND T4."subscribed_date" > 2016-01-15)
[<Person: Adam>]
SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" INNER JOIN "testapp_subscription" ON ("testapp_person"."id" = "testapp_subscription"."person_id") INNER JOIN "testapp_subscription" T4 ON ("testapp_person"."id" = T4."person_id") WHERE ("testapp_subscription"."blog_id" = 1 AND T4."subscribed_date" > 2016-01-15)
[<Person: Adam>]
The first set of queries simply "AND" the filter params with the "subscribers" RelatedManager's inherent related-filtering, while the second set of queries do a separate chain filtering.
This is exactly the kind of situation that is described in the django docs (with the blogs, "Lennon" and "2008"):
https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
I believe the second set of queries should be the correct one, and that should be happening also in the first set of queries, but that is not what is happening.
Change History (18)
comment:1 by , 10 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|
comment:2 by , 10 years ago
This is most certainly a bug, as getting completely different results in the above case for example:
blog_1.subscribers.all().filter(**queryparams)
and
blog_1.subscribers.all().all().filter(**queryparams)
(and the rest) is buggy behaviour I believe (especially since the second one gives the correct results I believe).
Unfortunately I'm not well versed enough in the internals of Django's ORM to even guess at what could be wrong, so I couldn't come up with anything better than a self-contained sample to reproduce it.
It would be nice if someone else could verify and reproduce the behaviour and having better knowledge about the ORM look into the cause.
Note, that I wrote to both django-users and django-developers first a few months ago, but got no feedback.
https://groups.google.com/forum/#!msg/django-users/tmGdMhGkCGw/NfpEAb_HEAAJ
https://groups.google.com/forum/#!msg/django-developers/gQ_qbMGQsJs/xgYL7BlkFQAJ
follow-up: 4 comment:3 by , 10 years ago
Certainly a bug.
What's weird about this is that the SQL query is correct. When i've executed the query in a DBMS it gave me "adam" in the result, while the filter() method gives an empty queryset.
comment:4 by , 10 years ago
Which query? The first one? It shouldn't give you any results, as this clause in the first set of queries:
("testapp_subscription"."blog_id" = 1 AND "testapp_subscription"."subscribed_date" > 2016-01-15)
cannot be true (there is only one subscription for "Blog 1", and that is on 2016-01-10).
I also tested the query directly in SQLite, and it seems the SQLite command line program is buggy or something, as it also gave me a result, however when I put quotes around the date above : "2016-01-15", like this:
SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" INNER JOIN "testapp_subscription" ON ("testapp_person"."id" = "testapp_subscription"."person_id") WHERE ("testapp_subscription"."blog_id" = 1 AND "testapp_subscription"."subscribed_date" > "2016-01-15")
then it correctly didn't give any results. What is incorrect here is the query string itself, the second set of queries in the original bug description are the correct ones!
About the date quotes:
We just might have found another unrelated bug here, namely the quotation around the date, I assume Django is using placeholders internally not the above string literally, but when it generates the string for us to read, it fails to put quotation marks around the date, though I don't know whether that would be correct (or it's simply an SQLite command line bug).
Replying to AmineYaiche:
Certainly a bug.
What's weird about this is that the SQL query is correct. When i've executed the query in a DBMS it gave me "adam" in the result, while the filter() method gives an empty queryset.
comment:5 by , 10 years ago
Unfortunately this is by design. The first call to .filter() targets the join generated by the relation, after that new joins are generated. See the sticky filter references in the ORM.
If possible, I'd like to change this at some point. But currently we don't have any way to get the current behavior if sticky filter is removed.
comment:6 by , 10 years ago
| Cc: | added |
|---|
comment:7 by , 7 weeks ago
| Component: | Database layer (models, ORM) → Documentation |
|---|---|
| Summary: | Inconsistent behaviour of filter() on related model (RelatedManager) → Document that first `.filter()` chained to a RelatedManager is sticky |
#36617 was a dupe. Based on the answer in comment:5, on the advice of comment:1 I'm reframing this as a documentation issue.
As pointed out in #36617, when the spanning multi-valued relationships example describes "restrictive" and "permissive" queries, it implies that chaining filter calls always creates a permissive query. From the discussion it's not clear what effect chaining .filter() on a RelatedManager has. Is it a "distinct" filter with respect to the filter baked into the RelatedManager? Answer, no: the first filter is "sticky". Subsequent filter calls are distinct.
See a shell example in ticket:36617#comment:3. See mailing list archive for more on "sticky" filters.
comment:8 by , 7 weeks ago
| Summary: | Document that first `.filter()` chained to a RelatedManager is sticky → Document that first filter() chained to a RelatedManager is sticky |
|---|
comment:9 by , 2 weeks ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
comment:10 by , 5 days ago
I was able to reproduce the output of the above queries on the Django shell, including the missing quotes around the date literal in the SQL query.
follow-up: 12 comment:11 by , 5 days ago
Glad to hear it. About the quotes:
I assume Django is using placeholders internally not the above string literally, but when it generates the string for us to read, it fails to put quotation marks around the date, though I don't know whether that would be correct
It's known that the str() of a queryset does not quote parameters, see #25705. (I think that's what was referenced in comment:4?)
comment:12 by , 4 days ago
That is my my understanding, yes. Thank you for the clarification.
Replying to Jacob Walls:
Glad to hear it. About the quotes:
I assume Django is using placeholders internally not the above string literally, but when it generates the string for us to read, it fails to put quotation marks around the date, though I don't know whether that would be correct
It's known that the
str()of a queryset does not quote parameters, see #25705. (I think that's what was referenced in comment:4?)
comment:13 by , 4 days ago
Just to be sure: The filter behaviour to be documented refers to a ManyToMany relationship with an explicit through model, like Subscription in the above example. Is that correct?
I was thinking to enhance the example in https://docs.djangoproject.com/en/5.2/topics/db/queries/ like so:
class Entry(models.Model):
...
authors = models.ManyToManyField(Author, through="Contribution")
class Contribution(models.Model):
RELATION_CHOICES = {
"aut": "author",
"edt": "editor",
"ill": "illustrator",
}
person = models.ForeignKey(Author, on_delete=models.CASCADE)
entry = models.ForeignKey(Entry, on_delete=models.CASCADE)
relation = models.CharField(max_length=3, choices=RELATION_CHOICES, default="aut")
And then use a query example that filters based on the relation attribute of Contribution. Is that going in the right direction? (I'm currently figuring out the details.)
comment:14 by , 4 days ago
Just to be sure: The filter behaviour to be documented refers to a ManyToMany relationship with an explicit through model, like Subscription in the above example. Is that correct?
Good point, it does appear limited to ManyToManyField, but I don't think the explicit through model matters, see the example in #36617 with an implicit/default through model. Reusing the Entry model seems like a good idea. Somewhere toward the end of https://docs.djangoproject.com/en/5.2/topics/db/queries/#many-to-many-relationships?
follow-up: 16 comment:15 by , 3 days ago
In my understanding, the explicit through model adds another layer to the surprise effect of the permissive filter's behaviour. As opposed to #36617, where the A table is joined with B, to which it has a ManyToMany relationship, Person (above) is joined with the relation table Subscription. This allows the permissive filter to return the adam object, even if the query params don't apply to its subscription to blog_1. Is this a behaviour that can be shown with an implicit through model as in #36617? I need to investigate this further.
comment:16 by , 3 days ago
Sorry, there was an incorrect assumption in my previous comment: A table is not joined with B but B is joined with the through table B_foo.
Replying to Annabelle Wiegart:
In my understanding, the explicit through model adds another layer to the surprise effect of the permissive filter's behaviour. As opposed to #36617, where the A table is joined with B, to which it has a ManyToMany relationship, Person (above) is joined with the relation table Subscription. This allows the permissive filter to return the adam object, even if the query params don't apply to its subscription to blog_1. Is this a behaviour that can be shown with an implicit through model as in #36617? I need to investigate this further.
comment:17 by , 3 days ago
While trying to replicate the sticky vs. permissive filter behaviour with the docs example, I discovered another quite surprising twist. In this example without an explicit through model, I was not able to reproduce the permissive filter behaviour:
>>> from django.db.models import Q
>>> arg1 = Q(headline__contains="Lennon")
>>> arg2 = Q(headline__contains="Best")
>>> q12 = taylor.texts.filter(arg1&arg2)
>>> q12
<QuerySet []>
>>> q13 = taylor.texts.filter(arg1).filter(arg2)
>>> q13
<QuerySet []>
>>> print(q13.query)
SELECT "filterexample_entry"."id", "filterexample_entry"."blog_id", "filterexample_entry"."headline", "filterexample_entry"."body_text", "filterexample_entry"."pub_date", "filterexample_entry"."mod_date", "filterexample_entry"."number_of_comments", "filterexample_entry"."number_of_pingbacks", "filterexample_entry"."rating" FROM "filterexample_entry" INNER JOIN "filterexample_entry_authors" ON ("filterexample_entry"."id" = "filterexample_entry_authors"."entry_id") WHERE ("filterexample_entry_authors"."author_id" = 2 AND "filterexample_entry"."headline"::text LIKE %Lennon% AND "filterexample_entry"."headline"::text LIKE %Best%)
I would have expected q13 to produce a double JOIN and result in a permissive filter, but it didn't.
Does an explicit through model make a difference in the end?
comment:18 by , 3 days ago
The behaviour is different with Author.objects.all() - there, I am able to produce a permissive filter.
I'm not sure how to explain the behavior. If it's not a bug, please reclassify to a documentation ticket with some explanation.