#34728 closed Bug (duplicate)
OR operator on queryset does not work as expected
Reported by: | Kbleser | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.2 |
Severity: | Normal | Keywords: | queryset OR |
Cc: | rene@…, Kbleser, Simon Charette | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I try to combine two querysets using the |
operator. Even though one of the querysets isn't empty, the resulting queryset is empty.
Example:
similar_recipes = at_least_four_shared_ingredients | same_tag_and_at_least_two_shared_ingredients
same_tag_and_at_least_two_shared_ingredients
contains entries but similar_recipes
doesn't.
You can reproduce the problem here: https://github.com/Inglorious-Inge/Kochbuch2/tree/queryset_problem
The test in recipes/tests/test_models.py
is failing.
The code under test is in recipes/models.py
line 30-45.
Change History (15)
comment:1 by , 17 months ago
comment:2 by , 17 months ago
Cc: | added |
---|
comment:3 by , 17 months ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Following the latest comment and after confirming the test from the repo is no longer failing, I'll mark this bug as invalid since it was really a support request.
follow-ups: 5 7 comment:4 by , 17 months ago
Resolution: | invalid |
---|---|
Status: | closed → new |
Natalia, I disagree. It is not a support request (the reporters problem was solved by using a different API), but a possible bug in the ORM. Combining two querysets with |
where at least one of them is non-empty should not result in an empty queryset, IMO. If it can (when?), we should probably document this.
It also seems to me that the behaviour does not match the note in the docs about equivalency with Q
objects?
If we can't agree here on whether this is a bug or not, should we start an ML discussion?
BTW, on Discord, there has been another report about qs1 | qs2
and qs2 | qs1
not being equivalent, which they should be according to the docs. This is probably outside of the scope of this ticket, but I thought it's worth mentioning here (maybe we can fix both issues in one go).
follow-up: 6 comment:5 by , 17 months ago
Replying to René Fleschenberg:
BTW, on Discord, there has been another report about
qs1 | qs2
andqs2 | qs1
not being equivalent, which they should be according to the docs. This is probably outside of the scope of this ticket, but I thought it's worth mentioning here (maybe we can fix both issues in one go).
Queries may not be the same according to docs, check out #33319:
"| is not a commutative operation, as different (though equivalent) queries may be generated."
comment:6 by , 17 months ago
Replying to Mariusz Felisiak:
Replying to René Fleschenberg:
BTW, on Discord, there has been another report about
qs1 | qs2
andqs2 | qs1
not being equivalent, which they should be according to the docs. This is probably outside of the scope of this ticket, but I thought it's worth mentioning here (maybe we can fix both issues in one go).
Queries may not be the same according to docs, check out #33319:
"| is not a commutative operation, as different (though equivalent) queries may be generated."
I understand "equivalent" as "they will return the same set of results". Is that not what the docs mean here?
follow-up: 8 comment:7 by , 17 months ago
Replying to René Fleschenberg:
Natalia, I disagree. It is not a support request (the reporters problem was solved by using a different API), but a possible bug in the ORM. Combining two querysets with
|
where at least one of them is non-empty should not result in an empty queryset, IMO. If it can (when?), we should probably document this.
It also seems to me that the behaviour does not match the note in the docs about equivalency with
Q
objects?
Queryset that you proposed is not the same query that user used in a sample project, I'm not sure how the ORM could figure it out from:
Recipe.objects.filter( Q(ingredients__in=self.ingredients.all() ).annotate( shared_ingredients=Count("ingredients") ).filter(shared_ingredients__gte=4) | Recipe.objects.filter( Q(tags__in=self.tags.all()) & Q(ingredients__in=self.ingredients.all() ).annotate( shared_ingredients=Count("ingredients") ).filter(shared_ingredients__gte=2)
Docs only describe how the OR operator works by rule. The ORM cannot analyze user queries and rewrite them.
follow-up: 9 comment:8 by , 17 months ago
Replying to Mariusz Felisiak:
Docs only describe how the OR operator works by rule. The ORM cannot analyze user queries and rewrite them.
I understand that this is probably not easy (maybe not possible) for the ORM to solve. Maybe I am wrong here, but I think you would expect qs1 | qs2
to return the same set of results as set(qs1) | set(qs2)
? Not sure what to do about it, though.
comment:9 by , 17 months ago
Replying to René Fleschenberg:
I understand that this is probably not easy (maybe not possible) for the ORM to solve. Maybe I am wrong here, but I think you would expect
qs1 | qs2
to return the same set of results asset(qs1) | set(qs2)
? Not sure what to do about it, though.
For this we'd have to use UNION
instead of combining filters. Queryset from the ticket description is really complicated (from the structure point of view) with overlap filters, aggregations, filters on aggregated values, etc. My point is that you cannot describe it as Model.filter(x) | Model.filter(y)
as it contains aggregations and filters on them. Docs don't precise what would happen when you use the OR
operator on Model.filter(x1).annotate(aggregation).filter(x2) | Model.filter(y1).annotate(aggregation).filter(y2)
. If you want to merge results of two independent querysets you should use .union()
.
follow-up: 11 comment:10 by , 17 months ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
I'm taking a look at this again. For the sake of clarity, René or the reporter, could you please paste here the test method (or any other needed diff) that produce a failure? I've cloned the repo and played a bit with the existing find_similar_recipes
and I'm not being able to reproduce the issue. Thanks!
Regarding:
If we can't agree here on whether this is a bug or not, should we start an ML discussion?
I think this may be the best path forward!
BTW, on Discord, there has been another report about qs1 | qs2 and qs2 | qs1 not being equivalent, which they should be according to the docs.
Do you have an example or link to the discussion at hand? Perhaps we could clarify in the docs what "equivalent" mean, in that, given that |
is not commutative, it may be the case that the generated SQL
return different results but still equivalent, no? :thinking:
follow-up: 13 comment:11 by , 17 months ago
Hi,
Replying to Natalia Bidart:
I'm taking a look at this again. For the sake of clarity, René or the reporter, could you please paste here the test method (or any other needed diff) that produce a failure? I've cloned the repo and played a bit with the existing
find_similar_recipes
and I'm not being able to reproduce the issue. Thanks!
I suspect you looked at the main
branch, which doesn't show the problem :) Sorry, I should have explicitly mentioned the branch here. If you checkout the branch queryset_problem
and run the tests, you should see that the test in recipes/tests/test_models.py
is failing: https://github.com/Inglorious-Inge/Kochbuch2/blob/queryset_problem/recipes/tests/test_models.py
(Not pasting the test setup here for the sake of brevity).
def test_find_similar_recipes(self): similar_recipes = self.recipe.find_similar_recipes() self.assertEquals(len(similar_recipes), 1)
The reason is that in recipes/models.py
on line 39 (https://github.com/Inglorious-Inge/Kochbuch2/blob/queryset_problem/recipes/models.py#L39C4-L39C106), at_least_four_shared_ingredients | same_tag_and_at_least_two_shared_ingredients
results in a queryset that turns out empty when evaluated. If you use a debugger to take a look at the two querysets being ORed together, you should see that same_tag_and_at_least_two_shared_ingredients
by itself evaluates to a non-empty queryset.
BTW, on Discord, there has been another report about qs1 | qs2 and qs2 | qs1 not being equivalent, which they should be according to the docs.
Do you have an example or link to the discussion at hand? Perhaps we could clarify in the docs what "equivalent" mean, in that, given that
|
is not commutative, it may be the case that the generatedSQL
return different results but still equivalent, no? :thinking:
I really don't know what "equivalent" would mean in regards to SQL queries if they do not return the same set of rows :) I suspect we thought so far that equivalency would be guaranteed but it is actually not (if the report on Discord is correct). I'll try to find the discussion on Discord and open a separate ticket for this, since I think it is a separate different issue.
Thanks!
comment:12 by , 17 months ago
Just for the record, the Discord discussion is here: https://discord.com/channels/856567261900832808/857642132423704577/1126286062642266223
I'll try to contact isik-kaplan and ask them if they already filed a ticket, or if they can provide a reproducible example.
comment:13 by , 17 months ago
Cc: | added |
---|
Replying to René Fleschenberg:
I suspect you looked at the
main
branch, which doesn't show the problem :) Sorry, I should have explicitly mentioned the branch here. If you checkout the branchqueryset_problem
and run the tests, you should see that the test inrecipes/tests/test_models.py
is failing: https://github.com/Inglorious-Inge/Kochbuch2/blob/queryset_problem/recipes/tests/test_models.py
Excellent! I was able to reproduce the failure as reported. What I also noticed is that without the annotation, the OR behaves as expected (though I understand the goal of the annotation in this case).
I've been trying to debug further by re-writting the queries in find_similar_recipes
, and then checking results and the generated SQL. I can't detect anything obviously wrong so my next step would be to craft the SQL I would use if I'd be building this by hand, but I ran out of time today.
I'll CC Simon in this ticket, I believe they may have a clearer view of what's expected/supported and what's not. Personally, I do find the outcome a bit unexpected, so I share the feeling, but the experts will know more.
comment:14 by , 17 months ago
TL;DR not an issue with queryset combination but another ticket that relates to how the ORM deals with aggregation when more than one multi-valued relationship is involved. Either invalid or duplicate to me.
---
As you can see on the main branch of the repository linked above by the reporter, the query works as expected if it is rewritten to use Q objects
This happens to work because you perform the aggregation before filtering the multi-valued relationship which prevents JOIN reuse as documented and results in you joining twice against against the same multi-valued relationship as well as another one (tags)
SELECT "recipes_recipe"."id", "recipes_recipe"."title", "recipes_recipe"."date_posted", "recipes_recipe"."created_by_id", "recipes_recipe"."image", "recipes_recipe"."preparation_time_in_minutes", "recipes_recipe"."instructions", "recipes_recipe"."level", "recipes_recipe"."serving_size", COUNT("recipes_ingredienttorecipe"."ingredient_id_id") AS "shared_ingredients" FROM "recipes_recipe" LEFT OUTER JOIN "recipes_ingredienttorecipe" ON ("recipes_recipe"."id" = "recipes_ingredienttorecipe"."recipe_id_id") INNER JOIN "recipes_ingredienttorecipe" T4 ON ("recipes_recipe"."id" = T4."recipe_id_id") LEFT OUTER JOIN "recipes_tagtorecipe" ON ("recipes_recipe"."id" = "recipes_tagtorecipe"."recipe_id_id") WHERE (T4."ingredient_id_id" IN (SELECT U0."id" FROM "recipes_ingredient" U0 INNER JOIN "recipes_ingredienttorecipe" U1 ON (U0."id" = U1."ingredient_id_id") WHERE U1."recipe_id_id" = 1) AND NOT ("recipes_recipe"."id" = 1)) GROUP BY "recipes_recipe"."id", "recipes_recipe"."title", "recipes_recipe"."date_posted", "recipes_recipe"."created_by_id", "recipes_recipe"."image", "recipes_recipe"."preparation_time_in_minutes", "recipes_recipe"."instructions", "recipes_recipe"."level", "recipes_recipe"."serving_size", "recipes_tagtorecipe"."tag_id_id" HAVING (("recipes_tagtorecipe"."tag_id_id" IN (SELECT U0."id" FROM "recipes_tag" U0 INNER JOIN "recipes_tagtorecipe" U1 ON (U0."id" = U1."tag_id_id") WHERE U1."recipe_id_id" = 1) AND COUNT("recipes_ingredienttorecipe"."ingredient_id_id") >= 2) OR COUNT("recipes_ingredienttorecipe"."ingredient_id_id") >= 4);
Notice how recipes_ingredienttorecipe
is joined twice which means COUNT("recipes_ingredienttorecipe"."ingredient_id_id") AS "shared_ingredients"
will actually be the COUNT
of the product of rows 2X2=4
(see #10060). In other words, aggregate annotations when many multi-valued relationships are involved are broken and have been for while and in your case it's the COUNT("recipes_ingredienttorecipe"."ingredient_id_id") >= 4)
that is allowing your test to pass on main
. The fact this problems manifests itself in queryset combination when filtering against broken aggregate annotation is just a symptom of a much larger problem.
If you define one of your aggregate annotations as a subquery to avoid joining more than one multi-valued relationship at once (#28296) things properly work
def find_similar_recipes(self): return ( Recipe.objects.annotate( shared_ingredients_cnt=Count( "ingredients", filter=Q(ingredients__in=self.ingredients.all()), ), # Using a subquery to target the _other_ multi-valued relationship avoids X product of rows # with ingredients shares_tag=Exists( TagToRecipe.objects.filter( tag_id__in=self.tags.all(), recipe_id=OuterRef("pk"), ) ), ) .filter( Q(shared_ingredients_cnt__gte=4) | Q(shares_tag=True, shared_ingredients_cnt__gte=2) ) .exclude(id=self.id) )
Which can also use queryset combination without issues
def find_similar_recipes(self): base = Recipe.objects.annotate( shared_ingredients_cnt=Count( "ingredients", filter=Q(ingredients__in=self.ingredients.all()), ) ).exclude(id=self.id) same_tag_and_at_least_two_shared_ingredients = base.annotate( shares_tag=Exists( TagToRecipe.objects.filter( tag_id__in=self.tags.all(), recipe_id=OuterRef("pk"), ) ) ).filter(Q(shares_tag=True, shared_ingredients_cnt__gte=2)) at_least_four_shared_ingredients = base.filter(shared_ingredients_cnt__gte=4) return ( same_tag_and_at_least_two_shared_ingredients | at_least_four_shared_ingredients )
comment:15 by , 17 months ago
Resolution: | needsinfo → duplicate |
---|
Thank you Simon for the detailed and clear explanation. I think this is more accurately closed as duplicate of #10060.
As you can see on the main branch of the repository linked above by the reporter, the query works as expected if it is rewritten to use
Q
objects:Thanks to CodenameTim on the Django Discord for helping to debug this!