Opened 7 years ago

Closed 5 years ago

#28297 closed Bug (duplicate)

Same queryset result in two different queries on ORM.

Reported by: Marcus Renno Owned by: Marcus Renno
Component: Database layer (models, ORM) Version: 1.11
Severity: Normal Keywords: join, annotation, F
Cc: akaariai@…, tom@…, jeppe@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Marcus Renno)

Sometimes when I run a set of filter/annotation the result is different for the same variables. I'm using Django 1.11.2

The models used (simplified)

class Recipe(models.Model):
  name = models.CharField(max_length=50)
  steps = models.ManyToManyField(StepRecipe)

class StepRecipe(models.Model):
  ingredients = models.ManyToManyField(RecipeIngredient)

class RecipeIngredient(models.Model):
  ingredient = models.ForeignKey(Ingredient)

class Ingredient(models.Model):
  name = models.CharField(max_length=50)

This is the queryset command:

ingredients = ['tomato']
self.queryset = Recipe.objects.all()
self.queryset = self.queryset.annotate(total=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__ingredient__name__in=ingredients)
self.queryset = self.queryset.annotate(available=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(total=F('available'))

This is the wrong query result that comes often times:

SELECT recipebook_recipe.id, recipebook_recipe.name, recipebook_recipe.dificulty, recipebook_recipe.duration, COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS total, COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS available FROM recipebook_recipe LEFT OUTER JOIN recipebook_recipe_steps ON (recipebook_recipe.id = recipebook_recipe_steps.recipe_id) LEFT OUTER JOIN recipebook_steprecipe ON (recipebook_recipe_steps.steprecipe_id = recipebook_steprecipe.id) LEFT OUTER JOIN recipebook_steprecipe_ingredients ON (recipebook_steprecipe.id = recipebook_steprecipe_ingredients.steprecipe_id) INNER JOIN recipebook_recipe_steps T6 ON (recipebook_recipe.id = T6.recipe_id) INNER JOIN recipebook_steprecipe T7 ON (T6.steprecipe_id = T7.id) INNER JOIN recipebook_steprecipe_ingredients T8 ON (T7.id = T8.steprecipe_id) INNER JOIN recipebook_recipeingredient T9 ON (T8.recipeingredient_id = T9.id) INNER JOIN recipebook_ingredient ON (T9.ingredient_id = recipebook_ingredient.id) WHERE recipebook_ingredient.name IN ("tomato") GROUP BY recipebook_recipe.id HAVING COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) = (COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id)) ORDER BY NULL

And this is the right query that shows up sometimes:

SELECT recipebook_recipe.id, recipebook_recipe.name, recipebook_recipe.dificulty, recipebook_recipe.duration, COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS total, COUNT(DISTINCT T8.recipeingredient_id) AS available FROM recipebook_recipe LEFT OUTER JOIN recipebook_recipe_steps ON (recipebook_recipe.id = recipebook_recipe_steps.recipe_id) LEFT OUTER JOIN recipebook_steprecipe ON (recipebook_recipe_steps.steprecipe_id = recipebook_steprecipe.id) LEFT OUTER JOIN recipebook_steprecipe_ingredients ON (recipebook_steprecipe.id = recipebook_steprecipe_ingredients.steprecipe_id) INNER JOIN recipebook_recipe_steps T6 ON (recipebook_recipe.id = T6.recipe_id) INNER JOIN recipebook_steprecipe T7 ON (T6.steprecipe_id = T7.id) INNER JOIN recipebook_steprecipe_ingredients T8 ON (T7.id = T8.steprecipe_id) INNER JOIN recipebook_recipeingredient T9 ON (T8.recipeingredient_id = T9.id) INNER JOIN recipebook_ingredient ON (T9.ingredient_id = recipebook_ingredient.id) WHERE recipebook_ingredient.name IN ("tomato") GROUP BY recipebook_recipe.id HAVING COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) = (COUNT(DISTINCT T8.recipeingredient_id)) ORDER BY NULL

As you can see they are different. The wrong one does not set the variable 'available' appropriately. I wonder if this is something on the method .query.join()

Change History (36)

comment:1 by Marcus Renno, 7 years ago

Description: modified (diff)

comment:2 by Simon Charette, 7 years ago

Hello Marcus,

You'll have to provide a minimal models setup to allow us to identify the issue as it's almost impossible to figure out what could be wrong it. Please do so and reopen this ticket with the details.

Also, did you reproduce against Django 1.11 and the master branch?

comment:3 by Simon Charette, 7 years ago

Resolution: needsinfo
Status: newclosed

comment:4 by Marcus Renno, 7 years ago

Description: modified (diff)
Resolution: needsinfo
Status: closednew

comment:5 by Marcus Renno, 7 years ago

Description: modified (diff)

comment:6 by Marcus Renno, 7 years ago

Description: modified (diff)

comment:7 by Todor Velichkov, 7 years ago

I think annotating + filter on M2M field is the issue here.

Please check this thread on django-users: https://groups.google.com/forum/#!topic/django-users/RPU-PnygbLg

comment:8 by Marcus Renno, 7 years ago

Type: UncategorizedBug

comment:9 by Tom Forbes, 7 years ago

This appears to be related to a dictionary iteration somewhere. If you set a fixed PYTHONHASHSEED value (e.g PYTHONHASHSEED=1) then the query is stable, otherwise it does indeed switch between two types of SQL.

In my experiments PYTHONHASHSEED=1 produces the wrong query, whereas PYTHONHASHSEED=2 produces the correct one.

comment:10 by Tom Forbes, 7 years ago

Owner: changed from nobody to Tom Forbes
Status: newassigned

comment:11 by Tom Forbes, 7 years ago

It seems to boil down to this line: https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927

The alias_map.items() isn't stable, with PYTHONHASHSEED=1 then reuse[0] is recepie_steps, which is incorrect, but with it set to 2 then reuse[0] is (correctly) T6.

If anyone knows how to fix this then please let me know, else I will try and find a suitable fix.

in reply to:  11 ; comment:12 by Marcus Renno, 7 years ago

That seems to be right, Tom. join() is relying on the order of the dictionary resulting from .items().

It's weird that setting the hash seed to 2 would solve this problem, though. I feel like the framework should not rely on the user changing the hashing seed to a static number to fix this problem, because it breaks the purpose (security) of the the hashing being random.

Replying to Tom:

It seems to boil down to this line: https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927

The alias_map.items() isn't stable, with PYTHONHASHSEED=1 then reuse[0] is recepie_steps, which is incorrect, but with it set to 2 then reuse[0] is (correctly) T6.

If anyone knows how to fix this then please let me know, else I will try and find a suitable fix.

Last edited 7 years ago by Marcus Renno (previous) (diff)

comment:13 by Claude Paroz, 7 years ago

Triage Stage: UnreviewedAccepted

Query results depending on dict ordering is a bug.

comment:14 by Claude Paroz, 7 years ago

Cc: akaariai@… added

Looks like this was introduced in [ab89414f40db1598364a7fe4cfac1766cacd2668].

in reply to:  12 comment:15 by Tom Forbes, 7 years ago

I wasn't suggesting setting the hash seed value as a fix, merely as a debugging aide for anyone else looking at the ticket.

I've made a MR with what I think is the correct fix: https://github.com/django/django/pull/8631

The issue was that there where two joins, with differing join_types, that where considered equal in the Join.__eq__ method, and so randomly one would be chosen, which was sometimes incorrect.

Replying to Marcus Renno:

That seems to be right, Tom. join() is relying on the order of the dictionary resulting from .items().

It's weird that setting the hash seed to 2 would solve this problem, though. I feel like the framework should not rely on the user changing the hashing seed to a static number to fix this problem, because it breaks the purpose (security) of the the hashing being random.

Replying to Tom:

It seems to boil down to this line: https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927

The alias_map.items() isn't stable, with PYTHONHASHSEED=1 then reuse[0] is recepie_steps, which is incorrect, but with it set to 2 then reuse[0] is (correctly) T6.

If anyone knows how to fix this then please let me know, else I will try and find a suitable fix.

comment:16 by Tom Forbes, 7 years ago

Has patch: set

comment:17 by Tim Graham, 7 years ago

Patch needs improvement: set

comment:18 by Simon Charette, 7 years ago

This looks a bit similar to #26522 which made alias_map an OrderedDict to prevent such deterministic failures.

I wonder if it wouldn't be better to figure out what's adding values to self.alias_map in a non-deterministic order instead as it seems to be the underlying issue here.

comment:19 by Tom Forbes, 7 years ago

Hmm, you're right Simon. I missed the OrderedDict change, and tested initially on 1.11 (where it's not applied). With this, on master, the results are added deterministically, but the first item is always the incorrect join.

So I don't think my ticket fixes the issue in a convincing way. If you extend the example in the ticket with another annotate and filter:

self.queryset = self.queryset.annotate(total=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__pk=1)
self.queryset = self.queryset.annotate(available=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__pk=2)
self.queryset = self.queryset.annotate(available2=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(total=F('available')).filter(total=F('available2'))

Then reuse will have *three* aliases found: [('test_app_recipe_steps', 'LEFT OUTER JOIN'), ('T6', 'INNER JOIN'), ('T10', 'INNER JOIN')]

In this case my patch will fetch T6, and we have the same issue.

comment:20 by Tom Forbes, 7 years ago

I've dug into this a fair bit and I'm sorry to say I'm stuck. I also think there is a pretty big bug lurking here that is out of my league. Or maybe it's a feature we don't support?

As far as I can tell, if you do:

Publisher.objects
   .annotate(num_books=Count('book', distinct=True))
   .filter(book__rating__gt=3.0)
   .annotate(num_rated=Count('book', distinct=True))
   .filter(num_books=F('num_rated'))

Then the resulting SQL will have the following HAVING clause:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "publisher"."id"))

Am I correct in thinking that this should be:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "T3"."id")), where T3 is the correct join?

In any case, either the generated query is incorrect or the documentation needs to be updated.

I think, at least in the case of a m2m join, one of the issues are that there seems to be no way to resolve which alias is used by two identical annotations which are affected by a previous .filter() in the query.join method.

Version 0, edited 7 years ago by Tom Forbes (next)

comment:21 by Tom Forbes, 7 years ago

Owner: Tom Forbes removed
Status: assignednew

in reply to:  20 comment:22 by Marcus Renno, 7 years ago

Tom, I agree with you that it should be the second query and I believe that the feature should be supported. It's just a bug and we will fix it ;)

I'll try to work on this issue when I get home. I think you did a fantastic job narrowing down the problem and it's close to be solved.

Replying to Tom:

I've dug into this a fair bit and I'm sorry to say I'm stuck. I also think there is a pretty big bug lurking here that is out of my league. Or maybe it's a feature we don't support?

As far as I can tell, if you annotate over foreign keys as well as m2m:

Publisher.objects
   .annotate(num_books=Count('book', distinct=True))
   .filter(book__rating__gt=3.0)
   .annotate(num_rated=Count('book', distinct=True))
   .filter(num_books=F('num_rated'))

Then the resulting SQL will have the following HAVING clause:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "publisher"."id"))

Am I correct in thinking that this should be:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "T3"."id")), where T3 is the correct join?

In any case, either the generated query is incorrect or the documentation needs to be updated.

I think, at least in the case of a m2m join, one of the issues are that there seems to be no way to resolve which alias is used by two identical annotations which are affected by a previous .filter() in the query.join method.

comment:23 by Tom Forbes, 7 years ago

Cc: tom@… added

comment:24 by Marcus Renno, 7 years ago

Owner: set to Marcus Renno
Status: newassigned

First patch I have ever done to Django here: https://github.com/django/django/pull/8640

comment:25 by Marcus Renno, 7 years ago

Patch needs improvement: unset
Triage Stage: AcceptedUnreviewed

comment:26 by Tim Graham, 7 years ago

Triage Stage: UnreviewedAccepted

"Triage Stage" isn't reset when adding a patch, see Triaging Tickets.

in reply to:  26 comment:27 by Marcus Renno, 7 years ago

Sorry for that, Tim. Thanks for the useful information!

Replying to Tim Graham:

"Triage Stage" isn't reset when adding a patch, see Triaging Tickets.

comment:28 by Jeppe Vesterbæk, 7 years ago

Cc: jeppe@… added

comment:29 by Tim Graham, 7 years ago

Patch needs improvement: set

As noted on the PR, I'm not sure that the current approach is ideal.

comment:30 by Marcus Renno, 7 years ago

I updated the code in the PR. If someone has some time to check, please go ahead and share your comments.

comment:31 by holvianssi, 7 years ago

To me it seems that pre https://code.djangoproject.com/changeset/ab89414f40db1598364a7fe4cfac1766cacd2668 we always used the *first* join, not the last join possible in join(). I believe this has been that way for a long time, so changing that should be considered carefully.

I'm not exactly sure why do we get random order for self.alias_map in join. It's an OrderedDict, so that shouldn't be the case...

in reply to:  31 comment:32 by Marcus Renno, 7 years ago

Replying to holvianssi:

To me it seems that pre https://code.djangoproject.com/changeset/ab89414f40db1598364a7fe4cfac1766cacd2668 we always used the *first* join, not the last join possible in join(). I believe this has been that way for a long time, so changing that should be considered carefully.

I'm not exactly sure why do we get random order for self.alias_map in join. It's an OrderedDict, so that shouldn't be the case...

Hm... I always had the impression it was the last join. For me, it makes sense to use the tables in sequence, because you use actions (aka operations) in sequence. If it always grab the first joined table this wouldn't be possible. From what I understood, here in this ticket, until recent changes the order was random.

comment:33 by Marcus Renno, 7 years ago

Patch needs improvement: unset

comment:34 by Marcus Renno, 7 years ago

Hi, any updates regarding this ticket?

comment:35 by Tim Graham, 7 years ago

Patch needs improvement: set

As noted on the PR, the change introduces a non-deterministic test failure.

comment:36 by Mariusz Felisiak, 5 years ago

Has patch: unset
Patch needs improvement: unset
Resolution: duplicate
Status: assignedclosed
Summary: Same queryset result in two different queries on ORMSame queryset result in two different queries on ORM.
Note: See TracTickets for help on using tickets.
Back to Top