Opened 17 months ago

Last modified 15 months ago

#28297 assigned Bug

Same queryset result in two different queries on ORM

Reported by: Marcus Renno Owned by: Marcus Renno
Component: Database layer (models, ORM) Version: 1.11
Severity: Normal Keywords: join, annotation, F
Cc: akaariai@…, tom@…, jeppe@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Marcus Renno)

Sometimes when I run a set of filter/annotation the result is different for the same variables. I'm using Django 1.11.2

The models used (simplified)

class Recipe(models.Model):
  name = models.CharField(max_length=50)
  steps = models.ManyToManyField(StepRecipe)

class StepRecipe(models.Model):
  ingredients = models.ManyToManyField(RecipeIngredient)

class RecipeIngredient(models.Model):
  ingredient = models.ForeignKey(Ingredient)

class Ingredient(models.Model):
  name = models.CharField(max_length=50)

This is the queryset command:

ingredients = ['tomato']
self.queryset = Recipe.objects.all()
self.queryset = self.queryset.annotate(total=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__ingredient__name__in=ingredients)
self.queryset = self.queryset.annotate(available=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(total=F('available'))

This is the wrong query result that comes often times:

SELECT recipebook_recipe.id, recipebook_recipe.name, recipebook_recipe.dificulty, recipebook_recipe.duration, COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS total, COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS available FROM recipebook_recipe LEFT OUTER JOIN recipebook_recipe_steps ON (recipebook_recipe.id = recipebook_recipe_steps.recipe_id) LEFT OUTER JOIN recipebook_steprecipe ON (recipebook_recipe_steps.steprecipe_id = recipebook_steprecipe.id) LEFT OUTER JOIN recipebook_steprecipe_ingredients ON (recipebook_steprecipe.id = recipebook_steprecipe_ingredients.steprecipe_id) INNER JOIN recipebook_recipe_steps T6 ON (recipebook_recipe.id = T6.recipe_id) INNER JOIN recipebook_steprecipe T7 ON (T6.steprecipe_id = T7.id) INNER JOIN recipebook_steprecipe_ingredients T8 ON (T7.id = T8.steprecipe_id) INNER JOIN recipebook_recipeingredient T9 ON (T8.recipeingredient_id = T9.id) INNER JOIN recipebook_ingredient ON (T9.ingredient_id = recipebook_ingredient.id) WHERE recipebook_ingredient.name IN ("tomato") GROUP BY recipebook_recipe.id HAVING COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) = (COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id)) ORDER BY NULL

And this is the right query that shows up sometimes:

SELECT recipebook_recipe.id, recipebook_recipe.name, recipebook_recipe.dificulty, recipebook_recipe.duration, COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS total, COUNT(DISTINCT T8.recipeingredient_id) AS available FROM recipebook_recipe LEFT OUTER JOIN recipebook_recipe_steps ON (recipebook_recipe.id = recipebook_recipe_steps.recipe_id) LEFT OUTER JOIN recipebook_steprecipe ON (recipebook_recipe_steps.steprecipe_id = recipebook_steprecipe.id) LEFT OUTER JOIN recipebook_steprecipe_ingredients ON (recipebook_steprecipe.id = recipebook_steprecipe_ingredients.steprecipe_id) INNER JOIN recipebook_recipe_steps T6 ON (recipebook_recipe.id = T6.recipe_id) INNER JOIN recipebook_steprecipe T7 ON (T6.steprecipe_id = T7.id) INNER JOIN recipebook_steprecipe_ingredients T8 ON (T7.id = T8.steprecipe_id) INNER JOIN recipebook_recipeingredient T9 ON (T8.recipeingredient_id = T9.id) INNER JOIN recipebook_ingredient ON (T9.ingredient_id = recipebook_ingredient.id) WHERE recipebook_ingredient.name IN ("tomato") GROUP BY recipebook_recipe.id HAVING COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) = (COUNT(DISTINCT T8.recipeingredient_id)) ORDER BY NULL

As you can see they are different. The wrong one does not set the variable 'available' appropriately. I wonder if this is something on the method .query.join()

Change History (35)

comment:1 Changed 17 months ago by Marcus Renno

Description: modified (diff)

comment:2 Changed 17 months ago by Simon Charette

Hello Marcus,

You'll have to provide a minimal models setup to allow us to identify the issue as it's almost impossible to figure out what could be wrong it. Please do so and reopen this ticket with the details.

Also, did you reproduce against Django 1.11 and the master branch?

comment:3 Changed 17 months ago by Simon Charette

Resolution: needsinfo
Status: newclosed

comment:4 Changed 17 months ago by Marcus Renno

Description: modified (diff)
Resolution: needsinfo
Status: closednew

comment:5 Changed 17 months ago by Marcus Renno

Description: modified (diff)

comment:6 Changed 17 months ago by Marcus Renno

Description: modified (diff)

comment:7 Changed 17 months ago by Todor Velichkov

I think annotating + filter on M2M field is the issue here.

Please check this thread on django-users: https://groups.google.com/forum/#!topic/django-users/RPU-PnygbLg

comment:8 Changed 17 months ago by Marcus Renno

Type: UncategorizedBug

comment:9 Changed 17 months ago by Tom Forbes

This appears to be related to a dictionary iteration somewhere. If you set a fixed PYTHONHASHSEED value (e.g PYTHONHASHSEED=1) then the query is stable, otherwise it does indeed switch between two types of SQL.

In my experiments PYTHONHASHSEED=1 produces the wrong query, whereas PYTHONHASHSEED=2 produces the correct one.

comment:10 Changed 17 months ago by Tom Forbes

Owner: changed from nobody to Tom Forbes
Status: newassigned

comment:11 Changed 17 months ago by Tom Forbes

It seems to boil down to this line: https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927

The alias_map.items() isn't stable, with PYTHONHASHSEED=1 then reuse[0] is recepie_steps, which is incorrect, but with it set to 2 then reuse[0] is (correctly) T6.

If anyone knows how to fix this then please let me know, else I will try and find a suitable fix.

comment:12 in reply to:  11 ; Changed 17 months ago by Marcus Renno

That seems to be right, Tom. join() is relying on the order of the dictionary resulting from .items().

It's weird that setting the hash seed to 2 would solve this problem, though. I feel like the framework should not rely on the user changing the hashing seed to a static number to fix this problem, because it breaks the purpose (security) of the the hashing being random.

Replying to Tom:

It seems to boil down to this line: https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927

The alias_map.items() isn't stable, with PYTHONHASHSEED=1 then reuse[0] is recepie_steps, which is incorrect, but with it set to 2 then reuse[0] is (correctly) T6.

If anyone knows how to fix this then please let me know, else I will try and find a suitable fix.

Last edited 17 months ago by Marcus Renno (previous) (diff)

comment:13 Changed 17 months ago by Claude Paroz

Triage Stage: UnreviewedAccepted

Query results depending on dict ordering is a bug.

comment:14 Changed 17 months ago by Claude Paroz

Cc: akaariai@… added

Looks like this was introduced in [ab89414f40db1598364a7fe4cfac1766cacd2668].

comment:15 in reply to:  12 Changed 17 months ago by Tom Forbes

I wasn't suggesting setting the hash seed value as a fix, merely as a debugging aide for anyone else looking at the ticket.

I've made a MR with what I think is the correct fix: https://github.com/django/django/pull/8631

The issue was that there where two joins, with differing join_types, that where considered equal in the Join.__eq__ method, and so randomly one would be chosen, which was sometimes incorrect.

Replying to Marcus Renno:

That seems to be right, Tom. join() is relying on the order of the dictionary resulting from .items().

It's weird that setting the hash seed to 2 would solve this problem, though. I feel like the framework should not rely on the user changing the hashing seed to a static number to fix this problem, because it breaks the purpose (security) of the the hashing being random.

Replying to Tom:

It seems to boil down to this line: https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927

The alias_map.items() isn't stable, with PYTHONHASHSEED=1 then reuse[0] is recepie_steps, which is incorrect, but with it set to 2 then reuse[0] is (correctly) T6.

If anyone knows how to fix this then please let me know, else I will try and find a suitable fix.

comment:16 Changed 17 months ago by Tom Forbes

Has patch: set

comment:17 Changed 17 months ago by Tim Graham

Patch needs improvement: set

comment:18 Changed 17 months ago by Simon Charette

This looks a bit similar to #26522 which made alias_map an OrderedDict to prevent such deterministic failures.

I wonder if it wouldn't be better to figure out what's adding values to self.alias_map in a non-deterministic order instead as it seems to be the underlying issue here.

comment:19 Changed 17 months ago by Tom Forbes

Hmm, you're right Simon. I missed the OrderedDict change, and tested initially on 1.11 (where it's not applied). With this, on master, the results are added deterministically, but the first item is always the incorrect join.

So I don't think my ticket fixes the issue in a convincing way. If you extend the example in the ticket with another annotate and filter:

self.queryset = self.queryset.annotate(total=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__pk=1)
self.queryset = self.queryset.annotate(available=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__pk=2)
self.queryset = self.queryset.annotate(available2=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(total=F('available')).filter(total=F('available2'))

Then reuse will have *three* aliases found: [('test_app_recipe_steps', 'LEFT OUTER JOIN'), ('T6', 'INNER JOIN'), ('T10', 'INNER JOIN')]

In this case my patch will fetch T6, and we have the same issue.

comment:20 Changed 17 months ago by Tom Forbes

I've dug into this a fair bit and I'm sorry to say I'm stuck. I also think there is a pretty big bug lurking here that is out of my league. Or maybe it's a feature we don't support?

As far as I can tell, if you annotate over foreign keys as well as m2m:

Publisher.objects
   .annotate(num_books=Count('book', distinct=True))
   .filter(book__rating__gt=3.0)
   .annotate(num_rated=Count('book', distinct=True))
   .filter(num_books=F('num_rated'))

Then the resulting SQL will have the following HAVING clause:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "publisher"."id"))

Am I correct in thinking that this should be:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "T3"."id")), where T3 is the correct join?

In any case, either the generated query is incorrect or the documentation needs to be updated.

I think, at least in the case of a m2m join, one of the issues are that there seems to be no way to resolve which alias is used by two identical annotations which are affected by a previous .filter() in the query.join method.

Last edited 17 months ago by Tom Forbes (previous) (diff)

comment:21 Changed 17 months ago by Tom Forbes

Owner: Tom Forbes deleted
Status: assignednew

comment:22 in reply to:  20 Changed 17 months ago by Marcus Renno

Tom, I agree with you that it should be the second query and I believe that the feature should be supported. It's just a bug and we will fix it ;)

I'll try to work on this issue when I get home. I think you did a fantastic job narrowing down the problem and it's close to be solved.

Replying to Tom:

I've dug into this a fair bit and I'm sorry to say I'm stuck. I also think there is a pretty big bug lurking here that is out of my league. Or maybe it's a feature we don't support?

As far as I can tell, if you annotate over foreign keys as well as m2m:

Publisher.objects
   .annotate(num_books=Count('book', distinct=True))
   .filter(book__rating__gt=3.0)
   .annotate(num_rated=Count('book', distinct=True))
   .filter(num_books=F('num_rated'))

Then the resulting SQL will have the following HAVING clause:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "publisher"."id"))

Am I correct in thinking that this should be:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "T3"."id")), where T3 is the correct join?

In any case, either the generated query is incorrect or the documentation needs to be updated.

I think, at least in the case of a m2m join, one of the issues are that there seems to be no way to resolve which alias is used by two identical annotations which are affected by a previous .filter() in the query.join method.

comment:23 Changed 17 months ago by Tom Forbes

Cc: tom@… added

comment:24 Changed 17 months ago by Marcus Renno

Owner: set to Marcus Renno
Status: newassigned

First patch I have ever done to Django here: https://github.com/django/django/pull/8640

comment:25 Changed 17 months ago by Marcus Renno

Patch needs improvement: unset
Triage Stage: AcceptedUnreviewed

comment:26 Changed 17 months ago by Tim Graham

Triage Stage: UnreviewedAccepted

"Triage Stage" isn't reset when adding a patch, see Triaging Tickets.

comment:27 in reply to:  26 Changed 17 months ago by Marcus Renno

Sorry for that, Tim. Thanks for the useful information!

Replying to Tim Graham:

"Triage Stage" isn't reset when adding a patch, see Triaging Tickets.

comment:28 Changed 16 months ago by Jeppe Vesterbæk

Cc: jeppe@… added

comment:29 Changed 16 months ago by Tim Graham

Patch needs improvement: set

As noted on the PR, I'm not sure that the current approach is ideal.

comment:30 Changed 16 months ago by Marcus Renno

I updated the code in the PR. If someone has some time to check, please go ahead and share your comments.

comment:31 Changed 16 months ago by holvianssi

To me it seems that pre https://code.djangoproject.com/changeset/ab89414f40db1598364a7fe4cfac1766cacd2668 we always used the *first* join, not the last join possible in join(). I believe this has been that way for a long time, so changing that should be considered carefully.

I'm not exactly sure why do we get random order for self.alias_map in join. It's an OrderedDict, so that shouldn't be the case...

comment:32 in reply to:  31 Changed 16 months ago by Marcus Renno

Replying to holvianssi:

To me it seems that pre https://code.djangoproject.com/changeset/ab89414f40db1598364a7fe4cfac1766cacd2668 we always used the *first* join, not the last join possible in join(). I believe this has been that way for a long time, so changing that should be considered carefully.

I'm not exactly sure why do we get random order for self.alias_map in join. It's an OrderedDict, so that shouldn't be the case...

Hm... I always had the impression it was the last join. For me, it makes sense to use the tables in sequence, because you use actions (aka operations) in sequence. If it always grab the first joined table this wouldn't be possible. From what I understood, here in this ticket, until recent changes the order was random.

comment:33 Changed 16 months ago by Marcus Renno

Patch needs improvement: unset

comment:34 Changed 16 months ago by Marcus Renno

Hi, any updates regarding this ticket?

comment:35 Changed 15 months ago by Tim Graham

Patch needs improvement: set

As noted on the PR, the change introduces a non-deterministic test failure.

Note: See TracTickets for help on using tickets.
Back to Top