Opened 5 months ago

Last modified 4 months ago

#28297 assigned Bug

Same queryset result in two different queries on ORM

Reported by: Marcus Renno Owned by: Marcus Renno
Component: Database layer (models, ORM) Version: 1.11
Severity: Normal Keywords: join, annotation, F
Cc: akaariai@…, tom@…, jeppe@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Marcus Renno)

Sometimes when I run a set of filter/annotation the result is different for the same variables. I'm using Django 1.11.2

The models used (simplified)

class Recipe(models.Model):
  name = models.CharField(max_length=50)
  steps = models.ManyToManyField(StepRecipe)

class StepRecipe(models.Model):
  ingredients = models.ManyToManyField(RecipeIngredient)

class RecipeIngredient(models.Model):
  ingredient = models.ForeignKey(Ingredient)

class Ingredient(models.Model):
  name = models.CharField(max_length=50)

This is the queryset command:

ingredients = ['tomato']
self.queryset = Recipe.objects.all()
self.queryset = self.queryset.annotate(total=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__ingredient__name__in=ingredients)
self.queryset = self.queryset.annotate(available=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(total=F('available'))

This is the wrong query result that comes often times:

SELECT recipebook_recipe.id, recipebook_recipe.name, recipebook_recipe.dificulty, recipebook_recipe.duration, COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS total, COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS available FROM recipebook_recipe LEFT OUTER JOIN recipebook_recipe_steps ON (recipebook_recipe.id = recipebook_recipe_steps.recipe_id) LEFT OUTER JOIN recipebook_steprecipe ON (recipebook_recipe_steps.steprecipe_id = recipebook_steprecipe.id) LEFT OUTER JOIN recipebook_steprecipe_ingredients ON (recipebook_steprecipe.id = recipebook_steprecipe_ingredients.steprecipe_id) INNER JOIN recipebook_recipe_steps T6 ON (recipebook_recipe.id = T6.recipe_id) INNER JOIN recipebook_steprecipe T7 ON (T6.steprecipe_id = T7.id) INNER JOIN recipebook_steprecipe_ingredients T8 ON (T7.id = T8.steprecipe_id) INNER JOIN recipebook_recipeingredient T9 ON (T8.recipeingredient_id = T9.id) INNER JOIN recipebook_ingredient ON (T9.ingredient_id = recipebook_ingredient.id) WHERE recipebook_ingredient.name IN ("tomato") GROUP BY recipebook_recipe.id HAVING COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) = (COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id)) ORDER BY NULL

And this is the right query that shows up sometimes:

SELECT recipebook_recipe.id, recipebook_recipe.name, recipebook_recipe.dificulty, recipebook_recipe.duration, COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS total, COUNT(DISTINCT T8.recipeingredient_id) AS available FROM recipebook_recipe LEFT OUTER JOIN recipebook_recipe_steps ON (recipebook_recipe.id = recipebook_recipe_steps.recipe_id) LEFT OUTER JOIN recipebook_steprecipe ON (recipebook_recipe_steps.steprecipe_id = recipebook_steprecipe.id) LEFT OUTER JOIN recipebook_steprecipe_ingredients ON (recipebook_steprecipe.id = recipebook_steprecipe_ingredients.steprecipe_id) INNER JOIN recipebook_recipe_steps T6 ON (recipebook_recipe.id = T6.recipe_id) INNER JOIN recipebook_steprecipe T7 ON (T6.steprecipe_id = T7.id) INNER JOIN recipebook_steprecipe_ingredients T8 ON (T7.id = T8.steprecipe_id) INNER JOIN recipebook_recipeingredient T9 ON (T8.recipeingredient_id = T9.id) INNER JOIN recipebook_ingredient ON (T9.ingredient_id = recipebook_ingredient.id) WHERE recipebook_ingredient.name IN ("tomato") GROUP BY recipebook_recipe.id HAVING COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) = (COUNT(DISTINCT T8.recipeingredient_id)) ORDER BY NULL

As you can see they are different. The wrong one does not set the variable 'available' appropriately. I wonder if this is something on the method .query.join()

Change History (35)

comment:1 Changed 5 months ago by Marcus Renno

Description: modified (diff)

comment:2 Changed 5 months ago by Simon Charette

Hello Marcus,

You'll have to provide a minimal models setup to allow us to identify the issue as it's almost impossible to figure out what could be wrong it. Please do so and reopen this ticket with the details.

Also, did you reproduce against Django 1.11 and the master branch?

comment:3 Changed 5 months ago by Simon Charette

Resolution: needsinfo
Status: newclosed

comment:4 Changed 5 months ago by Marcus Renno

Description: modified (diff)
Resolution: needsinfo
Status: closednew

comment:5 Changed 5 months ago by Marcus Renno

Description: modified (diff)

comment:6 Changed 5 months ago by Marcus Renno

Description: modified (diff)

comment:7 Changed 5 months ago by Todor Velichkov

I think annotating + filter on M2M field is the issue here.

Please check this thread on django-users: https://groups.google.com/forum/#!topic/django-users/RPU-PnygbLg

comment:8 Changed 5 months ago by Marcus Renno

Type: UncategorizedBug

comment:9 Changed 5 months ago by Tom Forbes

This appears to be related to a dictionary iteration somewhere. If you set a fixed PYTHONHASHSEED value (e.g PYTHONHASHSEED=1) then the query is stable, otherwise it does indeed switch between two types of SQL.

In my experiments PYTHONHASHSEED=1 produces the wrong query, whereas PYTHONHASHSEED=2 produces the correct one.

comment:10 Changed 5 months ago by Tom Forbes

Owner: changed from nobody to Tom Forbes
Status: newassigned

comment:11 Changed 5 months ago by Tom Forbes

It seems to boil down to this line: https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927

The alias_map.items() isn't stable, with PYTHONHASHSEED=1 then reuse[0] is recepie_steps, which is incorrect, but with it set to 2 then reuse[0] is (correctly) T6.

If anyone knows how to fix this then please let me know, else I will try and find a suitable fix.

comment:12 in reply to:  11 ; Changed 5 months ago by Marcus Renno

That seems to be right, Tom. join() is relying on the order of the dictionary resulting from .items().

It's weird that setting the hash seed to 2 would solve this problem, though. I feel like the framework should not rely on the user changing the hashing seed to a static number to fix this problem, because it breaks the purpose (security) of the the hashing being random.

Replying to Tom:

It seems to boil down to this line: https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927

The alias_map.items() isn't stable, with PYTHONHASHSEED=1 then reuse[0] is recepie_steps, which is incorrect, but with it set to 2 then reuse[0] is (correctly) T6.

If anyone knows how to fix this then please let me know, else I will try and find a suitable fix.

Last edited 5 months ago by Marcus Renno (previous) (diff)

comment:13 Changed 5 months ago by Claude Paroz

Triage Stage: UnreviewedAccepted

Query results depending on dict ordering is a bug.

comment:14 Changed 5 months ago by Claude Paroz

Cc: akaariai@… added

Looks like this was introduced in [ab89414f40db1598364a7fe4cfac1766cacd2668].

comment:15 in reply to:  12 Changed 5 months ago by Tom Forbes

I wasn't suggesting setting the hash seed value as a fix, merely as a debugging aide for anyone else looking at the ticket.

I've made a MR with what I think is the correct fix: https://github.com/django/django/pull/8631

The issue was that there where two joins, with differing join_types, that where considered equal in the Join.__eq__ method, and so randomly one would be chosen, which was sometimes incorrect.

Replying to Marcus Renno:

That seems to be right, Tom. join() is relying on the order of the dictionary resulting from .items().

It's weird that setting the hash seed to 2 would solve this problem, though. I feel like the framework should not rely on the user changing the hashing seed to a static number to fix this problem, because it breaks the purpose (security) of the the hashing being random.

Replying to Tom:

It seems to boil down to this line: https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927

The alias_map.items() isn't stable, with PYTHONHASHSEED=1 then reuse[0] is recepie_steps, which is incorrect, but with it set to 2 then reuse[0] is (correctly) T6.

If anyone knows how to fix this then please let me know, else I will try and find a suitable fix.

comment:16 Changed 5 months ago by Tom Forbes

Has patch: set

comment:17 Changed 5 months ago by Tim Graham

Patch needs improvement: set

comment:18 Changed 5 months ago by Simon Charette

This looks a bit similar to #26522 which made alias_map an OrderedDict to prevent such deterministic failures.

I wonder if it wouldn't be better to figure out what's adding values to self.alias_map in a non-deterministic order instead as it seems to be the underlying issue here.

comment:19 Changed 5 months ago by Tom Forbes

Hmm, you're right Simon. I missed the OrderedDict change, and tested initially on 1.11 (where it's not applied). With this, on master, the results are added deterministically, but the first item is always the incorrect join.

So I don't think my ticket fixes the issue in a convincing way. If you extend the example in the ticket with another annotate and filter:

self.queryset = self.queryset.annotate(total=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__pk=1)
self.queryset = self.queryset.annotate(available=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__pk=2)
self.queryset = self.queryset.annotate(available2=Count('steps__ingredients', distinct=True))
self.queryset = self.queryset.filter(total=F('available')).filter(total=F('available2'))

Then reuse will have *three* aliases found: [('test_app_recipe_steps', 'LEFT OUTER JOIN'), ('T6', 'INNER JOIN'), ('T10', 'INNER JOIN')]

In this case my patch will fetch T6, and we have the same issue.

comment:20 Changed 5 months ago by Tom Forbes

I've dug into this a fair bit and I'm sorry to say I'm stuck. I also think there is a pretty big bug lurking here that is out of my league. Or maybe it's a feature we don't support?

As far as I can tell, if you annotate over foreign keys as well as m2m:

Publisher.objects
   .annotate(num_books=Count('book', distinct=True))
   .filter(book__rating__gt=3.0)
   .annotate(num_rated=Count('book', distinct=True))
   .filter(num_books=F('num_rated'))

Then the resulting SQL will have the following HAVING clause:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "publisher"."id"))

Am I correct in thinking that this should be:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "T3"."id")), where T3 is the correct join?

In any case, either the generated query is incorrect or the documentation needs to be updated.

I think, at least in the case of a m2m join, one of the issues are that there seems to be no way to resolve which alias is used by two identical annotations which are affected by a previous .filter() in the query.join method.

Last edited 5 months ago by Tom Forbes (previous) (diff)

comment:21 Changed 5 months ago by Tom Forbes

Owner: Tom Forbes deleted
Status: assignednew

comment:22 in reply to:  20 Changed 5 months ago by Marcus Renno

Tom, I agree with you that it should be the second query and I believe that the feature should be supported. It's just a bug and we will fix it ;)

I'll try to work on this issue when I get home. I think you did a fantastic job narrowing down the problem and it's close to be solved.

Replying to Tom:

I've dug into this a fair bit and I'm sorry to say I'm stuck. I also think there is a pretty big bug lurking here that is out of my league. Or maybe it's a feature we don't support?

As far as I can tell, if you annotate over foreign keys as well as m2m:

Publisher.objects
   .annotate(num_books=Count('book', distinct=True))
   .filter(book__rating__gt=3.0)
   .annotate(num_rated=Count('book', distinct=True))
   .filter(num_books=F('num_rated'))

Then the resulting SQL will have the following HAVING clause:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "publisher"."id"))

Am I correct in thinking that this should be:

HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "T3"."id")), where T3 is the correct join?

In any case, either the generated query is incorrect or the documentation needs to be updated.

I think, at least in the case of a m2m join, one of the issues are that there seems to be no way to resolve which alias is used by two identical annotations which are affected by a previous .filter() in the query.join method.

comment:23 Changed 5 months ago by Tom Forbes

Cc: tom@… added

comment:24 Changed 5 months ago by Marcus Renno

Owner: set to Marcus Renno
Status: newassigned

First patch I have ever done to Django here: https://github.com/django/django/pull/8640

comment:25 Changed 5 months ago by Marcus Renno

Patch needs improvement: unset
Triage Stage: AcceptedUnreviewed

comment:26 Changed 5 months ago by Tim Graham

Triage Stage: UnreviewedAccepted

"Triage Stage" isn't reset when adding a patch, see Triaging Tickets.

comment:27 in reply to:  26 Changed 5 months ago by Marcus Renno

Sorry for that, Tim. Thanks for the useful information!

Replying to Tim Graham:

"Triage Stage" isn't reset when adding a patch, see Triaging Tickets.

comment:28 Changed 5 months ago by Jeppe Vesterbæk

Cc: jeppe@… added

comment:29 Changed 5 months ago by Tim Graham

Patch needs improvement: set

As noted on the PR, I'm not sure that the current approach is ideal.

comment:30 Changed 5 months ago by Marcus Renno

I updated the code in the PR. If someone has some time to check, please go ahead and share your comments.

comment:31 Changed 5 months ago by holvianssi

To me it seems that pre https://code.djangoproject.com/changeset/ab89414f40db1598364a7fe4cfac1766cacd2668 we always used the *first* join, not the last join possible in join(). I believe this has been that way for a long time, so changing that should be considered carefully.

I'm not exactly sure why do we get random order for self.alias_map in join. It's an OrderedDict, so that shouldn't be the case...

comment:32 in reply to:  31 Changed 5 months ago by Marcus Renno

Replying to holvianssi:

To me it seems that pre https://code.djangoproject.com/changeset/ab89414f40db1598364a7fe4cfac1766cacd2668 we always used the *first* join, not the last join possible in join(). I believe this has been that way for a long time, so changing that should be considered carefully.

I'm not exactly sure why do we get random order for self.alias_map in join. It's an OrderedDict, so that shouldn't be the case...

Hm... I always had the impression it was the last join. For me, it makes sense to use the tables in sequence, because you use actions (aka operations) in sequence. If it always grab the first joined table this wouldn't be possible. From what I understood, here in this ticket, until recent changes the order was random.

comment:33 Changed 5 months ago by Marcus Renno

Patch needs improvement: unset

comment:34 Changed 4 months ago by Marcus Renno

Hi, any updates regarding this ticket?

comment:35 Changed 4 months ago by Tim Graham

Patch needs improvement: set

As noted on the PR, the change introduces a non-deterministic test failure.

Note: See TracTickets for help on using tickets.
Back to Top