Opened 7 years ago

Closed 5 years ago

#29271 closed Bug (invalid)

Chaining Filters on a Reverse Foreign Key Produces Multiple Joins

Reported by: Michael MacIntosh Owned by: nobody
Component: Database layer (models, ORM) Version: 2.0
Severity: Normal Keywords: filter chain reverse foreign key
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When you perform multiple filters on a queryset on a reverse foreign key, it produces duplicate joins, which causes the results to multiply. If instead of chaining the filters, you put both of the filters in one filter, you do not get duplicate results.

You also get this behavior if you repeat the same filter.

This also applies to Q objects that are used in chain filters that reference values across reverse foreign keys as well.

Example Models:

class Alpha(models.Model):
    field = models.CharField(max_length=100)

class Beta(models.Model):
    field = models.CharField(max_length=100)
    data = models.CharField(max_length=100)
    alpha = models.ForeignKey(Alpha, on_delete=models.CASCADE)

Example Data:

Alpha.objects.all().delete()
Beta.objects.all().delete()
alpha = Alpha(field="alpha_text")
alpha.save()
beta = Beta(field="beta_text", data="data1", alpha=alpha)
beta.save()
beta = Beta(field="beta_text", data="data2", alpha=alpha)
beta.save()

Example Output:

>>> Alpha.objects.filter(beta__field="beta_text", beta__data="data1")
<QuerySet [<Alpha: Alpha object (12)>]>
>>> Alpha.objects.filter(beta__field="beta_text").filter(beta__data="data1")
<QuerySet [<Alpha: Alpha object (12)>, <Alpha: Alpha object (12)>]>

Formatted SQL of the first query (expected):

SELECT "test_app_alpha"."id",
       "test_app_alpha"."field"
FROM   "test_app_alpha"
       INNER JOIN "test_app_beta"
               ON ( "test_app_alpha"."id" = "test_app_beta"."alpha_id" )
WHERE  ( "test_app_beta"."field" = beta_text
         AND "test_app_beta"."data" = data1 )  

Formatted SQL of the second query (bug)

SELECT "test_app_alpha"."id",
       "test_app_alpha"."field"
FROM   "test_app_alpha"
       INNER JOIN "test_app_beta"
               ON ( "test_app_alpha"."id" = "test_app_beta"."alpha_id" )
       INNER JOIN "test_app_beta" T3
               ON ( "test_app_alpha"."id" = T3."alpha_id" )
WHERE  ( "test_app_beta"."field" = beta_text
         AND T3."data" = data1 )  

Change History (9)

comment:1 by Andrew Standley, 7 years ago

Triage Stage: UnreviewedAccepted

Reproduced.

Last edited 7 years ago by Andrew Standley (previous) (diff)

comment:2 by Tim Graham, 7 years ago

Is this really unexpected behavior? Duplicate of #16554 (closed as invalid)?

comment:3 by Simon Charette, 7 years ago

Resolution: invalid
Status: newclosed

This is expected behavior as per spanning multi-valued relationships as beta is a reverse ForeignKey and thus multi-valued.

I'm not sure how we could give more visibility to this part of the documentation as this keeps being reported as a bug given how unintuitive it is. Maybe this could be mentioned in the filter() reference documentation?

comment:4 by Andrew Standley, 7 years ago

I'm pretty sure it's a good rule of thumb that when the behaviour of a feature is highly unintuitive, one should question whether it was a good design choice. I'll take this to the mailing list in the hope someone can explain the justification for this intended behaviour because ,to my naive self, it seems that Q objects would be a cleaner way to allow the type of multi-valued queries referenced in spanning multi-valued relationships.

I would argue that regardless, this is an issue with the documentation as Simon suggests.

The filter documentation simply states that filters are joined via AND. Meanwhile the chaining filters documentation explicitly states that successive filters act on the result of the last filter. Neither have any link or reference to the contradictory behaviour documented in spanning multi-valued relationships

I imagine this would require a new ticket specifically about the documentation?

Last edited 7 years ago by Andrew Standley (previous) (diff)

comment:5 by Tim Graham, 7 years ago

#27936 requests documentation improvements.

comment:6 by Andrew Standley, 7 years ago

Thanks for the link Tim, but isn't the scope of #27936 quite a bit different? That ticket is regarding possible changes to the spanning multi-value relationships documentation in order to clarify the 'intended' functionality.

The issue I see is that the chaining-filters documentation appears to contradict the spanning multi-value relationships documentation.

My suggested changes would be more along the lines of
1) Adding a link to retrieving specific objects with filters to the filter docs, so the details of filter are more visible.
2) Adding a note to chaining-filters that references the behaviour is handled differently for multi-value relationships and links to spanning multi-value relationships . (Edit: I realize these on the same page, but in almost two years of working with Django I don't think I've ever read that page top to bottom, and I'm usually pretty good about reading the docs)

Also a side note for documentation: this ticket appears to be a duplicate of #18437 as well.

Last edited 7 years ago by Andrew Standley (previous) (diff)

comment:7 by Timothy Hobbs, 5 years ago

Resolution: invalid
Status: closednew
Triage Stage: AcceptedUnreviewed

Can this bug please be re-opened?

Simon Charette's comment appears to be clarifying that two filter's are intended to be different from one. I get that, that IS expected behavior. What is NOT expected behavior is the duplication of the entries in the results and that behavior goes directly against the current documentation:

'The second filter restricts the set of blogs further'

The word 'restricts' is used multiple times in that seciton. The word 'duplicates' never occurs.

That behavior is not only unexpected and undocumented, but never useful. Doing a .count() on the results gives a bad result, displaying the duplicated data is never desirable, mapping over the duplicated data can lead to corruption. This behavior is literally always wrong and very hard to catch in testing.

comment:8 by Simon Charette, 5 years ago

Timothy,

That behavior is not only unexpected and undocumented, but never useful.

I agree with you, so does all all the folks subscribed to #10060.

But in order to deprecate the current behavior and automatically perform a subquery pushdown we need an alternative to point the user at when such events are detected.

There is ungoing work to allow aggregations to be performed using as subquery which is on my radar for review (#28296) and is something we could direct the user at. Until this lands there's no alternative we can point the user at during following theoretical deprecation period where we warn about this behavior and how it would change in future versions of Django.

In this particual case I guess we could point users at something like .filter(Exists(Q(multi_valued__foo="bar"))) when .filter(multi_valued__foo="bar") is used if we added support for Exist(Q) or .filter(multi_valued__exists=Q(foo="bar")) but that's tricky because some database engine are really bad with correlated subqueries (e.g. MySQL 5.6) so while the results would be more correct it could a ticking bomb that reveals itself only in production.

In the mean time I'd restate that a mention in the documentation couldn't hurt. Would you be able to propose an admonition Timothy?

comment:9 by Mariusz Felisiak, 5 years ago

Resolution: invalid
Status: newclosed

Timothy, please feel-free to submit a patch with a clarification/admonition to docs. We'll evaluate it even without reopening this ticket, which IMO is still invalid.

Note: See TracTickets for help on using tickets.
Back to Top