Code

Opened 4 years ago

Closed 8 months ago

Last modified 8 months ago

#14056 closed Bug (fixed)

Wrong query generated when using reverse foreign key

Reported by: premalshah Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: foreign key, reverse lookup
Cc: dtyschenko@…, lrekucki, SeanHayes Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Here is the model setup

from django.db import models

class Base(models.Model):
    id = models.AutoField(primary_key=True)
    field1 = models.CharField(max_length=11)
    
class Child(models.Model):
    id = models.AutoField(primary_key=True)
    obj = models.ForeignKey(Base)
    field1 = models.CharField(max_length=11)
    field2 = models.IntegerField()

Here are some sample queries and the corresponding sql statements generated and sql statements expected

Query 1

Base.objects.filter(child__obj=1).query

Generated SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` WHERE `apptest_base`.`id` = 1;

Expected SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` 
ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE `apptest_child`.`obj_id` = 1;

Query 2
Base.objects.filter(child__obj=1, child__field1='a').query

Generated SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` T4 
ON (`apptest_base`.`id` = T4.`obj_id`) WHERE (`apptest_base`.`id` = 1  AND T4.`field1` = a );

Expected SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` 
ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE (`apptest_child`.`obj_id` = 1  AND `apptest_child`.`field1` = a );

Query 3

Base.objects.filter(child__obj=1, child__field1='a').order_by('child__field2', 'child__obj').query

Generated SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` LEFT OUTER JOIN `apptest_child` 
ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) INNER JOIN `apptest_child` T4 
ON (`apptest_base`.`id` = T4.`obj_id`) WHERE (`apptest_base`.`id` = 1  AND T4.`field1` = a ) 
ORDER BY `apptest_child`.`field2` ASC, `apptest_base`.`id` ASC;

Expected SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` LEFT OUTER JOIN `apptest_child` 
ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE (`apptest_child`.`obj_id` = 1  AND 
`apptest_child`.`field1` = a ) ORDER BY `apptest_child`.`field2` ASC, `apptest_child`.`obj_id` ASC;

Problems:

1) Query 1 is missing a join.

2) Query 2 is using apptest_base.id in the where clause instead of apptest_child.obj_id. The results from the generated sql and expected sql would be the same. However, indexes created on the apptest_child table to optimize the query cannot be used hence leading to a slow query.

3) Query 3 has the same problem as query 2 in the where clause as well as the order by clause. In addition to that, it also performs an extra join on the apptest_child table which gives different results from the generated sql and the expected sql statements.

I believe that this happens because of the reverse foreign key field obj. When child__obj is used, the query generation logic does not know that we are referencing the obj field on the Child table and not the id field on the Base table. If we could explicitly specify child__obj_id, it would help, but thats not an options with the current database model api code.

Attachments (1)

14056_join_by_reverse_relation.diff (7.1 KB) - added by accuser 3 years ago.

Download all attachments as: .zip

Change History (17)

comment:1 Changed 4 years ago by premalshah

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

This bug also applies to django version 1.0 - 1.2

comment:2 Changed 4 years ago by premalshah

  • Version changed from 0.96 to 1.0

Clarifications on the version #:
I've changed the version to 1.0 since that is what Im using right now and also because lot of people on the django IRC channel complained about the version #. But I have seen the same behavior in 0.96, 1.1 and 1.2.

Clarifications for Query 1:

Most people on the django IRC channel said "The join is not required and so django removes it." To concentrate on issues with Query 2 and 3, I would agree.

Clarifications for Query 2:

There is a multi-column index on fields obj_id, field1 which speeds up the query quite a bit. Query 2 is a query optimization issue.

Evidence for wrong query results for Query 3:

Please add this to the class called "Base"

    def __str__(self):
        return self.field1

Now execute this:

b = Base(field1='ABC')
b.save()
c1 = Child(obj=b, field1='a', field2=1)
c2 = Child(obj=b, field1='b', field2=2)
c3 = Child(obj=b, field1='c', field2=3)
c1.save()
c2.save()
c3.save()

Base.objects.filter(child__obj=1, child__field1='a').order_by('child__field2', 'child__obj')
>>> [<Base: ABC>, <Base: ABC>, <Base: ABC>]

The actual result should be
>>> [<Base: ABC>]

comment:3 Changed 4 years ago by di0

Most people on the django IRC channel said "The join is not required and so django removes it." To concentrate on issues with Query 2 and 3, I would agree.

It's black magic. Black magic is bad. You've told to use child model, but ORM does not.

comment:4 Changed 4 years ago by di0

  • Cc dtyschenko@… added

comment:5 Changed 4 years ago by lrekucki

  • Cc lrekucki added

comment:6 Changed 3 years ago by anonymous

I agree the current behavior is incorrect because:

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` WHERE `apptest_base`.`id` = 1;

will return ALL Base objects with an id of 1, whereas:

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` 
ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE `apptest_child`.`obj_id` = 1;

will only return Base objects with an id of 1 who are referenced to by a Child model, which is what 'Base.objects.filter(childobj=1)' implies. So a join is in fact required.

comment:7 Changed 3 years ago by SeanHayes

My above comment was in reference to Query 1.

comment:8 Changed 3 years ago by SeanHayes

  • Cc SeanHayes added

comment:9 Changed 3 years ago by russellm

  • Triage Stage changed from Unreviewed to Accepted

Regarding query 1 -- the join *is* required, but it's being incorrectly optimized. This optimization is only valid when it is known that child IDs will shadow parent IDs. This will always happen for model inheritance, but that isn't what is happening in the sample models.

The remaining queries all seem to be consequences of the same thing. Fix Q1, and the rest should follow.

Although it doesn't look like it on first inspection, I suspect this may be closely related to #11319.

comment:10 Changed 3 years ago by accuser

  • Has patch set
  • Keywords key, reverse lookup added; key removed
  • Version changed from 1.0 to SVN

Got an idea, that trim optimization of the join, which has been got by the simple reverse relation, -- is incorrect, because first "real table" doesn't contain any relation to second one, so it can't get sought-for field value.

Will attach possible straight-forward solution with tests in the next comment.

I'm not sure of this ticket is related to #11319. It seems that to_field attribute is partially broken.

Changed 3 years ago by accuser

comment:11 Changed 3 years ago by julien

  • Severity set to Normal
  • Type set to Bug

comment:12 Changed 3 years ago by Demetrius Cassidy <dcassidy36@…>

  • Easy pickings unset
  • milestone changed from 1.3 to 1.4
  • UI/UX unset

comment:13 Changed 3 years ago by jacob

  • milestone 1.4 deleted

Milestone 1.4 deleted

comment:14 Changed 8 months ago by akaariai

The generated queries on master are:

SELECT "queries_base"."id", "queries_base"."field1"
  FROM "queries_base"
 INNER JOIN "queries_basechild" ON ( "queries_base"."id" = "queries_basechild"."obj_id" )
 WHERE "queries_basechild"."obj_id" = 1 

SELECT "queries_base"."id", "queries_base"."field1"
  FROM "queries_base"
 INNER JOIN "queries_basechild" ON ( "queries_base"."id" = "queries_basechild"."obj_id" )
 WHERE ("queries_basechild"."field1" = a  AND "queries_basechild"."obj_id" = 1 )

SELECT "queries_base"."id", "queries_base"."field1"
  FROM "queries_base"
 INNER JOIN "queries_basechild" ON ( "queries_base"."id" = "queries_basechild"."obj_id" )
 WHERE ("queries_basechild"."field1" = a  AND "queries_basechild"."obj_id" = 1 )
 ORDER BY "queries_basechild"."field2" ASC, "queries_base"."id" ASC

These look correct to me. The last one has INNER JOIN instead of the description's expected LEFT OUTER JOIN - but INNER JOIN is correct here as the query can't produce results if no rows match the join condition.

Also, the last one has ORDER BY ..., queries_base.id instead of "queries_base"."obj_id". But this is correct -the join is INNER JOIN, so obj_id and id have always equivalent values.

There is one bug though, doing Base.objects.order_by('basechild__field2', 'basechild__obj').query will generate a LEFT JOIN, but use base's id. This doesn't necessarily have the same value as obj_id as the join is LEFT JOIN. A fix coming soon.

comment:15 Changed 8 months ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 905409855c6b69f613190fcc2d8bd8bf5e1580b5:

Fixed #14056 -- Made sure LEFT JOIN aren't trimmed in ORDER BY

If LEFT JOINs are required for correct results, then trimming the join
can lead to incorrect results. Consider case:

TBL A: ID | TBL B: ID A_ID

1 1 1
2

Now A.order_by('ba') did use a join to B, and B's a_id column. This
was seen to contain the same value as A's id, and so the join was
trimmed. But this wasn't correct as the join is LEFT JOIN, and for row
A.id = 2 the B.a_id column is NULL.

comment:16 Changed 8 months ago by Andrew Godwin <andrew@…>

In cea720450485e0871daa7f9477fdf2bff5a5b821:

Fixed #14056 -- Made sure LEFT JOIN aren't trimmed in ORDER BY

If LEFT JOINs are required for correct results, then trimming the join
can lead to incorrect results. Consider case:

TBL A: ID | TBL B: ID A_ID

1 1 1
2

Now A.order_by('ba') did use a join to B, and B's a_id column. This
was seen to contain the same value as A's id, and so the join was
trimmed. But this wasn't correct as the join is LEFT JOIN, and for row
A.id = 2 the B.a_id column is NULL.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.