#14056 closed Bug (fixed)
Wrong query generated when using reverse foreign key
Reported by: | premalshah | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | foreign key, reverse lookup |
Cc: | dtyschenko@…, Łukasz Rekucki, Seán Hayes | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Here is the model setup
from django.db import models class Base(models.Model): id = models.AutoField(primary_key=True) field1 = models.CharField(max_length=11) class Child(models.Model): id = models.AutoField(primary_key=True) obj = models.ForeignKey(Base) field1 = models.CharField(max_length=11) field2 = models.IntegerField()
Here are some sample queries and the corresponding sql statements generated and sql statements expected
Query 1
Base.objects.filter(child__obj=1).query
Generated SQL
SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` WHERE `apptest_base`.`id` = 1;
Expected SQL
SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE `apptest_child`.`obj_id` = 1;
Query 2
Base.objects.filter(child__obj=1, child__field1='a').query
Generated SQL
SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` T4 ON (`apptest_base`.`id` = T4.`obj_id`) WHERE (`apptest_base`.`id` = 1 AND T4.`field1` = a );
Expected SQL
SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE (`apptest_child`.`obj_id` = 1 AND `apptest_child`.`field1` = a );
Query 3
Base.objects.filter(child__obj=1, child__field1='a').order_by('child__field2', 'child__obj').query
Generated SQL
SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` LEFT OUTER JOIN `apptest_child` ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) INNER JOIN `apptest_child` T4 ON (`apptest_base`.`id` = T4.`obj_id`) WHERE (`apptest_base`.`id` = 1 AND T4.`field1` = a ) ORDER BY `apptest_child`.`field2` ASC, `apptest_base`.`id` ASC;
Expected SQL
SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` LEFT OUTER JOIN `apptest_child` ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE (`apptest_child`.`obj_id` = 1 AND `apptest_child`.`field1` = a ) ORDER BY `apptest_child`.`field2` ASC, `apptest_child`.`obj_id` ASC;
Problems:
1) Query 1 is missing a join.
2) Query 2 is using apptest_base.id in the where clause instead of apptest_child.obj_id. The results from the generated sql and expected sql would be the same. However, indexes created on the apptest_child table to optimize the query cannot be used hence leading to a slow query.
3) Query 3 has the same problem as query 2 in the where clause as well as the order by clause. In addition to that, it also performs an extra join on the apptest_child table which gives different results from the generated sql and the expected sql statements.
I believe that this happens because of the reverse foreign key field obj
. When child__obj
is used, the query generation logic does not know that we are referencing the obj
field on the Child table and not the id
field on the Base table. If we could explicitly specify child__obj_id
, it would help, but thats not an options with the current database model api code.
Attachments (1)
Change History (17)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Version: | 0.96 → 1.0 |
---|
Clarifications on the version #:
I've changed the version to 1.0 since that is what Im using right now and also because lot of people on the django IRC channel complained about the version #. But I have seen the same behavior in 0.96, 1.1 and 1.2.
Clarifications for Query 1:
Most people on the django IRC channel said "The join is not required and so django removes it." To concentrate on issues with Query 2 and 3, I would agree.
Clarifications for Query 2:
There is a multi-column index on fields obj_id, field1 which speeds up the query quite a bit. Query 2 is a query optimization issue.
Evidence for wrong query results for Query 3:
Please add this to the class called "Base"
def __str__(self): return self.field1
Now execute this:
b = Base(field1='ABC') b.save() c1 = Child(obj=b, field1='a', field2=1) c2 = Child(obj=b, field1='b', field2=2) c3 = Child(obj=b, field1='c', field2=3) c1.save() c2.save() c3.save() Base.objects.filter(child__obj=1, child__field1='a').order_by('child__field2', 'child__obj') >>> [<Base: ABC>, <Base: ABC>, <Base: ABC>] The actual result should be >>> [<Base: ABC>]
comment:3 by , 14 years ago
Most people on the django IRC channel said "The join is not required and so django removes it." To concentrate on issues with Query 2 and 3, I would agree.
It's black magic. Black magic is bad. You've told to use child model, but ORM does not.
comment:4 by , 14 years ago
Cc: | added |
---|
comment:5 by , 14 years ago
Cc: | added |
---|
comment:6 by , 14 years ago
I agree the current behavior is incorrect because:
SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` WHERE `apptest_base`.`id` = 1;
will return ALL Base objects with an id of 1, whereas:
SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE `apptest_child`.`obj_id` = 1;
will only return Base objects with an id of 1 who are referenced to by a Child model, which is what 'Base.objects.filter(childobj=1)' implies. So a join is in fact required.
comment:8 by , 14 years ago
Cc: | added |
---|
comment:9 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Regarding query 1 -- the join *is* required, but it's being incorrectly optimized. This optimization is only valid when it is known that child IDs will shadow parent IDs. This will always happen for model inheritance, but that isn't what is happening in the sample models.
The remaining queries all seem to be consequences of the same thing. Fix Q1, and the rest should follow.
Although it doesn't look like it on first inspection, I suspect this may be closely related to #11319.
comment:10 by , 14 years ago
Has patch: | set |
---|---|
Keywords: | reverse lookup added |
Version: | 1.0 → SVN |
Got an idea, that trim optimization of the join, which has been got by the simple reverse relation, -- is incorrect, because first "real table" doesn't contain any relation to second one, so it can't get sought-for field value.
Will attach possible straight-forward solution with tests in the next comment.
I'm not sure of this ticket is related to #11319. It seems that to_field
attribute is partially broken.
by , 14 years ago
Attachment: | 14056_join_by_reverse_relation.diff added |
---|
comment:11 by , 13 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:12 by , 13 years ago
Easy pickings: | unset |
---|---|
milestone: | 1.3 → 1.4 |
UI/UX: | unset |
comment:14 by , 11 years ago
The generated queries on master are:
SELECT "queries_base"."id", "queries_base"."field1" FROM "queries_base" INNER JOIN "queries_basechild" ON ( "queries_base"."id" = "queries_basechild"."obj_id" ) WHERE "queries_basechild"."obj_id" = 1 SELECT "queries_base"."id", "queries_base"."field1" FROM "queries_base" INNER JOIN "queries_basechild" ON ( "queries_base"."id" = "queries_basechild"."obj_id" ) WHERE ("queries_basechild"."field1" = a AND "queries_basechild"."obj_id" = 1 ) SELECT "queries_base"."id", "queries_base"."field1" FROM "queries_base" INNER JOIN "queries_basechild" ON ( "queries_base"."id" = "queries_basechild"."obj_id" ) WHERE ("queries_basechild"."field1" = a AND "queries_basechild"."obj_id" = 1 ) ORDER BY "queries_basechild"."field2" ASC, "queries_base"."id" ASC
These look correct to me. The last one has INNER JOIN instead of the description's expected LEFT OUTER JOIN - but INNER JOIN is correct here as the query can't produce results if no rows match the join condition.
Also, the last one has ORDER BY ..., queries_base
.id
instead of "queries_base"."obj_id". But this is correct -the join is INNER JOIN, so obj_id and id have always equivalent values.
There is one bug though, doing Base.objects.order_by('basechild__field2', 'basechild__obj').query
will generate a LEFT JOIN, but use base's id. This doesn't necessarily have the same value as obj_id as the join is LEFT JOIN. A fix coming soon.
comment:15 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
This bug also applies to django version 1.0 - 1.2