#1136 closed enhancement (fixed)
[patch] Removal of duplicated DB joins resulting from kwarg evaluation
Reported by: | Owned by: | Adrian Holovaty | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | |
Severity: | major | Keywords: | query join unique alias |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This patch has been generated against the magic-removal branch.
DB kwarg queries can traverse many-many and many-one relationships; if this is done, the parse_lookup function works out the required table, and constructs a join and where clause as required.
However, if a query contains two queries over the same relationship, the secondary table will be joined twice. This could lead to quite inefficient underlying SQL, retrieving a lot of rows that must be discarded by the where clause; the more queries on the joined table, the worse the duplication problem becomes.
For example if the model:
class Reporter(models.Model): first_name = models.CharField(maxlength=30) last_name = models.CharField(maxlength=30) class Article(models.Model): reporter = models.ForeignKey(Reporter)
was posed the query:
Article.objects._get_sql_clause(reporter__first_name__exact='John',reporter__last_name__exact='Thomas')
this would have resulted in the SQL:
SELECT "foo_articles"."id" "foo_articles"."reporter_id" FROM "foo_articles","foo_reporters" "t1","foo_reporters" "t3" WHERE "foo_articles"."reporter_id" = "t1"."id" AND "foo_articles"."reporter_id" = "t3"."id" AND "t1"."first_name" = 'John' AND "t3"."last_name" = 'Thomas'
This is a double join over foo_reporters, which could be avoided.
With this patch applied, the same query evaluates as:
SELECT "foo_articles"."id" "foo_articles"."reporter_id" FROM "foo_articles" INNER JOIN "foo_reporters" AS "foo_articles__reporter" ON "foo_articles"."reporter_id" = "foo_articles__reporter"."id" WHERE "foo_articles__reporter"."first_name" = 'John' AND "foo_articles__reporter"."last_name" = 'Thomas'
Points of difference in the SQL:
- INNER JOIN... ON syntax is used, rather than ',' ...WHERE
- Table alias is related to the table lookup path, not a number
Point 1 allows clear separation of join clauses from tables/where clauses (the reason for join_where in the old version). The new version also allows for other join types to be used (although they are not exploited anywhere at present).
As for point 2; Aliases are still required to resolve circular table references (e.g., "Article has a Reporter has a favourite Article" requires an article-reporter join, and a reporter-arcticle join, but the article table cannot be reused). However, the new scheme allows table aliases to be predictable, so end users can add manual where clauses that might act on them ('module_articlesreporterfavourite_article', rather than 't4'). It also makes parse_lookup() a non-stateful call (no table_count parameter to pass around).
Internally, there are several differences:
- SQL returning functions (i.e., *.get_sql and parse_lookup) now return (select, joins, where, params), rather than (select, join_where, where, params, table_count)
- joins is a dictionary, not a list. The dictionary uses the table alias as a key; the value is a tuple of (table name, join type, join condition). See docstring for parse_lookup for more details
- Join uniqueness is guaranteed by the dictionary. The second attempt to join a table will overwrite the first attempt. However, on each join attempt, parse_lookup will be generated the same join, so this shouldn't be an issue.
- SQL accumulation logic in _get_sql_clause has been reworked to clarify a very long line of string concatenations
- QBase has been renamed QOperator. This was done because only QOr and QAnd inherit from QBase, the internal logic of QBase is entirely devoted to aggregating two clauses into an operator, and most importantly, Q doesn't inherit from QBase (as the name would suggest).
Unit tests have been added to many_to_one; one of these tests isn't entirely black-box, as it looks inside the generated query to ensure that only one join is performed.
However, no unit tests have been added for many to many queries. This is because there apparently ARE no unit tests for many to many queries.
Attachments (1)
Change History (2)
by , 19 years ago
Attachment: | unique_joins.patch added |
---|
comment:1 by , 19 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Patch to remove duplicated joins on kwarg evaluation