Opened 10 years ago

Closed 10 years ago

Last modified 9 years ago

#1136 closed enhancement (fixed)

[patch] Removal of duplicated DB joins resulting from kwarg evaluation

Reported by: freakboy@… Owned by: adrian
Component: Database layer (models, ORM) Version:
Severity: major Keywords: query join unique alias
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

This patch has been generated against the magic-removal branch.

DB kwarg queries can traverse many-many and many-one relationships; if this is done, the parse_lookup function works out the required table, and constructs a join and where clause as required.

However, if a query contains two queries over the same relationship, the secondary table will be joined twice. This could lead to quite inefficient underlying SQL, retrieving a lot of rows that must be discarded by the where clause; the more queries on the joined table, the worse the duplication problem becomes.

For example if the model:

class Reporter(models.Model):
    first_name = models.CharField(maxlength=30)
    last_name = models.CharField(maxlength=30)

class Article(models.Model):
    reporter = models.ForeignKey(Reporter)

was posed the query:

Article.objects._get_sql_clause(reporter__first_name__exact='John',reporter__last_name__exact='Thomas')

this would have resulted in the SQL:

SELECT "foo_articles"."id" "foo_articles"."reporter_id" 
FROM "foo_articles","foo_reporters" "t1","foo_reporters" "t3" 
WHERE "foo_articles"."reporter_id" = "t1"."id" 
   AND "foo_articles"."reporter_id" = "t3"."id" 
   AND "t1"."first_name" = 'John' 
   AND "t3"."last_name" = 'Thomas'

This is a double join over foo_reporters, which could be avoided.

With this patch applied, the same query evaluates as:

SELECT "foo_articles"."id" "foo_articles"."reporter_id" 
FROM "foo_articles" 
   INNER JOIN "foo_reporters" AS "foo_articles__reporter" 
   ON "foo_articles"."reporter_id" = "foo_articles__reporter"."id" 
WHERE "foo_articles__reporter"."first_name" = 'John' 
   AND "foo_articles__reporter"."last_name" = 'Thomas' 

Points of difference in the SQL:

  1. INNER JOIN... ON syntax is used, rather than ',' ...WHERE
  2. Table alias is related to the table lookup path, not a number

Point 1 allows clear separation of join clauses from tables/where clauses (the reason for join_where in the old version). The new version also allows for other join types to be used (although they are not exploited anywhere at present).

As for point 2; Aliases are still required to resolve circular table references (e.g., "Article has a Reporter has a favourite Article" requires an article-reporter join, and a reporter-arcticle join, but the article table cannot be reused). However, the new scheme allows table aliases to be predictable, so end users can add manual where clauses that might act on them ('module_articlesreporterfavourite_article', rather than 't4'). It also makes parse_lookup() a non-stateful call (no table_count parameter to pass around).

Internally, there are several differences:

  1. SQL returning functions (i.e., *.get_sql and parse_lookup) now return (select, joins, where, params), rather than (select, join_where, where, params, table_count)
  2. joins is a dictionary, not a list. The dictionary uses the table alias as a key; the value is a tuple of (table name, join type, join condition). See docstring for parse_lookup for more details
  3. Join uniqueness is guaranteed by the dictionary. The second attempt to join a table will overwrite the first attempt. However, on each join attempt, parse_lookup will be generated the same join, so this shouldn't be an issue.
  4. SQL accumulation logic in _get_sql_clause has been reworked to clarify a very long line of string concatenations
  5. QBase has been renamed QOperator. This was done because only QOr and QAnd inherit from QBase, the internal logic of QBase is entirely devoted to aggregating two clauses into an operator, and most importantly, Q doesn't inherit from QBase (as the name would suggest).

Unit tests have been added to many_to_one; one of these tests isn't entirely black-box, as it looks inside the generated query to ensure that only one join is performed.

However, no unit tests have been added for many to many queries. This is because there apparently ARE no unit tests for many to many queries.

Attachments (1)

unique_joins.patch (15.3 KB) - added by freakboy@… 10 years ago.
Patch to remove duplicated joins on kwarg evaluation

Download all attachments as: .zip

Change History (2)

Changed 10 years ago by freakboy@…

Patch to remove duplicated joins on kwarg evaluation

comment:1 Changed 10 years ago by adrian

  • Resolution set to fixed
  • Status changed from new to closed

(In [1792]) magic-removal: Fixed #1136: Removed duplicated DB joins resulting from kwarg evaluation. Thanks, Russ Magee

Note: See TracTickets for help on using tickets.
Back to Top