Opened 9 years ago

Closed 7 years ago

Last modified 7 years ago

#3592 closed (fixed)

Allowing Q Objects to select between OUTER and INNER joins (semi-intelligently)

Reported by: axiak@… Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Keywords: Q, QuerySet, JOINs qset_refactor, qs-rf-fixed
Cc: mir@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

I would like my Q objects to use LEFT OUTER JOIN or INNER JOIN depending on whether or not one is better than the other. e.g. if I do x & y I probably want INNER JOIN, but if I do x | y, I think the DEFAULT behavior should be an OUTER JOIN if there are any foreign key lookups.

While I do believe a complete QuerySet Refactoring is in order, I feel as though the simplicity and the great default behavior that Q objects offer is nice to have, and that this patch will improve the default behavior tremendously.

Attachments (2)

fixed_Q_joins.patch (2.9 KB) - added by anonymous 9 years ago.
ticket-3592-joins.diff (3.5 KB) - added by mir@… 8 years ago.
Improved patch, replaces the older one

Download all attachments as: .zip

Change History (23)

Changed 9 years ago by anonymous

comment:1 Changed 9 years ago by anonymous

  • Needs documentation unset
  • Needs tests set
  • Patch needs improvement unset

comment:2 Changed 9 years ago by Simon G. <dev@…>

  • Triage Stage changed from Unreviewed to Design decision needed

comment:3 Changed 9 years ago by axiak@…

  • Patch needs improvement set

Some more things:

I realized that the correct way to do ORing is by use of UNION. For example, if I have three models:
X, Y, and Z (each with parameters a,b,c) and both Y and Z foreign key to X: Z --> X and Y --> X. Then:
(They are all in app `package')

Q(yaisnull = False) | Q(zbisnull = False) should yield the following query:
(approximate)
SELECT * FROM package_x

INNER JOIN "package_y" AS "package_xy" ON

package_x.id = package_xy.x_id

WHERE package_xy.a IS NOT NULL

UNION
SELECT * FROM package_x

INNER JOIN "package_z" AS "package_xz" ON

package_x.id = package_xz.x_id

WHERE package_xz.b IS NOT NULL

Is this clear?
Unfortunately, it would take some rewriting of django to even attempt this.

comment:4 Changed 9 years ago by anonymous

(sorry I forgot to block quote...)
Some more things:

I realized that the correct way to do ORing is by use of UNION. For example, if I have three models: X, Y, and Z (each with parameters a,b,c) and both Y and Z foreign key to X: Z --> X and Y --> X. Then: (They are all in app `package')

Q(yaisnull = False) | Q(zbisnull = False)
should yield the following query:

(approximate)

SELECT * FROM package_x
    INNER JOIN "package_y" AS "package_xy" ON
        package_x.id = package_xy.x_id
            WHERE package_xy.a IS NOT NULL
UNION 
SELECT * FROM package_x
    INNER JOIN "package_z" AS "package_xz" ON
        package_x.id = package_xz.x_id
            WHERE package_xz.b IS NOT NULL

Is this clear? Unfortunately, it would take some rewriting of django to even attempt this.

comment:5 follow-up: Changed 9 years ago by mtredinnick

  • Has patch unset
  • Needs tests unset
  • Owner changed from adrian to mtredinnick
  • Patch needs improvement unset
  • Triage Stage changed from Design decision needed to Accepted

Constructing more intelligent join types is one of the goals of the QuerySet refactoring -- in fact it was one of the things that started us thinking about it. Since Q() objects are really just a way of passing information to the QuerySet construction -- and they aren't going away -- this will be fixed as part of that. I'm reluctant to be tweaking Q() objects much, since the refactoring is being worked on at the moment (although it will be post-0.96).

Removing the patch keywords, because I don't want to apply this patch (although thanks for writing it up anyway -- it looks reasonable). Let's leave Q() objects stable in the brief runup to 0.96 and make the big changes immediately afterwards.

comment:6 in reply to: ↑ 5 Changed 8 years ago by axiak@…

While I'm well aware that there is a QuerySet refactoring going on, I'm afraid that it's going to come along much too late. The fact of the matter is, the current implementation of Q OR'ing could best be described as a bug. From what I read it sounds like 0.96 is going to be the last release before a slew of backwards-incompatible changes get merged (oldforms becomes defunct, this queryset refactoring...).

This means, that if someone starts using Django, and needs to use Q objects to OR (pretty reasonable, my site is using it for our permission systems), they might come across this bug (which I think would appear pretty obscure to them if they haven't read through this code), with no where to go except maybe writing raw SQL for each of these. Might it be worth raising a NotImplementedError for ORs that have foreign-key lookups? IMHO that's better than the current result.

This is why I posted it here. I'd at least like if someone runs into this problem to know there's at least a patch they can apply that's not fully tested.

Replying to mtredinnick:

Constructing more intelligent join types is one of the goals of the QuerySet refactoring -- in fact it was one of the things that started us thinking about it. Since Q() objects are really just a way of passing information to the QuerySet construction -- and they aren't going away -- this will be fixed as part of that. I'm reluctant to be tweaking Q() objects much, since the refactoring is being worked on at the moment (although it will be post-0.96).

Removing the patch keywords, because I don't want to apply this patch (although thanks for writing it up anyway -- it looks reasonable). Let's leave Q() objects stable in the brief runup to 0.96 and make the big changes immediately afterwards.

comment:7 Changed 8 years ago by ubernostrum

#3665 was a duplicate of this.

Changed 8 years ago by mir@…

Improved patch, replaces the older one

comment:8 Changed 8 years ago by mir@…

There was a strange bug in the patch: In line 736, it called self.underneath_an_or(), which is a non-callable attribute. Replacing it with self.search_for_or() worked. Tests are passed, and all tests of my application also passed.

But the patch still helped a lot to get something done right now--thanks, axiak!

Bug #1801 (a problem with joins and AND) is unrelated, in case anybody wonders.

comment:9 Changed 8 years ago by anonymous

  • Cc mir@… added

comment:10 Changed 8 years ago by mir@…

Hmm ... I've played with the patch a bit, and I'm a bit disappointed. Outer joins seem too hard to optimize for the database. At least mysql 4.1 creates really stupid execution plans, and the queries are effectively unusable.

Perhaps this should be solved with UNION instead of OUTER JOIN.

comment:11 follow-up: Changed 8 years ago by axiak@…

I would love to use UNION. It is the right way to do it (it is logically what is meant by |). However, it seems I cannot do that without doing enough of a rewrite as to really merit a refactoring of QuerySet. This is meant to buy the current Q's some time...

What type data are you working with? I'm assuming you have indices for the columns you're joining on. If so, how many rows do you have that it's a problem?

Just curious--maybe there's a way to optimize this further?

comment:12 Changed 8 years ago by axiak@…

Please look at #3691 for some more features.

comment:13 in reply to: ↑ 11 Changed 8 years ago by Michael Radziej <mir@…>

Replying to axiak@mit.edu:

A problem with the patch is that it turns every join below the or node into an outer join. Nested joins, with an outer join at the or node and inner joins below, would make more sense. I don't know if it would help, though. The problem is with a table (basically, an association table) with a few 10k rows. From this table, I get an outer join to another table with a few 1000 rows. But it's too complex for demonstration.

There's a problem with union: To AND two unions, you'd need INTERSECT. But mysql 4.1 doesn't have INTERSECT :-(

I cannot explore this further, since I don't have more time for it. I'll resort to manual sql, once again.

As a general rule, it's hard generate useable SQL in complex cases, and it's a lot harder to support multiple databases with this. mysql isn't famous for optimizing complex queries, so solving this in the general case might really turn out to be impossible for this * "database".

Sorry that I cannot offer any more help :-(

comment:14 Changed 8 years ago by mir@…

  • Keywords qset_refactor added
  • Patch needs improvement set

Needs a better patch for reasons in my posting above. This will probably have to wait for the queryset refactoring.

comment:15 Changed 8 years ago by Michael Axiak <axiak@…>

AFAICT this patch will be defunct in the new queryset refactor.

comment:16 Changed 8 years ago by mtredinnick

  • Keywords qs-rf-fixed added

comment:17 Changed 7 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [7477]) Merged the queryset-refactor branch into trunk.

This is a big internal change, but mostly backwards compatible with existing
code. Also adds a couple of new features.

Fixed #245, #1050, #1656, #1801, #2076, #2091, #2150, #2253, #2306, #2400, #2430, #2482, #2496, #2676, #2737, #2874, #2902, #2939, #3037, #3141, #3288, #3440, #3592, #3739, #4088, #4260, #4289, #4306, #4358, #4464, #4510, #4858, #5012, #5020, #5261, #5295, #5321, #5324, #5325, #5555, #5707, #5796, #5817, #5987, #6018, #6074, #6088, #6154, #6177, #6180, #6203, #6658

comment:18 follow-up: Changed 7 years ago by sime

  • Resolution fixed deleted
  • Status changed from closed to reopened

It appears that OR-ing Q objects across related tables still produces an INNER JOIN?

comment:19 in reply to: ↑ 18 Changed 7 years ago by mir

  • Resolution set to fixed
  • Status changed from reopened to closed

Replying to sime:

It appears that OR-ing Q objects across related tables still produces an INNER JOIN?

Please give an example of a query that results in an inner join when you think it should not, and reopen the ticket again.

comment:20 Changed 7 years ago by mtredinnick

Please don't reopen this ticket. Rather, pen a new ticket containing the short example that demonstrates the problem. The issue in this ticket has been fixed and if you're seeing something going strange with Q-object behaviour, that's a separate issue. Let's keep the histories nad comments for separate issues distinct, please.

comment:21 Changed 7 years ago by mir

related: #7872

Note: See TracTickets for help on using tickets.
Back to Top