Opened 17 years ago

Closed 16 years ago

Last modified 16 years ago

#3592 closed (fixed)

Allowing Q Objects to select between OUTER and INNER joins (semi-intelligently)

Reported by: axiak@… Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Keywords: Q, QuerySet, JOINs qset_refactor, qs-rf-fixed
Cc: mir@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I would like my Q objects to use LEFT OUTER JOIN or INNER JOIN depending on whether or not one is better than the other. e.g. if I do x & y I probably want INNER JOIN, but if I do x | y, I think the DEFAULT behavior should be an OUTER JOIN if there are any foreign key lookups.

While I do believe a complete QuerySet Refactoring is in order, I feel as though the simplicity and the great default behavior that Q objects offer is nice to have, and that this patch will improve the default behavior tremendously.

Attachments (2)

fixed_Q_joins.patch (2.9 KB ) - added by anonymous 17 years ago.
ticket-3592-joins.diff (3.5 KB ) - added by mir@… 17 years ago.
Improved patch, replaces the older one

Download all attachments as: .zip

Change History (23)

by anonymous, 17 years ago

Attachment: fixed_Q_joins.patch added

comment:1 by anonymous, 17 years ago

Needs tests: set

comment:2 by Simon G. <dev@…>, 17 years ago

Triage Stage: UnreviewedDesign decision needed

comment:3 by axiak@…, 17 years ago

Patch needs improvement: set

Some more things:

I realized that the correct way to do ORing is by use of UNION. For example, if I have three models:
X, Y, and Z (each with parameters a,b,c) and both Y and Z foreign key to X: Z --> X and Y --> X. Then:
(They are all in app `package')

Q(yaisnull = False) | Q(zbisnull = False) should yield the following query:
(approximate)
SELECT * FROM package_x

INNER JOIN "package_y" AS "package_xy" ON

package_x.id = package_xy.x_id

WHERE package_xy.a IS NOT NULL

UNION
SELECT * FROM package_x

INNER JOIN "package_z" AS "package_xz" ON

package_x.id = package_xz.x_id

WHERE package_xz.b IS NOT NULL

Is this clear?
Unfortunately, it would take some rewriting of django to even attempt this.

comment:4 by anonymous, 17 years ago

(sorry I forgot to block quote...)
Some more things:

I realized that the correct way to do ORing is by use of UNION. For example, if I have three models: X, Y, and Z (each with parameters a,b,c) and both Y and Z foreign key to X: Z --> X and Y --> X. Then: (They are all in app `package')

Q(yaisnull = False) | Q(zbisnull = False)
should yield the following query:

(approximate)

SELECT * FROM package_x
    INNER JOIN "package_y" AS "package_xy" ON
        package_x.id = package_xy.x_id
            WHERE package_xy.a IS NOT NULL
UNION 
SELECT * FROM package_x
    INNER JOIN "package_z" AS "package_xz" ON
        package_x.id = package_xz.x_id
            WHERE package_xz.b IS NOT NULL

Is this clear? Unfortunately, it would take some rewriting of django to even attempt this.

comment:5 by Malcolm Tredinnick, 17 years ago

Has patch: unset
Needs tests: unset
Owner: changed from Adrian Holovaty to Malcolm Tredinnick
Patch needs improvement: unset
Triage Stage: Design decision neededAccepted

Constructing more intelligent join types is one of the goals of the QuerySet refactoring -- in fact it was one of the things that started us thinking about it. Since Q() objects are really just a way of passing information to the QuerySet construction -- and they aren't going away -- this will be fixed as part of that. I'm reluctant to be tweaking Q() objects much, since the refactoring is being worked on at the moment (although it will be post-0.96).

Removing the patch keywords, because I don't want to apply this patch (although thanks for writing it up anyway -- it looks reasonable). Let's leave Q() objects stable in the brief runup to 0.96 and make the big changes immediately afterwards.

in reply to:  5 comment:6 by axiak@…, 17 years ago

While I'm well aware that there is a QuerySet refactoring going on, I'm afraid that it's going to come along much too late. The fact of the matter is, the current implementation of Q OR'ing could best be described as a bug. From what I read it sounds like 0.96 is going to be the last release before a slew of backwards-incompatible changes get merged (oldforms becomes defunct, this queryset refactoring...).

This means, that if someone starts using Django, and needs to use Q objects to OR (pretty reasonable, my site is using it for our permission systems), they might come across this bug (which I think would appear pretty obscure to them if they haven't read through this code), with no where to go except maybe writing raw SQL for each of these. Might it be worth raising a NotImplementedError for ORs that have foreign-key lookups? IMHO that's better than the current result.

This is why I posted it here. I'd at least like if someone runs into this problem to know there's at least a patch they can apply that's not fully tested.

Replying to mtredinnick:

Constructing more intelligent join types is one of the goals of the QuerySet refactoring -- in fact it was one of the things that started us thinking about it. Since Q() objects are really just a way of passing information to the QuerySet construction -- and they aren't going away -- this will be fixed as part of that. I'm reluctant to be tweaking Q() objects much, since the refactoring is being worked on at the moment (although it will be post-0.96).

Removing the patch keywords, because I don't want to apply this patch (although thanks for writing it up anyway -- it looks reasonable). Let's leave Q() objects stable in the brief runup to 0.96 and make the big changes immediately afterwards.

comment:7 by James Bennett, 17 years ago

#3665 was a duplicate of this.

by mir@…, 17 years ago

Attachment: ticket-3592-joins.diff added

Improved patch, replaces the older one

comment:8 by mir@…, 17 years ago

There was a strange bug in the patch: In line 736, it called self.underneath_an_or(), which is a non-callable attribute. Replacing it with self.search_for_or() worked. Tests are passed, and all tests of my application also passed.

But the patch still helped a lot to get something done right now--thanks, axiak!

Bug #1801 (a problem with joins and AND) is unrelated, in case anybody wonders.

comment:9 by anonymous, 17 years ago

Cc: mir@… added

comment:10 by mir@…, 17 years ago

Hmm ... I've played with the patch a bit, and I'm a bit disappointed. Outer joins seem too hard to optimize for the database. At least mysql 4.1 creates really stupid execution plans, and the queries are effectively unusable.

Perhaps this should be solved with UNION instead of OUTER JOIN.

comment:11 by axiak@…, 17 years ago

I would love to use UNION. It is the right way to do it (it is logically what is meant by |). However, it seems I cannot do that without doing enough of a rewrite as to really merit a refactoring of QuerySet. This is meant to buy the current Q's some time...

What type data are you working with? I'm assuming you have indices for the columns you're joining on. If so, how many rows do you have that it's a problem?

Just curious--maybe there's a way to optimize this further?

comment:12 by axiak@…, 17 years ago

Please look at #3691 for some more features.

in reply to:  11 comment:13 by Michael Radziej <mir@…>, 17 years ago

Replying to axiak@mit.edu:

A problem with the patch is that it turns every join below the or node into an outer join. Nested joins, with an outer join at the or node and inner joins below, would make more sense. I don't know if it would help, though. The problem is with a table (basically, an association table) with a few 10k rows. From this table, I get an outer join to another table with a few 1000 rows. But it's too complex for demonstration.

There's a problem with union: To AND two unions, you'd need INTERSECT. But mysql 4.1 doesn't have INTERSECT :-(

I cannot explore this further, since I don't have more time for it. I'll resort to manual sql, once again.

As a general rule, it's hard generate useable SQL in complex cases, and it's a lot harder to support multiple databases with this. mysql isn't famous for optimizing complex queries, so solving this in the general case might really turn out to be impossible for this * "database".

Sorry that I cannot offer any more help :-(

comment:14 by mir@…, 17 years ago

Keywords: qset_refactor added
Patch needs improvement: set

Needs a better patch for reasons in my posting above. This will probably have to wait for the queryset refactoring.

comment:15 by Michael Axiak <axiak@…>, 17 years ago

AFAICT this patch will be defunct in the new queryset refactor.

comment:16 by Malcolm Tredinnick, 17 years ago

Keywords: qs-rf-fixed added

comment:17 by Malcolm Tredinnick, 16 years ago

Resolution: fixed
Status: newclosed

(In [7477]) Merged the queryset-refactor branch into trunk.

This is a big internal change, but mostly backwards compatible with existing
code. Also adds a couple of new features.

Fixed #245, #1050, #1656, #1801, #2076, #2091, #2150, #2253, #2306, #2400, #2430, #2482, #2496, #2676, #2737, #2874, #2902, #2939, #3037, #3141, #3288, #3440, #3592, #3739, #4088, #4260, #4289, #4306, #4358, #4464, #4510, #4858, #5012, #5020, #5261, #5295, #5321, #5324, #5325, #5555, #5707, #5796, #5817, #5987, #6018, #6074, #6088, #6154, #6177, #6180, #6203, #6658

comment:18 by sime, 16 years ago

Resolution: fixed
Status: closedreopened

It appears that OR-ing Q objects across related tables still produces an INNER JOIN?

in reply to:  18 comment:19 by Michael Radziej, 16 years ago

Resolution: fixed
Status: reopenedclosed

Replying to sime:

It appears that OR-ing Q objects across related tables still produces an INNER JOIN?

Please give an example of a query that results in an inner join when you think it should not, and reopen the ticket again.

comment:20 by Malcolm Tredinnick, 16 years ago

Please don't reopen this ticket. Rather, pen a new ticket containing the short example that demonstrates the problem. The issue in this ticket has been fixed and if you're seeing something going strange with Q-object behaviour, that's a separate issue. Let's keep the histories nad comments for separate issues distinct, please.

comment:21 by Michael Radziej, 16 years ago

related: #7872

Note: See TracTickets for help on using tickets.
Back to Top