#3592 closed (fixed)
Allowing Q Objects to select between OUTER and INNER joins (semi-intelligently)
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Keywords: | Q, QuerySet, JOINs qset_refactor, qs-rf-fixed | |
Cc: | mir@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
I would like my Q objects to use LEFT OUTER JOIN or INNER JOIN depending on whether or not one is better than the other. e.g. if I do x & y I probably want INNER JOIN, but if I do x | y, I think the DEFAULT behavior should be an OUTER JOIN if there are any foreign key lookups.
While I do believe a complete QuerySet Refactoring is in order, I feel as though the simplicity and the great default behavior that Q objects offer is nice to have, and that this patch will improve the default behavior tremendously.
Attachments (2)
Change History (23)
by , 18 years ago
Attachment: | fixed_Q_joins.patch added |
---|
comment:1 by , 18 years ago
Needs tests: | set |
---|
comment:2 by , 18 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:3 by , 18 years ago
Patch needs improvement: | set |
---|
comment:4 by , 18 years ago
(sorry I forgot to block quote...)
Some more things:
I realized that the correct way to do ORing is by use of UNION. For example, if I have three models: X, Y, and Z (each with parameters a,b,c) and both Y and Z foreign key to X: Z --> X and Y --> X. Then: (They are all in app `package')
Q(yaisnull = False) | Q(zbisnull = False)
should yield the following query:
(approximate)
SELECT * FROM package_x INNER JOIN "package_y" AS "package_xy" ON package_x.id = package_xy.x_id WHERE package_xy.a IS NOT NULL UNION SELECT * FROM package_x INNER JOIN "package_z" AS "package_xz" ON package_x.id = package_xz.x_id WHERE package_xz.b IS NOT NULL
Is this clear? Unfortunately, it would take some rewriting of django to even attempt this.
follow-up: 6 comment:5 by , 18 years ago
Has patch: | unset |
---|---|
Needs tests: | unset |
Owner: | changed from | to
Patch needs improvement: | unset |
Triage Stage: | Design decision needed → Accepted |
Constructing more intelligent join types is one of the goals of the QuerySet refactoring -- in fact it was one of the things that started us thinking about it. Since Q()
objects are really just a way of passing information to the QuerySet construction -- and they aren't going away -- this will be fixed as part of that. I'm reluctant to be tweaking Q()
objects much, since the refactoring is being worked on at the moment (although it will be post-0.96).
Removing the patch keywords, because I don't want to apply this patch (although thanks for writing it up anyway -- it looks reasonable). Let's leave Q()
objects stable in the brief runup to 0.96 and make the big changes immediately afterwards.
comment:6 by , 18 years ago
While I'm well aware that there is a QuerySet refactoring going on, I'm afraid that it's going to come along much too late. The fact of the matter is, the current implementation of Q OR'ing could best be described as a bug. From what I read it sounds like 0.96 is going to be the last release before a slew of backwards-incompatible changes get merged (oldforms becomes defunct, this queryset refactoring...).
This means, that if someone starts using Django, and needs to use Q objects to OR (pretty reasonable, my site is using it for our permission systems), they might come across this bug (which I think would appear pretty obscure to them if they haven't read through this code), with no where to go except maybe writing raw SQL for each of these. Might it be worth raising a NotImplementedError for ORs that have foreign-key lookups? IMHO that's better than the current result.
This is why I posted it here. I'd at least like if someone runs into this problem to know there's at least a patch they can apply that's not fully tested.
Replying to mtredinnick:
Constructing more intelligent join types is one of the goals of the QuerySet refactoring -- in fact it was one of the things that started us thinking about it. Since
Q()
objects are really just a way of passing information to the QuerySet construction -- and they aren't going away -- this will be fixed as part of that. I'm reluctant to be tweakingQ()
objects much, since the refactoring is being worked on at the moment (although it will be post-0.96).
Removing the patch keywords, because I don't want to apply this patch (although thanks for writing it up anyway -- it looks reasonable). Let's leave
Q()
objects stable in the brief runup to 0.96 and make the big changes immediately afterwards.
comment:8 by , 18 years ago
There was a strange bug in the patch: In line 736, it called self.underneath_an_or()
, which is a non-callable attribute. Replacing it with self.search_for_or()
worked. Tests are passed, and all tests of my application also passed.
But the patch still helped a lot to get something done right now--thanks, axiak!
Bug #1801 (a problem with joins and AND) is unrelated, in case anybody wonders.
comment:9 by , 18 years ago
Cc: | added |
---|
comment:10 by , 18 years ago
Hmm ... I've played with the patch a bit, and I'm a bit disappointed. Outer joins seem too hard to optimize for the database. At least mysql 4.1 creates really stupid execution plans, and the queries are effectively unusable.
Perhaps this should be solved with UNION instead of OUTER JOIN.
follow-up: 13 comment:11 by , 18 years ago
I would love to use UNION. It is the right way to do it (it is logically what is meant by |). However, it seems I cannot do that without doing enough of a rewrite as to really merit a refactoring of QuerySet. This is meant to buy the current Q's some time...
What type data are you working with? I'm assuming you have indices for the columns you're joining on. If so, how many rows do you have that it's a problem?
Just curious--maybe there's a way to optimize this further?
comment:13 by , 18 years ago
Replying to axiak@mit.edu:
A problem with the patch is that it turns every join below the or node into an outer join. Nested joins, with an outer join at the or node and inner joins below, would make more sense. I don't know if it would help, though. The problem is with a table (basically, an association table) with a few 10k rows. From this table, I get an outer join to another table with a few 1000 rows. But it's too complex for demonstration.
There's a problem with union: To AND two unions, you'd need INTERSECT. But mysql 4.1 doesn't have INTERSECT :-(
I cannot explore this further, since I don't have more time for it. I'll resort to manual sql, once again.
As a general rule, it's hard generate useable SQL in complex cases, and it's a lot harder to support multiple databases with this. mysql isn't famous for optimizing complex queries, so solving this in the general case might really turn out to be impossible for this * "database".
Sorry that I cannot offer any more help :-(
comment:14 by , 18 years ago
Keywords: | qset_refactor added |
---|---|
Patch needs improvement: | set |
Needs a better patch for reasons in my posting above. This will probably have to wait for the queryset refactoring.
comment:16 by , 17 years ago
Keywords: | qs-rf-fixed added |
---|
comment:17 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [7477]) Merged the queryset-refactor branch into trunk.
This is a big internal change, but mostly backwards compatible with existing
code. Also adds a couple of new features.
Fixed #245, #1050, #1656, #1801, #2076, #2091, #2150, #2253, #2306, #2400, #2430, #2482, #2496, #2676, #2737, #2874, #2902, #2939, #3037, #3141, #3288, #3440, #3592, #3739, #4088, #4260, #4289, #4306, #4358, #4464, #4510, #4858, #5012, #5020, #5261, #5295, #5321, #5324, #5325, #5555, #5707, #5796, #5817, #5987, #6018, #6074, #6088, #6154, #6177, #6180, #6203, #6658
follow-up: 19 comment:18 by , 17 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
It appears that OR-ing Q objects across related tables still produces an INNER JOIN?
comment:19 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Replying to sime:
It appears that OR-ing Q objects across related tables still produces an INNER JOIN?
Please give an example of a query that results in an inner join when you think it should not, and reopen the ticket again.
comment:20 by , 17 years ago
Please don't reopen this ticket. Rather, pen a new ticket containing the short example that demonstrates the problem. The issue in this ticket has been fixed and if you're seeing something going strange with Q-object behaviour, that's a separate issue. Let's keep the histories nad comments for separate issues distinct, please.
Some more things:
I realized that the
correct way to do ORing is by use of UNION. For example, if I have three models:
X, Y, and Z (each with parameters a,b,c) and both Y and Z foreign key to X: Z --> X and Y --> X. Then:
(They are all in app `package')
Q(yaisnull = False) | Q(zbisnull = False) should yield the following query:
(approximate)
SELECT * FROM package_x
UNION
SELECT * FROM package_x
Is this clear?
Unfortunately, it would take some rewriting of django to even attempt this.