Opened 15 years ago

Closed 11 years ago

Last modified 11 years ago

#10790 closed Bug (fixed)

Too many joins in a comparison for NULL.

Reported by: Malcolm Tredinnick Owned by: Anssi Kääriäinen
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: wallenfe@…, Vlastimil Zíma, Dan Watson, milosu, philippe@…, tom@…, kmike84@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using these models:

class Student(models.Model):
    name = models.CharField(max_length=50)

class Message(models.Model):
    title = models.CharField(max_length=50)
    student = models.ForeignKey(Student, null=True)

this query generates slightly inefficient SQL:

Message.objects.filter(student=None)

We get this

SELECT `outer_message`.`id`, `outer_message`.`title`, `outer_message`.`student_id`
FROM `outer_message` 
   LEFT OUTER JOIN `outer_student` 
      ON (`outer_message`.`student_id` = `outer_student`.`id`)
WHERE `outer_student`.`id` IS NULL

when we should ideally be using this:

SELECT `outer_message`.`id`, `outer_message`.`title`, `outer_message`.`student_id`
FROM `outer_message`
WHERE `outer_message`.`student_id` IS NULL

Not worth worrying about for 1.1, since the result isn't incorrect; just less efficient than it could be.

Note to self for when fixing this: This is essentially the only situation where we can trim a left outer join from the end of a list of joins. We're comparing to NULL and can factor that back across the outer join. Needs special casing in Query.add_filter().

Attachments (7)

django1.1-ticket10790.patch (5.2 KB ) - added by milosu 14 years ago.
patch for django 1.1 passing django 1.1 regression test suite
django1.1-ticket10790v2.patch (9.1 KB ) - added by milosu 14 years ago.
improved patch with regression test and bug fix
107090v3.diff (9.2 KB ) - added by Daniel Roseman 13 years ago.
Updated patch which applies against trunk, fixes tabbing, and converts doctests to unit tests.
10790v4.diff (9.3 KB ) - added by Philippe Raoult 13 years ago.
updated for current trunk
ticket10790.diff (5.9 KB ) - added by Łukasz Rekucki 12 years ago.
Rebased patch.
ticket10790_v5.diff (13.0 KB ) - added by milosu 12 years ago.
ticket10790_v6.diff (18.5 KB ) - added by milosu 12 years ago.
elaborated patch and test case supporting more isnull use cases, passing Django test suite

Download all attachments as: .zip

Change History (60)

comment:1 by Malcolm Tredinnick, 15 years ago

Triage Stage: UnreviewedAccepted

comment:2 by wallenfe, 15 years ago

Cc: wallenfe@… added

This becomes a more severe problem when it is combined with an update. Due to the JOIN in the query, the update becomes two SQL statements rather than one. This opens the door to potential race conditions if the web server processes that are executing these statements end up interleaving them.

To continue your example:

s=Student.objects.get(pk=1)
Message.objects.filter(student=None).update(student=s)

We get this:

SELECT `outer_message`.`id`, `outer_message`.`title`, `outer_message`.`student_id`
FROM `outer_message` 
   LEFT OUTER JOIN `outer_student` 
      ON (`outer_message`.`student_id` = `outer_student`.`id`)
WHERE `outer_student`.`id` IS NULL

UPDATE `outer_message` SET `student_id` = 1 WHERE `outer_message`.`id` IN (1)'

when ideally we should get:

UPDATE `outer_message` SET `student_id` = 1 WHERE `outer_message`.`student_id` IS_NULL'

comment:3 by fas, 15 years ago

Workaround:
Instead of

Message.objects.filter(student=None)

use

Message.objects.exclude(student__isnull=False)

comment:4 by george.sakkis@…, 14 years ago

Cc: george.sakkis@… added

comment:5 by milosu, 14 years ago

The attached patch is passing all Django 1.1 test suite regressions tests.

by milosu, 14 years ago

Attachment: django1.1-ticket10790.patch added

patch for django 1.1 passing django 1.1 regression test suite

comment:6 by vrehak, 14 years ago

Has patch: set

by milosu, 14 years ago

improved patch with regression test and bug fix

in reply to:  3 comment:7 by Vlastimil Zíma, 14 years ago

Cc: Vlastimil Zíma added

comment:8 by Dan Watson, 14 years ago

Cc: Dan Watson added

comment:9 by milosu, 14 years ago

Cc: milosu added

by Daniel Roseman, 13 years ago

Attachment: 107090v3.diff added

Updated patch which applies against trunk, fixes tabbing, and converts doctests to unit tests.

comment:10 by Alex Gaynor, 13 years ago

The patch looks like it's probably correct (I'll need to think more to be sure), however I'm concerned about the tests, they poke at some pretty deep internals. I'd prefer if we tested this by just looking at the SQL (as some of the other tests like this do).

comment:11 by PhiR_42, 13 years ago

Cc: philippe@… added

comment:12 by PhiR_42, 13 years ago

The patch seems rather complicated.

It has been pointed out that this doesn't always happen, as filter(something__isnull = True) will trigger it while exclude(something__isnull = False) will not, though they are logically equivalent and should produce the same output.

After a bit of digging, I found this in query.py:

        if (lookup_type == 'isnull' and value is True and not negate and
                len(join_list) > 1):
            # If the comparison is against NULL, we may need to use some left
            # outer joins when creating the join chain. This is only done when
            # needed, as it's less efficient at the database level.
            self.promote_alias_chain(join_list)

Indeed this test is only triggered when value and !negate, but it ignores the !value and negate case. I don't understand the ramifications well enough to know how to fix the test but it seems like a better starting point than trimming the join afterwards.

comment:13 by PhiR_42, 13 years ago

After a more careful examination of the code and the docs it seems I spoke too fast. The patch seems really to be the right way. {{filter()}} and {{exclude()}} are indeed asymmetric on this case.

comment:14 by Thomas Schreiber, 13 years ago

Cc: tom@… added

comment:15 by Chris Beaven, 13 years ago

Severity: Normal
Type: Bug

comment:16 by patchhammer, 13 years ago

Easy pickings: unset
Patch needs improvement: set

107090v3.diff fails to apply cleanly on to trunk

comment:17 by Ramiro Morales, 13 years ago

Component: ORM aggregationDatabase layer (models, ORM)

by Philippe Raoult, 13 years ago

Attachment: 10790v4.diff added

updated for current trunk

comment:18 by Philippe Raoult, 13 years ago

Patch needs improvement: unset
UI/UX: unset

comment:19 by George Sakkis, 13 years ago

Cc: george.sakkis@… removed

by Łukasz Rekucki, 12 years ago

Attachment: ticket10790.diff added

Rebased patch.

comment:20 by Łukasz Rekucki, 12 years ago

Patch needs improvement: set

The patch didn't apply cleanly, so I had to rebase it. After that I get the following failure in tests:

======================================================================
FAIL: test_ticket14876 (regressiontests.queries.tests.Queries4Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lrekucki/django/django_lqc/tests/regressiontests/queries/tests.py", line 994, in test_ticket14876
    self.assertEqual(str(q1.query), str(q2.query))
AssertionError: 

'SELECT "queries_report"."id", "queries_report"."name", "queries_report"."creator_id" FROM "queries_report" 
LEFT OUTER JOIN "queries_author" ON ("queries_report"."creator_id" = "queries_author"."num") 
LEFT OUTER JOIN "queries_extrainfo" ON ("queries_author"."extra_id" = "queries_extrainfo"."id") 
WHERE ("queries_report"."creator_id" IS NULL OR "queries_extrainfo"."info" = e1 )' 

!= 

'SELECT "queries_report"."id", "queries_report"."name", "queries_report"."creator_id" FROM "queries_report" 
INNER JOIN "queries_author" ON ("queries_report"."creator_id" = "queries_author"."num") 
LEFT OUTER JOIN "queries_extrainfo" ON ("queries_author"."extra_id" = "queries_extrainfo"."id") 
WHERE ("queries_report"."creator_id" IS NULL OR "queries_extrainfo"."info" = e1 )'

----------------------------------------------------------------------

The failing query is:

q1 = Report.objects.filter(Q(creator__isnull=True) | Q(creator__extra__info='e1'))
q2 = Report.objects.filter(Q(creator__isnull=True)) | Report.objects.filter(Q(creator__extra__info='e1'))
self.assertQuerysetEqual(q1, ["<Report: r1>", "<Report: r3>"])
self.assertEqual(str(q1.query), str(q2.query))

This is most likely, 'cause the patch turn a OUTER join into INNER join with expectation that trim_joins() will delete it, but it doesn't.

Last edited 12 years ago by Łukasz Rekucki (previous) (diff)

comment:21 by Mikhail Korobov, 12 years ago

Cc: kmike84@… added

comment:22 by milosu, 12 years ago

Patch needs improvement: unset

Hi,

being an author of the initial patch, which is already almost 2 years old, I have found a time to make a (hopefully final) patch for this feature.

I'm proposing the following v5 patch that solves the failures in "queries" test case mentioned above (there were 4 failures in total) and passes the Django-1.4b1 test suite.

Regards

--
Milos

by milosu, 12 years ago

Attachment: ticket10790_v5.diff added

by milosu, 12 years ago

Attachment: ticket10790_v6.diff added

elaborated patch and test case supporting more isnull use cases, passing Django test suite

comment:23 by famousactress@…, 12 years ago

I'd like to campaign for this patch being included in a 1.4 release. The reason is because something happened between 1.3 and 1.4 that made this problem wildly worse, and the patch solves the original issue and the new severity. I suppose the change might qualify as a new bug but since the fix appears to live here I figured I'd start here first.

My example involves three models (pseudocode ahead):

class AuditLog(models.Model):
  pass

class Person(models.Model):
  delete_log = models.ForeignKey(AuditLog)

class Document(models.Model)
  status = models.CharField()
  delete_log = models.ForeignKey(AuditLog)
  person = models.ForeignKey(Person)

The query I ran into is basically:

Document.objects.exclude(delete_log__isnull=False).filter(status='something', person__delete_log__isnull=True)

Prior to 1.4.x, this query would produce two joins: an inner join on Person (required), and a left join on ActionLog against Person.delete_log (not required). This is per the original description of this bug, since this clause isn't in an exclude. As of 1.4.x, the query produces three joins, the extra one being a left join against Document's delete_log. I have more complex examples of this condition where my queries leap from three joins to six or eight, obviously impacting performance dramatically.

The attached patch (v6) merges nicely onto the 1.4 branch, and I've confirmed that it fixes all of the unnecessary joins in my particular case... producing more efficient SQL in 1.3.x and prior, and _wildly_ more efficient SQL than 1.4.

This bug is currently keeping me from moving to 1.4. I also suspect many users are suffering from the original bug and this new worsening without being completely aware.. If your queries are relatively simple, you've got one join instead of none, or a couple joins instead of one.. in a lot of conditions that's likely doubling your query time, but often from very small to very small x 2.

My point though, is that fixing this problem is a quick way to make basically everyone's Django code run faster. It's a miracle weight-loss pill for null-check ORM code. I'd love to see this fix make it to a 1.4.x release soon, and happy to help in any way I can!

comment:24 by Anssi Kääriäinen, 12 years ago

The patch might be correct... The join promotion & trimming logic is so hard to follow that it is very, very hard to say :)

But, if the tests pass it is a good indication this patch is on the right track.

There are a couple of other patches floating around which concentrate on excludes & join promotion. I have to see if there is anything there which would be useful for this case.

In whole the join promotion & trimming code would need some better comments about why things are done, and this should be from high-level perspective. It is really hard to see what is happening currently... I wonder if there is some more general trick we need to allow easier logic for join pruning (like, deferring all the pruning to pre-sql stage.

I think pushing this to 1.4 is OK. A regression from 1.3 to 1.4 qualifies as a reason to push this into 1.4.

comment:25 by famousactress@…, 12 years ago

Awesome. If you find those other patches I'm happy to do the same testing with them that I did with this one and report back with results for the particular cases that I've run into.

comment:26 by Anssi Kääriäinen, 12 years ago

To get this committed to HEAD:

  • Provide more comments about what does a demoted join mean and when and why that is used.
  • Check the comments in sql/query.py, some of them seem to not be accurate.
  • Split the added tests to more than one method. The added method is 200+ lines.
  • There is one failing test in queries after I applied the patch to HEAD.

To get this committed to 1.4:

  • Get the patch committed to HEAD first.
  • Provide another patch to 1.4 (preferrably as a github branch).

Even if you do that all, I can't guarantee the inclusion of this patch. As said the logic in the query class is very complex, and currently I don't understand what the patch does. Without understanding the patch it is hard to say if this is the right way...

The HEAD-rebased patch can be found from: https://github.com/akaariai/django/tree/ticket_10790

comment:27 by Anssi Kääriäinen, 12 years ago

Patch needs improvement: set

Sorry, didn't see your previous comment before posting. The related tickets are #10790 and #17600. I already checked - they didn't deal with the problem in this ticket.

Looking at the added tests, it seems obvious something is broken. I hope somebody can enlighten me on the details of this ticket's patch. I am still wondering about the demoted_join and how it works...

comment:28 by Anssi Kääriäinen, 12 years ago

I did still a little more checking. And now I wonder about the whole trim_joins concept: Why do we first add a join to a parent table, but then immediately proceed to trim that join? Wouldn't it make a lot more sense to not produce the parent join at all... This would mean that the join trimming logic would need to go into setup_joins.

Of course, doing the above "little" change isn't this ticket's problem :)

comment:29 by anonymous, 12 years ago

Owner: changed from Malcolm Tredinnick to famousactress@…

Btw, the new failure is because the unit test expected a join that shouldn't have ever existed in the first place and the patch is causing us to generate better SQL. I'll try to understand the original intent of this particular test and come up with a way to modify it constructively so that it works in this awesome new world of not having joins we don't need :)

I'll be working on my fork in case anyone wants to join: https://github.com/phill-tornroth/django/commits/ticket_10790

comment:30 by famousactress@…, 12 years ago

Owner: changed from famousactress@… to Malcolm Tredinnick

Apologize.. Accidentally re-assigned instead of signing my name. Trac confuses me :)

comment:31 by Anssi Kääriäinen, 12 years ago

I have a strong feeling this patch is not ready at all. If I understand correctly the demoted join is just a hack: the join type is changed so that it can be trimmed by trim_joins. In addition, changing the null_comparison to False in the place where demote_alias is called is plain wrong - the comparison is still a null comparison. The comments are not accurate...

I have a feeling the correct way is to supply additional flag(s) to trim_joins to let it trim the left joins when needed. Even better would be to not create the joins at all.

comment:32 by famousactress@…, 12 years ago

Yeah, I'm almost done cleaning up the unit tests and commenting each of them, because I think they provide a pretty thorough set of cases that will be helpful with whatever solution we eventually arrive at. I'm also rewording the comment language in the tests to not talk about which joins should be trimmed, etc.. but rather just what we're testing and how the queries are expected to work. I'm also seeing at least one inconsistency with the way queries are produced that I don't think was introduced by this patch (will confirm), so I'll probably end up spinning out a separate ticket for that bit after confirming on the mailing list it's not intended behavior.

As for the patch itself, my limited understanding of how things are working matches yours. The demoted join stuff feels like a really strange way to go about this. It makes loads more sense to me to detect these conditions and treat them as simple field comparisons.. ie: don't create the joins in the first place.

Regardless, I think this ticket is really important to work on. The .exclude() hack works okay for simple cases pre-1.4, but most Django code I come across doesn't use it.. most people don't know about this bug, or the exclude() workaround... In 1.4 a lot of data access code gets a lot worse/slower. The patch, if severely imperfect, at least provides some evidence that we can do way better.

comment:33 by Anssi Kääriäinen, 12 years ago

I did some work on doing the removal beforehand. Some things broke, while fixing them more things broke and so on...

I do agree fixing this is important, getting efficient SQL is important. To me it seems one part of the problem is that once generated joins which are then trimmed away can be later on reused. Causes all sorts of problems. However, this is needed so that certain things work (run a query, pickle it, unpickle it, rerun it -> same query string generated). Complex stuff...

comment:34 by Anssi Kääriäinen, 12 years ago

I have now a working version of join trimming beforehand. The approach taken in setup_joins is this:

  1. Walk the lookup path, and turn it into "JoinPath" structures. The structures basically tell from which model we are coming from, and to which model we are going to, and using which fields. All different join types are converted to these simple join structures (m2m, generic relation joins etc).
  2. Trim non-necessary joins from the path. This is easy to do at this stage.
  3. Generate the actual joins. This again is pretty trivial, just call self.join() for the generated joins.

I think it would be possible to get totally rid of trim_joins(), but it seems to do some special trimming which the trim in setup_joins() can't do currently. The trimming logic of setup_joins() could be improved easily.

There were some complications related to generic relations. When fixing this I ended up removing the whole process_extra hack. Now the generic relations extra join restriction is appended directly into the join clause:

JOIN a on (a.id = b.object_id and b.content_type_id = xxx)

This should result in correctly working joins in .exclude() and LEFT OUTER JOIN cases, though I haven't added any tests.

In addition I had to make the m2m through models pickleable. I don't know if the approach used was correct, but at least it works.

Apart of the above there is one test failure. It is related to combining ORed queries. In short the combined query now uses LEFT JOIN for one join where INNER JOIN would be correct, and the reason for this is that pre-patch one join was added, then trimmed (so it was with refcount of 0 in the query), but now it is never even added to the query. I think I know how to fix this one, but haven't done so yet.

In total I think the patch is a major cleanup to the join logic, and it should allow further improvements if the names_to_path call would be done already in add_filter (or even add_q) stage.

However, there is very little chance this one can ever end up in 1.4. This is just too big of a patch. For 1.4 the correct approach will be to teach the trim_joins to just trim the extra joins.

The new patch can be found from: https://github.com/akaariai/django/tree/new_10790

comment:35 by Anssi Kääriäinen, 12 years ago

Still some more digging into this. The problem with the trim_joins as stands is that the information of which direction we are traveling the join is lost. If we are traveling along a foreign key, it is trimmable, if into the other direction, it isn't usually trimmable.

That information is available in setup_joins() and it should be added to the alias_map, or more likely it should be a parameter to qs.join(), and that would then add the info to the alias map. After that fixing trim_joins() to do the right thing should be easy.

The patch provided in the previous comment is in my opinion a clear improvement and should be applied to master. It first needs a solution to that annoying join promotion problem in ORed cases.

comment:36 by milosu, 12 years ago

Akaariai, could you please also check patch attached to ticket https://code.djangoproject.com/ticket/17886 with your latest patch from previous comment?

I think they are closely related.

comment:37 by Anssi Kääriäinen, 12 years ago

milosu: I added comment to #17886, I think you are on right track but the patch needs some polish.

For the idea of trimming joins before even adding them: it doesn't work. The reason is a query like this:

qs.filter(relatedf__isnull=True)

This will first create a LOUTER join to the relatedf. Then trim it away (that is, make its refcount 0). Now, later on if we need another join to relatedf we know to make it LOUTER, this could happen in queryset combining (qs|qs.filter(relatedf__somecol=someval)) for example.

So, we need to have the LOUTER trimmed join information available somewhere. Using my patch from above we could just move the join trimming to just after creating the joins, and pass the information about the need of LOUTER join to setup_joins. Now, we can immediately trim the non-needed joins, and in addition we have the information about LOUTER available. I tested this quickly and it works.

The question now is if we need trim_joins() at all. If we do, then it might be better to just pass the information about the direction to trim_joins and modify it to do the join trimming properly. If we don't need trim_joins anymore, then doing the trimming in setup_joins seems like the correct approach.

In any case the information about the need of having information about trimmed joins available needs to be documented in code.

comment:38 by famousactress@…, 12 years ago

I went through a similar exercise, attempted to move the logic into trim_joins and ran into lots of the same problems... trimmed joins sometimes need to 'come back' later on. I really like where this patch is going, but of course it's unfortunate that it's scope will disqualify it for Django 1.4. I'll try to put together a targeted unit test that proves the regression I found and maybe we'll find a simpler fix for the regression. Meanwhile, I've sent over a pull request with my unit test cleanup from the original patches. Hopefully it's useful: https://github.com/akaariai/django/pull/1

comment:39 by famousactress@…, 12 years ago

I wrote a test that proves my regression (passes on 1.3, fails on 1.4.. and by failing I mean produces new unnecessary queries). My git bisect proved the change that I was suspicious of, since these were some of the tests that quickly fail when I attempted to make trim_joins more aggressive in order to fix this ticket. The culprit was https://code.djangoproject.com/ticket/15316, which introduced the whole nonnull_check. I'll file a ticket and reference this one, in case there's hope of a surgical fix for 1.4.

comment:41 by Anssi Kääriäinen, 12 years ago

Now I got something which seems to be completely working. Work can be found from here. The commit history doesn't make much sense, I will need to rework that if this is going to be included.

The patch does various things:

  • total refactor or setup_joins(), in particular dupe avoidance is now gone (#18748).
  • as part of the above refactor there is a new names_to_path() method which turns a lookup to a fields path of that lookup. See the names_to_path() for more info. This should allow easier implementation of custom lookups for fields (#14263).
  • process_extra/extra_filters is now gone (used for generic relations). Instead the generic relation's extra join conditions are added directly to the join itself. This produces correct results under LEFT JOIN conditions.
  • trim_joins() is completely refactored.
  • the patch has the "remove trim" patch included (#18816).

The above means that add_filter(), trim_joins() and setup_joins() now have different signatures than before. These are private API, but I think a release note about this would be in place. Also the removal of extra_filters needs a note.

I am pretty sure this patch will also fix a dozen or so existing tickets on Trac.

I am not in any hurry in including this patch, even if I believe this is a clear improvement to the code, both in functionality and in clarity. I would like to get a green light from the community for this work before moving forward with this patch.

I haven't incorporated the tests cleanup from comment:38, I will need to do that of course.

comment:42 by Anssi Kääriäinen, 11 years ago

Owner: changed from Malcolm Tredinnick to Anssi Kääriäinen

I have update the work, it can be found from: https://github.com/akaariai/django/compare/ticket_18748...ticket_18748_comb

Note that the work is now based on patch found from #18748 - the reason is that I am very likely going to commit that patch soon, and because that patch and this one conflict I didn't want to resolve conflicts again after commit of #18748.

There is at least on thing I have to do: refactor the added patch for this ticket. It has way too much stuff in one test method. Otherwise I don't know of any issues in the patch and djangobench doesn't complain about performance.

So, I will likely try to get this patch into Django soon enough.

comment:43 by Anssi Kääriäinen, 11 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

Last call.

Patch available from https://github.com/akaariai/django/compare/ticket_10790_final

The patch is pretty big so I will do a full test suite run on all DBs during the night. Any reviews are welcome, and if more time is needed for a review please make a note and I will postpone the commit.

comment:44 by milosu, 11 years ago

Being the author of all the previous patches to this ticket, I have to say that (without a week of poking and testing) I can't understand your latest patch anymore.

Hope there are enough regression tests there.

I will make a deeper review of the new ORM code path in a year or two.

comment:45 by milosu, 11 years ago

Maybe one question regarding the latest patch:

Why setup_join_cache was removed?

Are performance implications of this removal well understood / measured somehow ?

comment:46 by Anssi Kääriäinen, 11 years ago

I removed the join_cache as I haven't been able to show it improves performance in any of the djangobench benchmarks. If needed it is fairly simple to re-add the cache.

What the patch does is that it splits setup_joins() into two parts. Currently we do "for each lookup in the given lookup path, resolve lookup part, immediately add the found joins to query". This is changed to "First resolve the whole lookup path. After resolve add the joins into the query".

The names_to_path() method does the resolving. It is somewhat complex, but the code is pretty much exactly what we had before in setup_joins() minus the actual join generation. As a result of the split the actual join setup is around 10 lines and trim_joins() is made cleaner, too.

While for this ticket this refactoring isn't the minimal change, the work done allows doing two further improvements:

  1. Moving of the names_to_path() call into the beginning of add_filter() - this in turn will allow resolving some more bugs (for example F() are currently double-added in multi_join exclude cases), and will also make custom lookups much easier to add (see #16187).
  2. Moving the names_to_path() method to model._meta. This will allow custom meta objects to alter the behaviour of join generation, and will also allow code outside sql/query.py to reuse the same resolution code (Admin for example has duplicate code for this).

So, the patch isn't about this ticket alone.

Reading the patch itself is nearly impossible. The diff generated just happens to interleave badly. After applying the patch the resulting code should be more readable than the old code. Is there some specific part of the resulting code which seems hard to understand? I would really appreciate if you could do a side-by-side read of the old and new setup_joins() + trim_joins(). If the old code is easier to understand than the new code then that is a problem. This is of course a possibility - it is easy for me to understand the code I just wrote...

This change is a big change, and big changes to ORM tend to be scary. In my opinion the change is needed. Now seems to be the perfect time to add this into master. 1.6 is more than six months away so we have plenty of time to find regressions and fix them.

comment:47 by milosu, 11 years ago

I've spent some time looking at the latest source code with this patch applied, but the changes are really very big for me to be able to make any usefull review (without spending a few weeks).

Anyway - the new code looks to be more readable than the old one, so if everything passes tests, I think you can commit it.

comment:48 by Anssi Kääriäinen, 11 years ago

Unfortunately the patch does have too much going on for one patch. I could rewrite it to smaller pieces, but I am not sure if the effort-reward ratio is there.

The changes aren't _that_ big. The patch changes around 250 lines of django core code, and a large part of those changes are reindent of the model._meta traversal code.

The tests did pass and I have a confident feeling that the patch is doing the right thing. However I am not going to push this forward today. I have been doing too many mistakes lately (#13781 did need a few tries, and I just did the equivalent of DELETE * FROM tbl in a production DB...). So, I will do something completely else for a couple of days and then revisit this ticket.

comment:49 by Anssi Kääriäinen, 11 years ago

Another final version... Biggest change is the addition of field.get_extra_join_sql(), if defined the return value will be added to the join clause. This is a more generic approach than what was done before - the earlier approach worked just for generic foreign keys content type restriction.

comment:50 by Anssi Kääriäinen, 11 years ago

Another final version - no major changes this time, just some minor cleanup. Patch at https://github.com/akaariai/django/compare/ticket_10790_final

While I know a lot of things that could still be improved, it is time to just commit this one. I am confident that the taken approach is correct - while there might be regressions those should be fixable.

comment:51 by Anssi Kääriäinen <akaariai@…>, 11 years ago

Resolution: fixed
Status: newclosed

In 69597e5bcc89aadafd1b76abf7efab30ee0b8b1a:

Fixed #10790 -- Refactored sql.Query.setup_joins()

This is a rather large refactoring. The "lookup traversal" code was
splitted out from the setup_joins. There is now names_to_path() method
which does the lookup traveling, the actual work of setup_joins() is
calling names_to_path() and then adding the joins found into the query.

As a side effect it was possible to remove the "process_extra"
functionality used by genric relations. This never worked for left
joins. Now the extra restriction is appended directly to the join
condition instead of the where clause.

To generate the extra condition we need to have the join field
available in the compiler. This has the side-effect that we need more
ugly code in Query.getstate and setstate as Field objects
aren't pickleable.

The join trimming code got a big change - now we trim all direct joins
and never trim reverse joins. This also fixes the problem in #10790
which was join trimming in null filter cases.

comment:52 by Anssi Kääriäinen, 11 years ago

Just a mention about the future changes which might make sense:

  • Move the path generation code (the ugly stuff in names_to_path()) to fields. There is little reason the ORM code needs to know the internal implementation of the fields (how the field.rel.to... stuff). If this is done in fields instead this will allow for cleaner code in sql/query.py and also custom field subclasses can easily customize the behaviour of the join setup.
  • Move the names_to_path() call to beginning of add_filter(). This allows fixing some bugs related to F() expression handling and also allows the removal of MultiJoin exception.
  • Generate the join condition fully by the field. This would make the ORM agnostic about multiple columns in the join, and it would be possible to generate completely custom join clause for cases like what is discussed in #373 comment 111.
  • Some cleanup to the model._meta get_* methods. It could be possible to reduce the amount of different field types and the different get_foo methods related to these - if the field would tell what type of relation it is, then we need to just get a handle to the field and we are done. How much we want to actually change the ._meta is a good question - if anything is semi-public API then it is model._meta.

comment:53 by Tim Graham, 11 years ago

I think this caused a regression with QuerySet.exclude and many to many fields, see #19837.

Note: See TracTickets for help on using tickets.
Back to Top