Code

Opened 5 years ago

Closed 14 months ago

Last modified 14 months ago

#10870 closed Bug (fixed)

Aggregates with joins ignore extra filters provided by setup_joins

Reported by: fas Owned by: fas
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: orm, aggregation, join, contenttypes, filter
Cc: mbencun@…, jdunck@…, michaelg@…, robin, coderanger, carljm, chrischambers, joeri@…, anssi.kaariainen@…, ris, alanjds, daevaorn@…, fhahn, herve Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

(In case the description that follows is not understandable, please ask me for a better explanation).

An example where extra filters in joins are used are reverse generic relations (http://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/#reverse-generic-relations).

class TaggedItem(models.Model):

    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey('content_type', 'object_id')


class Bookmark(models.Model):
    url = models.URLField()
    tags = generic.GenericRelation(TaggedItem)

By querying Bookmark.objects.filter(tags=some_tag) the generic relation adds extra filters to ensure the correct content type. Those extra filters however are ignored when using aggregates, which leads to wrong results.

For example, the following query is not correct because it does not check the content_type.

Bookmark.objects.aggregate(Count('tags'))

What this aggregate actually calculates per bookmark is not the amount of tags for this bookmark but the amount of tags for any model referenced by TaggedItem with the same object id.

The error is isolated in db/models/sql/query.py in the add_aggregate method:

            field, source, opts, join_list, last, _ = self.setup_joins(
                field_list, opts, self.get_initial_alias(), False)

            # Process the join chain to see if it can be trimmed                                                                                             
            col, _, join_list = self.trim_joins(source, join_list, last, False)

The last dummy element _ is normally named 'extra' in other uses of setup_joins and contains the extra filters needed to make it work. Adding self.add_filter(*extra) after setup_joins fixes this problem.

Attachments (4)

aggregation-generic.diff (4.7 KB) - added by Alex 5 years ago.
I don't think the solution you've suggested works entirely as with this setup it generates incorrect results for me(it returns 9 instead of 3).
aggregation-generic.2.diff (5.0 KB) - added by Alex 5 years ago.
And elbow grease solves all problems.
aggregation-generic.3.diff (5.3 KB) - added by Alex 5 years ago.
Added a test to show it doesn't work with annotate, for some reason it only returns books that have been tagged, which doesn't match the behavior of any other annotation, it's an annotation not a filter, I assume this is the result of using the wrong join type
aggregation-generic.4.diff (5.2 KB) - added by Alex 5 years ago.

Download all attachments as: .zip

Change History (33)

comment:1 Changed 5 years ago by Alex

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted
  • Version changed from 1.0 to SVN

I really doubt this is with 1.0 ;)

Changed 5 years ago by Alex

I don't think the solution you've suggested works entirely as with this setup it generates incorrect results for me(it returns 9 instead of 3).

Changed 5 years ago by Alex

And elbow grease solves all problems.

Changed 5 years ago by Alex

Added a test to show it doesn't work with annotate, for some reason it only returns books that have been tagged, which doesn't match the behavior of any other annotation, it's an annotation not a filter, I assume this is the result of using the wrong join type

comment:2 Changed 5 years ago by mtredinnick

This patch has a couple of problems, from looking at it quickly.

  1. There's a random comment on line 1436 that doesn't refer to any code. No code immediately follows it. It's also of the "set i to i+1" variety if it was intended to go before line 1448, so can be removed.
  2. The if-test on line 1448 can be removed. The extra parameter is always a list, possibly empty. So just iterate over it and if it's empty, nothing will be done.

Changed 5 years ago by Alex

comment:3 Changed 5 years ago by Alex

1 and 2 are dealt with, still the strangle filtering issue, which I can't quite figure out, since a LEFT OUTER JOIN is being preformed(I've left the debugging statement in there).

comment:4 Changed 5 years ago by fas

  • Cc mbencun@… added

comment:5 Changed 5 years ago by fas

Is this going in for 1.1? I think it should as it would otherwise leave a new feature with a quite serious bug.

comment:6 Changed 5 years ago by russellm

  • milestone set to 1.1

comment:7 Changed 5 years ago by Alex

I don't know that solving this for 1.1 will be easy. The annotate stuff has reveiled a subtle bug that will require some fairly invasive changed to fix. The explanation is here: http://searchoracle.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid41_gci1148316,00.html

comment:8 Changed 5 years ago by Alex

  • milestone changed from 1.1 to 1.2

Bumping from 1.1 because this will require significant changes to the Query class, because we need to add support for additional join conditions. Also, Jacob said to do this.

comment:9 Changed 5 years ago by jdunck

  • Cc jdunck@… added

comment:10 Changed 4 years ago by michaelg

  • Cc michaelg@… added

comment:11 Changed 4 years ago by russellm

  • milestone changed from 1.2 to 1.3

Not critical for 1.2.

comment:12 Changed 4 years ago by robin

  • Cc robin added

comment:13 Changed 4 years ago by coderanger

  • Cc coderanger added

comment:14 Changed 4 years ago by carljm

  • Cc carljm added

comment:15 Changed 4 years ago by Alex

So I've been thinking about this, and I'm wondering whether it perhaps makes sense to commit this patch, even with it excluding legitimate results. My reasoning: the current status is that aggregate() and annotate() both return bogus data, with the patch aggregate() returns correct data, and annotate() returns data that is valid but is *missing* results for items with no related data. This strikes me as a clear improvement, and even though the patch is obviously not completely correct, it is a first step to the more complicated completely correct implementation.

comment:16 Changed 3 years ago by chrischambers

  • Cc chrischambers added

comment:17 Changed 3 years ago by SmileyChris

  • Severity set to Normal
  • Type set to Bug

comment:18 Changed 3 years ago by joeri@…

  • Cc joeri@… added
  • Easy pickings unset
  • UI/UX unset

comment:19 Changed 3 years ago by akaariai

  • Cc anssi.kaariainen@… added
  • Has patch set
  • Needs tests set
  • Patch needs improvement set

I did some work to allow the contenttype lookup in left joins. This is useful in and of itself (you can do something like Book.objects.filter(notes__note__isnull=True), and it will correctly find all books without any notes (before it would not have found any books). The work allows also to generate correct results when annotating and aggregating.

There are some loose ends in the patch. First, the clone method of sql/query doesn't properly clone the extra_join_filters (the conditions need to be deep-copied if I am not mistaken). Second, there is some non-dryness in setup_joins. There is also general commenting and stuff like that that needs to be done. I don't know if the patch would break under more complex joins (the main problem is getting the extra_filter to the right join). So, more tests needed also.

Github branch available at: https://github.com/akaariai/django/tree/generic_relations_left_joins

comment:20 Changed 3 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

comment:21 Changed 2 years ago by ris

  • Cc ris added

comment:22 Changed 2 years ago by alanjds

  • Cc alanjds added

comment:23 Changed 2 years ago by alexkoshelev

  • Cc daevaorn@… added

comment:24 Changed 2 years ago by fhahn

  • Cc fhahn added

comment:25 Changed 14 months ago by herve

  • Cc herve added

comment:26 Changed 14 months ago by akaariai

Could somebody write a test case against master? I am pretty sure this one is already fixed in the ORM.

comment:27 Changed 14 months ago by fhahn

  • Needs tests unset
  • Patch needs improvement unset

It seems fixed in master.

I've adopted the test from the previous patch and opened a pull request: https://github.com/django/django/pull/722

comment:28 Changed 14 months ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 3e71368423b41c9418117328216e66f95cbaab03:

Fixed #10870 -- Added aggreation + generic reverse relation test

comment:29 Changed 14 months ago by akaariai

  • Component changed from ORM aggregation to Database layer (models, ORM)

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.