Opened 13 years ago

Closed 12 years ago

#17600 closed Bug (fixed)

Error in encapsulates filters (Q)

Reported by: Pablo Martín Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: anssi.kaariainen@…, Gabe Jackson Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If I have the next code:

   class A(models.Model):

       .....

    class B(models.Model):

        a = models.ManyToManyField(A)

The next queries get differents results:

    B.objects.exclude(a__in=[7])

    from django.db.models import Q
    B.objects.exclude(Q(a__in=[7]))

Results:

The first query get all objects excluding the "b objects" with a=7. It's Ok
But the second query get all objects excluding the "b objects" with a=7 or a=None.
Is it an error?, Is it known?

I add a verbose example, execute the next code

    from django.contrib.auth.models import User, Group
    u1 = User.objects.create(username='u1')
    u2 = User.objects.create(username='u2')
    u3 = User.objects.create(username='u3')
    g1 = Group.objects.create(name='g1')
    g2 = Group.objects.create(name='g2')
    u1.groups.add(g1)
    u2.groups.add(g2)
    print User.objects.exclude(groups__in=[g1.pk])
    print User.objects.exclude(Q(groups__in=[g1.pk]))

I have seen the documentation and the tests and I didn't see any note or reference

Attachments (2)

exclude.r17463.diff (1.4 KB ) - added by Pablo Martín 13 years ago.
Add the patche
exclude-tests-ticket17600.diff (4.5 KB ) - added by Gabe Jackson 12 years ago.
Test Cases for Q, ~Q and exclude bugs

Download all attachments as: .zip

Change History (31)

comment:1 by Pablo Martín, 13 years ago

Triage Stage: UnreviewedDesign decision needed
Type: UncategorizedBug

comment:2 by Pablo Martín, 13 years ago

Obviously it is a nosense use Q when you only have one lookup. It's possible that the example would be more clarifier if you change the last query for this:

User.objects.exclude(Q(groups__in=[g1.pk])|Q(username='u2'))

This should return :

[<User: u3>]

But return:

[]

comment:3 by Pablo Martín, 13 years ago

I research the problem. And these queries are different:

User.objects.exclude(groups__in=[g1.pk])
User.objects.exclude(Q(groups__in=[g1.pk]))

But, however these are equals:

User.objects.exclude(groups__in=[g1.pk])
User.objects.filter(~Q(groups__in=[g1.pk]))
User.objects.exclude(~~Q(groups__in=[g1.pk]))

by Pablo Martín, 13 years ago

Attachment: exclude.r17463.diff added

Add the patche

comment:4 by Pablo Martín, 13 years ago

I am adding this patch.

The error is very difficult to explain. But the problem is when the [browser:django/trunk/django/db/models/query.py#L637 Q node is negated automatically], because of an exclude, and this receives a Q node. Here is an example:

User.objects.exclude(Q(groups__in=[g1.pk]))

The problem is in the [browser:django/trunk/django/db/models/sql/query.py#L1252 code]. In this line {{ q_object.negated }} is False but it should be True . The parent object of {{ q_object }} is the one that is negated.

If you have something like this:

User.objects.exclude(~~Q(groups__in=[g1.pk]))

This works because the last Q node is negated.

comment:5 by Pablo Martín, 13 years ago

Has patch: set

comment:6 by Fidel Ramos, 13 years ago

Have you brought this up in django-developers?

comment:7 by Anssi Kääriäinen, 13 years ago

Cc: anssi.kaariainen@… added

It seems wrong that you need to do the negate_last_node thing (a better name: negate_leaf_nodes?). I think there are some deeper bugs to fix here. Mainly, it seems the query code works only if the leaf nodes of the tree are negated. It should work with any kind of boolean tree, or it should be documented and asserted in the code that only leaf-negated trees are allowed. In my opinion fixing the immediate issue is not the right thing to do here.

in reply to:  7 comment:8 by Pablo Martín, 13 years ago

Replying to akaariai:

It seems wrong that you need to do the negate_last_node thing (a better name: negate_leaf_nodes?).

Of course, I didn't explain. My patch is a temporal solution to me, and something for you to understand better the problem, only this.

At reference, your "better name" is wrong. These nodes are not leaf. They have a children, but these are not instance of tree.Node. But it is true, I forgot a "s".

I think there are some deeper bugs to fix here. Mainly, it seems the query code works only if the leaf nodes of the tree are negated. It should work with any kind of boolean tree, or it should be documented and asserted in the code that only leaf-negated trees are allowed. In my opinion fixing the immediate issue is not the right thing to do here.

I think this too. I think that to fix this ticket we need someone core developer.

in reply to:  6 comment:9 by Pablo Martín, 13 years ago

Replying to framos:

Have you brought this up in django-developers?

The monday I send a e-mail to django-developers.

comment:10 by Shai Berger, 13 years ago

Has patch: unset

If your patch is not intended as a fix to the problem, I think it is better to remove the "has patch" mark -- and so I did. Feel free to correct me, of course.

I think the best step forward, short of a core developer getting involved, is to add a patch that just has tests to demonstrate the problem.

comment:11 by Anssi Kääriäinen, 13 years ago

I think this one should rest until 1.4 is out. Fixing this before that seems unlikely. Or, if this is fixed, the fix must be something like what is done by pmartin.

I think the real fix here would include some refactoring of the WhereNode class (mainly, there is something weird going on with the .negate() and start_subtree() business). Then refactor sql/query.py add_q (it creates pretty stupid boolean trees on certain inputs), and finally check what broke and how to _then_ fix the issue in this ticket.

There are some related tickets already: #17025 is about where node refactoring. It maybe goes a bit too far, but I think it shows the potential of code cleanup and speed improvement available. Then there is #17000 which is about add_q refactoring.

in reply to:  11 comment:12 by Pablo Martín, 13 years ago

Replying to akaariai:

I think the same. Thanks akaariai, your comment is perfect. Good work!

I sent this ticket to django-developers, to try that a core developer are interested. Subject: "A important bug in queryset"

comment:13 by Gabe Jackson, 12 years ago

I added some tests cases to demonstrate these issues. I ran in to the same issue today as I was looking to make a query to return Orders having OrderItems with a certain status only.

Perhaps somebody could check the test cases.

by Gabe Jackson, 12 years ago

Test Cases for Q, ~Q and exclude bugs

comment:14 by Gabe Jackson, 12 years ago

Cc: Gabe Jackson added

comment:15 by Anssi Kääriäinen, 12 years ago

Has patch: set
Triage Stage: Design decision neededAccepted

I believe the code in this branch could fix this ticket: https://github.com/akaariai/django/tree/refactor_utils_tree

While the branch is named refactor_utils_tree it actually refactors query.qs.add_q() too. The branch contains some test cases, but it would be interesting to see how the branch works with your tests. If you are able to do the testing that would be valuable. If not, I will try to find time to do it myself.

Im marking this as accepted - to me it is obvious there is a bug here, and one which is fixable with reasonable amount of work.

comment:16 by Anssi Kääriäinen, 12 years ago

I'm trying to check if the refactor_utils_tree solves this issue.

To me it seems that test_only_orders_with_all_items_having_status_1() is bogus - it should return both Order: 1 and Order: 2. I can't see where the restriction is done that _all_ items must have status 1?

The two tests doing assertEqual(qs1, qs2) do not work correctly - while the querysets do not implement __eq__ in a way that would be usable here, so the correct test is assertEqual(list(qs1), list(qs2)). If this is done, these tests pass.

Apart of that all tests pass, and I think the patch actually solves this issue. I am wondering if these tests should be added to the patch after the above corrections, and if modeltests/excludes is the right place. I might put them into regressiontests/queries, as that is what I usually run, and contains a lot of similar stuff already.

comment:17 by anonymous, 12 years ago

perfect. I agree with your comments on the test cases. they also don't need to be merged if they don't test anything not already in the queries test cases. One question still remains though, I was in fact trying to use the the exclude and ~Q together to enforce that all items must have status 1. This doesn't seem possible without raw sql? If it is, then we should definitely have 1) A test-case for it 2) Documentation. I'll write the Documentation if you are aware how this should work with the ORM.

comment:18 by Gabe Jackson, 12 years ago

sorry, should have logged in.

comment:19 by Anssi Kääriäinen, 12 years ago

Hmmh, seems like my interpretation of the .exclude(~Q(items__status=1)) was incorrect. It seems this should behave similarly to qs.exclude(items__in=Item.objects.filter(~Q(status=1))). That is, remove every Order having at least one item with different status than 1. And this again is equivalent to having all items status as 1 (including the case of no items at all).

The query the refactor_utils_tree generates for the exclude(~Q()) is this: WHERE NOT (NOT (id IN (SELECT order_id .. WHERE status = 1)), while the exclude(items__in) generates WHERE NOT (id IN (SELECT order_id .. WHERE NOT status = 1 AND order_id IS NOT NULL)). Thus, the second NOT should be pushed down to the subquery.

I don't see this as a must fix issue for refactor_utils_tree, but I will check if this happens to be easy to fix. Another opinion if the interpretation of the nested exclude is correct would be welcome. These things tend to get somewhat hard to parse...

comment:20 by anonymous, 12 years ago

It seems this isn't easy to fix.

First, my previous comment is bogus - the items__in query actually generates two subqueries, the generated queries are:

the items__in one:
SELECT DISTINCT "excludes_order"."id" FROM "excludes_order"
WHERE NOT ("excludes_order"."id" IN 
    (SELECT U1."order_id" FROM "excludes_orderitem" U1 WHERE (U1."id" IN 
        (SELECT U0."id" FROM "excludes_orderitem" U0 WHERE NOT (U0."status" = 1 ))
     AND U1."order_id" IS NOT NULL)))
ORDER BY "excludes_order"."id" ASC

and the erroneous current one:
SELECT DISTINCT "excludes_order"."id" FROM "excludes_order" WHERE NOT (NOT ("excludes_order"."id" IN 
    (SELECT U1."order_id" FROM "excludes_orderitem" U1 WHERE (U1."status" = 1  AND U1."order_id" IS NOT NULL))))
ORDER BY "excludes_order"."id" ASC

It is somewhat hard to generate the correct query, as when the need for the subquery is seen, the NOT is already added to the tree. This will be easier to handle if the add_q logic is arranged in a way where the lookups are checked for multijoins before beginning to add them into the query, so that one can push the NOT into the subquery. Still, I don't believe this to be easy to fix even then, as the subquery generation needs added logic to support nested subqueries.

So, I think this test should be added to regressiontests/queries as an @expectedFailure.

comment:21 by mtayseer, 12 years ago

I found a workaround to make the ORM generate the correct query, by negating all the Q objects. This is my original (buggy) code

nearby = Q(route__isnull=True, stations__name=station.name)
if station.place:
    nearby |= Q(route__isnull=False, route__dwithin=(station.place, MAX_WALKING_DISTANCE))
self.exclude(nearby).distinct()

and this is the code after the change

nearby = ~~Q(route__isnull=True, stations__name=station.name)
if station.place:
    nearby |= ~~Q(route__isnull=False, route__dwithin=(station.place, MAX_WALKING_DISTANCE))
self.exclude(nearby).distinct()

in reply to:  21 comment:22 by Pablo Martín, 12 years ago

Replying to mtayseer:

I found a workaround to make the ORM generate the correct query, by negating all the Q objects. This is my original (buggy) code

I said this in the comment 4. Please read every comment

comment:23 by Anssi Kääriäinen, 12 years ago

The https://github.com/akaariai/django/tree/refactor_utils_tree branch contains now the tests for this ticket. Everything works, except the double-subquery tests, which is added as an expected failure into the suite.

comment:24 by Pablo Martín, 12 years ago

With this branch works!

Version 0, edited 12 years ago by Pablo Martín (next)

comment:25 by Anssi Kääriäinen, 12 years ago

Triage Stage: AcceptedReady for checkin

I have rebased the https://github.com/akaariai/django/tree/refactor_utils_tree and done some additions, too. The biggest one is that we now have build_filter() instead of add_filter() which will create a WhereNode object ready to be added to the tree. By doing this the wherenode/havingnode additions suddenly happen naturally in the _add_q without the need to pass the target clause down to add_filter. I like this way.

There is also a new add_filter() which is just self.where.add(self.build_filter(filterexpr), AND) - that is, a little helper which is used in various places of the ORM.

comment:26 by Anssi Kääriäinen, 12 years ago

#13198 is related (if not duplicate). Tests from that ticket should be added at least. I reopened that ticket so that adding the tests will not be forgotten.

comment:27 by Anssi Kääriäinen, 12 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

The patch isn't good enough yet. Mainly, the in_negated flag for build_filter() doesn't work properly. It is used for two things, first for answering if the filter needs subquery, and in this case the answer is yes if the three contains negated=True at any point, and also for IS NULL checking, and in this case the answer comes from the current state of the negation, that is the IS NULL check is needed only if the path contains odd amount of NOTs, otherwise they will cancel each other.

In addition I have been playing back-and-forth with add_filter() and build_filter(). The build_filter() naming isn't good, it alters the query's joins but the name suggests it just builds the filter condition. So, for that reason too this patch needs still improvement.

comment:28 by Anssi Kääriäinen, 12 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

I have added another version of the patch, available from https://github.com/akaariai/django/tree/refactor_utils_tree_2.

The add_q -> _add_q + all the code related to doing HAVING splits is still ugly... But it is also working more correctly than current code, and fixing at least 5 tickets.

The underlying problem is that the logic *is* somewhat ugly. The having splits must be done for any subtree starting with OR and containing an aggregate somewhere in the subtree. Also, the negated status of the tree must be appended to the subtree. Still, the references to the aggregates can be in F() expressions, and these nest, too...

The patches approach is to split the having parts before adding the filters. The current approach is to build the having and the where trees in parallel, but that doesn't work nicely. There is still an approach where the having parts are split out in compiler stage, and that might work nicely. Other ideas are welcome...

So, I think I will just commit the code and then see if there is some way to make the changes still cleaner.

comment:29 by Anssi Kääriäinen, 12 years ago

Resolution: fixed
Status: newclosed

Seems like Trac didn't pick up the commit. So, this was fixed in d3f00bd5706b35961390d3814dd7e322ead3a9a3.

The code is still somewhat ugly (the Q() split to HAVING and WHERE parts in particular). I am playing around with the idea of doing the WHERE/HAVING split afterwards, at compile time. This might be cleaner.

Note: See TracTickets for help on using tickets.
Back to Top