Code

Opened 2 years ago

Closed 13 months ago

#17600 closed Bug (fixed)

Error in encapsulates filters (Q)

Reported by: pmartin Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: anssi.kaariainen@…, gabejackson Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If I have the next code:

   class A(models.Model):

       .....

    class B(models.Model):

        a = models.ManyToManyField(A)

The next queries get differents results:

    B.objects.exclude(a__in=[7])

    from django.db.models import Q
    B.objects.exclude(Q(a__in=[7]))

Results:

The first query get all objects excluding the "b objects" with a=7. It's Ok
But the second query get all objects excluding the "b objects" with a=7 or a=None.
Is it an error?, Is it known?

I add a verbose example, execute the next code

    from django.contrib.auth.models import User, Group
    u1 = User.objects.create(username='u1')
    u2 = User.objects.create(username='u2')
    u3 = User.objects.create(username='u3')
    g1 = Group.objects.create(name='g1')
    g2 = Group.objects.create(name='g2')
    u1.groups.add(g1)
    u2.groups.add(g2)
    print User.objects.exclude(groups__in=[g1.pk])
    print User.objects.exclude(Q(groups__in=[g1.pk]))

I have seen the documentation and the tests and I didn't see any note or reference

Attachments (2)

exclude.r17463.diff (1.4 KB) - added by pmartin 2 years ago.
Add the patche
exclude-tests-ticket17600.diff (4.5 KB) - added by gabejackson 23 months ago.
Test Cases for Q, ~Q and exclude bugs

Download all attachments as: .zip

Change History (31)

comment:1 Changed 2 years ago by pmartin

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed
  • Type changed from Uncategorized to Bug

comment:2 Changed 2 years ago by pmartin

Obviously it is a nosense use Q when you only have one lookup. It's possible that the example would be more clarifier if you change the last query for this:

User.objects.exclude(Q(groups__in=[g1.pk])|Q(username='u2'))

This should return :

[<User: u3>]

But return:

[]

comment:3 Changed 2 years ago by pmartin

I research the problem. And these queries are different:

User.objects.exclude(groups__in=[g1.pk])
User.objects.exclude(Q(groups__in=[g1.pk]))

But, however these are equals:

User.objects.exclude(groups__in=[g1.pk])
User.objects.filter(~Q(groups__in=[g1.pk]))
User.objects.exclude(~~Q(groups__in=[g1.pk]))

Changed 2 years ago by pmartin

Add the patche

comment:4 Changed 2 years ago by pmartin

I am adding this patch.

The error is very difficult to explain. But the problem is when the [browser:django/trunk/django/db/models/query.py#L637 Q node is negated automatically], because of an exclude, and this receives a Q node. Here is an example:

User.objects.exclude(Q(groups__in=[g1.pk]))

The problem is in the [browser:django/trunk/django/db/models/sql/query.py#L1252 code]. In this line {{ q_object.negated }} is False but it should be True . The parent object of {{ q_object }} is the one that is negated.

If you have something like this:

User.objects.exclude(~~Q(groups__in=[g1.pk]))

This works because the last Q node is negated.

comment:5 Changed 2 years ago by pmartin

  • Has patch set

comment:6 follow-up: Changed 2 years ago by framos

Have you brought this up in django-developers?

comment:7 follow-up: Changed 2 years ago by akaariai

  • Cc anssi.kaariainen@… added

It seems wrong that you need to do the negate_last_node thing (a better name: negate_leaf_nodes?). I think there are some deeper bugs to fix here. Mainly, it seems the query code works only if the leaf nodes of the tree are negated. It should work with any kind of boolean tree, or it should be documented and asserted in the code that only leaf-negated trees are allowed. In my opinion fixing the immediate issue is not the right thing to do here.

comment:8 in reply to: ↑ 7 Changed 2 years ago by pmartin

Replying to akaariai:

It seems wrong that you need to do the negate_last_node thing (a better name: negate_leaf_nodes?).

Of course, I didn't explain. My patch is a temporal solution to me, and something for you to understand better the problem, only this.

At reference, your "better name" is wrong. These nodes are not leaf. They have a children, but these are not instance of tree.Node. But it is true, I forgot a "s".

I think there are some deeper bugs to fix here. Mainly, it seems the query code works only if the leaf nodes of the tree are negated. It should work with any kind of boolean tree, or it should be documented and asserted in the code that only leaf-negated trees are allowed. In my opinion fixing the immediate issue is not the right thing to do here.

I think this too. I think that to fix this ticket we need someone core developer.

comment:9 in reply to: ↑ 6 Changed 2 years ago by pmartin

Replying to framos:

Have you brought this up in django-developers?

The monday I send a e-mail to django-developers.

comment:10 Changed 2 years ago by shai

  • Has patch unset

If your patch is not intended as a fix to the problem, I think it is better to remove the "has patch" mark -- and so I did. Feel free to correct me, of course.

I think the best step forward, short of a core developer getting involved, is to add a patch that just has tests to demonstrate the problem.

comment:11 follow-up: Changed 2 years ago by akaariai

I think this one should rest until 1.4 is out. Fixing this before that seems unlikely. Or, if this is fixed, the fix must be something like what is done by pmartin.

I think the real fix here would include some refactoring of the WhereNode class (mainly, there is something weird going on with the .negate() and start_subtree() business). Then refactor sql/query.py add_q (it creates pretty stupid boolean trees on certain inputs), and finally check what broke and how to _then_ fix the issue in this ticket.

There are some related tickets already: #17025 is about where node refactoring. It maybe goes a bit too far, but I think it shows the potential of code cleanup and speed improvement available. Then there is #17000 which is about add_q refactoring.

comment:12 in reply to: ↑ 11 Changed 2 years ago by pmartin

Replying to akaariai:

I think the same. Thanks akaariai, your comment is perfect. Good work!

I sent this ticket to django-developers, to try that a core developer are interested. Subject: "A important bug in queryset"

comment:13 Changed 23 months ago by gabejackson

I added some tests cases to demonstrate these issues. I ran in to the same issue today as I was looking to make a query to return Orders having OrderItems with a certain status only.

Perhaps somebody could check the test cases.

Changed 23 months ago by gabejackson

Test Cases for Q, ~Q and exclude bugs

comment:14 Changed 23 months ago by gabejackson

  • Cc gabejackson added

comment:15 Changed 23 months ago by akaariai

  • Has patch set
  • Triage Stage changed from Design decision needed to Accepted

I believe the code in this branch could fix this ticket: https://github.com/akaariai/django/tree/refactor_utils_tree

While the branch is named refactor_utils_tree it actually refactors query.qs.add_q() too. The branch contains some test cases, but it would be interesting to see how the branch works with your tests. If you are able to do the testing that would be valuable. If not, I will try to find time to do it myself.

Im marking this as accepted - to me it is obvious there is a bug here, and one which is fixable with reasonable amount of work.

comment:16 Changed 22 months ago by akaariai

I'm trying to check if the refactor_utils_tree solves this issue.

To me it seems that test_only_orders_with_all_items_having_status_1() is bogus - it should return both Order: 1 and Order: 2. I can't see where the restriction is done that _all_ items must have status 1?

The two tests doing assertEqual(qs1, qs2) do not work correctly - while the querysets do not implement __eq__ in a way that would be usable here, so the correct test is assertEqual(list(qs1), list(qs2)). If this is done, these tests pass.

Apart of that all tests pass, and I think the patch actually solves this issue. I am wondering if these tests should be added to the patch after the above corrections, and if modeltests/excludes is the right place. I might put them into regressiontests/queries, as that is what I usually run, and contains a lot of similar stuff already.

comment:17 Changed 22 months ago by anonymous

perfect. I agree with your comments on the test cases. they also don't need to be merged if they don't test anything not already in the queries test cases. One question still remains though, I was in fact trying to use the the exclude and ~Q together to enforce that all items must have status 1. This doesn't seem possible without raw sql? If it is, then we should definitely have 1) A test-case for it 2) Documentation. I'll write the Documentation if you are aware how this should work with the ORM.

comment:18 Changed 22 months ago by gabejackson

sorry, should have logged in.

comment:19 Changed 22 months ago by akaariai

Hmmh, seems like my interpretation of the .exclude(~Q(items__status=1)) was incorrect. It seems this should behave similarly to qs.exclude(items__in=Item.objects.filter(~Q(status=1))). That is, remove every Order having at least one item with different status than 1. And this again is equivalent to having all items status as 1 (including the case of no items at all).

The query the refactor_utils_tree generates for the exclude(~Q()) is this: WHERE NOT (NOT (id IN (SELECT order_id .. WHERE status = 1)), while the exclude(items__in) generates WHERE NOT (id IN (SELECT order_id .. WHERE NOT status = 1 AND order_id IS NOT NULL)). Thus, the second NOT should be pushed down to the subquery.

I don't see this as a must fix issue for refactor_utils_tree, but I will check if this happens to be easy to fix. Another opinion if the interpretation of the nested exclude is correct would be welcome. These things tend to get somewhat hard to parse...

comment:20 Changed 22 months ago by anonymous

It seems this isn't easy to fix.

First, my previous comment is bogus - the items__in query actually generates two subqueries, the generated queries are:

the items__in one:
SELECT DISTINCT "excludes_order"."id" FROM "excludes_order"
WHERE NOT ("excludes_order"."id" IN 
    (SELECT U1."order_id" FROM "excludes_orderitem" U1 WHERE (U1."id" IN 
        (SELECT U0."id" FROM "excludes_orderitem" U0 WHERE NOT (U0."status" = 1 ))
     AND U1."order_id" IS NOT NULL)))
ORDER BY "excludes_order"."id" ASC

and the erroneous current one:
SELECT DISTINCT "excludes_order"."id" FROM "excludes_order" WHERE NOT (NOT ("excludes_order"."id" IN 
    (SELECT U1."order_id" FROM "excludes_orderitem" U1 WHERE (U1."status" = 1  AND U1."order_id" IS NOT NULL))))
ORDER BY "excludes_order"."id" ASC

It is somewhat hard to generate the correct query, as when the need for the subquery is seen, the NOT is already added to the tree. This will be easier to handle if the add_q logic is arranged in a way where the lookups are checked for multijoins before beginning to add them into the query, so that one can push the NOT into the subquery. Still, I don't believe this to be easy to fix even then, as the subquery generation needs added logic to support nested subqueries.

So, I think this test should be added to regressiontests/queries as an @expectedFailure.

comment:21 follow-up: Changed 22 months ago by mtayseer

I found a workaround to make the ORM generate the correct query, by negating all the Q objects. This is my original (buggy) code

nearby = Q(route__isnull=True, stations__name=station.name)
if station.place:
    nearby |= Q(route__isnull=False, route__dwithin=(station.place, MAX_WALKING_DISTANCE))
self.exclude(nearby).distinct()

and this is the code after the change

nearby = ~~Q(route__isnull=True, stations__name=station.name)
if station.place:
    nearby |= ~~Q(route__isnull=False, route__dwithin=(station.place, MAX_WALKING_DISTANCE))
self.exclude(nearby).distinct()

comment:22 in reply to: ↑ 21 Changed 22 months ago by pmartin

Replying to mtayseer:

I found a workaround to make the ORM generate the correct query, by negating all the Q objects. This is my original (buggy) code

I said this in the comment 4. Please read every comment

comment:23 Changed 21 months ago by akaariai

The https://github.com/akaariai/django/tree/refactor_utils_tree branch contains now the tests for this ticket. Everything works, except the double-subquery tests, which is added as an expected failure into the suite.

comment:24 Changed 21 months ago by pmartin

With this branch works!

Version 0, edited 21 months ago by pmartin (next)

comment:25 Changed 15 months ago by akaariai

  • Triage Stage changed from Accepted to Ready for checkin

I have rebased the https://github.com/akaariai/django/tree/refactor_utils_tree and done some additions, too. The biggest one is that we now have build_filter() instead of add_filter() which will create a WhereNode object ready to be added to the tree. By doing this the wherenode/havingnode additions suddenly happen naturally in the _add_q without the need to pass the target clause down to add_filter. I like this way.

There is also a new add_filter() which is just self.where.add(self.build_filter(filterexpr), AND) - that is, a little helper which is used in various places of the ORM.

comment:26 Changed 15 months ago by akaariai

#13198 is related (if not duplicate). Tests from that ticket should be added at least. I reopened that ticket so that adding the tests will not be forgotten.

comment:27 Changed 14 months ago by akaariai

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

The patch isn't good enough yet. Mainly, the in_negated flag for build_filter() doesn't work properly. It is used for two things, first for answering if the filter needs subquery, and in this case the answer is yes if the three contains negated=True at any point, and also for IS NULL checking, and in this case the answer comes from the current state of the negation, that is the IS NULL check is needed only if the path contains odd amount of NOTs, otherwise they will cancel each other.

In addition I have been playing back-and-forth with add_filter() and build_filter(). The build_filter() naming isn't good, it alters the query's joins but the name suggests it just builds the filter condition. So, for that reason too this patch needs still improvement.

comment:28 Changed 14 months ago by akaariai

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

I have added another version of the patch, available from https://github.com/akaariai/django/tree/refactor_utils_tree_2.

The add_q -> _add_q + all the code related to doing HAVING splits is still ugly... But it is also working more correctly than current code, and fixing at least 5 tickets.

The underlying problem is that the logic *is* somewhat ugly. The having splits must be done for any subtree starting with OR and containing an aggregate somewhere in the subtree. Also, the negated status of the tree must be appended to the subtree. Still, the references to the aggregates can be in F() expressions, and these nest, too...

The patches approach is to split the having parts before adding the filters. The current approach is to build the having and the where trees in parallel, but that doesn't work nicely. There is still an approach where the having parts are split out in compiler stage, and that might work nicely. Other ideas are welcome...

So, I think I will just commit the code and then see if there is some way to make the changes still cleaner.

comment:29 Changed 13 months ago by akaariai

  • Resolution set to fixed
  • Status changed from new to closed

Seems like Trac didn't pick up the commit. So, this was fixed in d3f00bd5706b35961390d3814dd7e322ead3a9a3.

The code is still somewhat ugly (the Q() split to HAVING and WHERE parts in particular). I am playing around with the idea of doing the WHERE/HAVING split afterwards, at compile time. This might be cleaner.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.