Opened 13 years ago

Closed 13 years ago

#15009 closed (invalid)

Generated sql for "or"-ing two filters over multiple joins is wrong

Reported by: Saverio Trioni <strioni@…> Owned by: nobody
Component: Database layer (models, ORM) Version: 1.2
Severity: Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In the following data model:

FirstRelated  <- m2m -> BaseModel <- m2m -> SecondRelated <- fk <- ThirdRelated

I am doing a filter on the BaseModel with an "or" between:

  • a condition on the ThirdRelated
  • a condition on the FirstRelated

as BaseModel.objects.filter(Q(seconds__thirds__condition=True) | Q(firsts__condition=True)).

Each of the two filters behaves correctly when executed, but when "or"-ed some records are returned more than once.

The problem is related to the generated SQL, because 'or'-ing two conditions make the joins optional, but then if a condition fails a record can be returned because of the other condition.

The solution probably would be to include a DISTINCT() clause when 'or'-ing the conditions.

Here (and attached, too) you have the models and the tests for a failing example. I leave out the wrong_or/settings.py because they are all standard.

File wrong_or/wrong/models.py:

from django.db import models


class First(models.Model):
    condition = models.BooleanField()


class Second(models.Model):
    pass


class BaseModel(models.Model):
    firsts = models.ManyToManyField(First)  
    seconds = models.ManyToManyField(Second)
    

class Third(models.Model):
    second = models.ForeignKey(Second, related_name='thirds')
    condition = models.BooleanField()

File wrong_or/wrong/tests.py:

from django.db import models
from django.test import TestCase
from wrong_or.wrong.models import BaseModel, First, Second, Third

class SimpleTest(TestCase):
    def test_wrong(self):
        
        bo = BaseModel.objects.create()
        
        f = First.objects.create(condition=True)
        bo.firsts.add(f)
        
        s1 = Second.objects.create()
        t1 = Third.objects.create(second=s1, condition=False)
        s2 = Second.objects.create()
        t2 = Third.objects.create(second=s2, condition=False)
        
        bo.seconds.add(s1, s2)
        
        q1 = models.Q(firsts__condition=True) # match the only BaseModel
        q2 = models.Q(seconds__thirds__condition=True) # match nothing
        
        self.assertTrue(BaseModel.objects.filter(q1).exists())
        self.assertEqual(1, BaseModel.objects.filter(q1).count())
        self.assertFalse(BaseModel.objects.filter(q2).exists())
        
        # A filter should not duplicate records.
        # This assertion will fail, because there are two copies of 
        # the 'bo' record.
        self.assertEqual(1, BaseModel.objects.filter(q1|q2).count())
        

Attachments (2)

models.py (392 bytes ) - added by Saverio Trioni <strioni@…> 13 years ago.
tests.py (1.1 KB ) - added by Saverio Trioni <strioni@…> 13 years ago.

Download all attachments as: .zip

Change History (5)

by Saverio Trioni <strioni@…>, 13 years ago

Attachment: models.py added

by Saverio Trioni <strioni@…>, 13 years ago

Attachment: tests.py added

comment:1 by Russell Keith-Magee, 13 years ago

I'm going to mark this wontfix. You're correct that results will be returned more than once, but that's a leaky abstraction born of the relational nature of SQL. There are many places and queries where you can get duplicate results -- all you need to do is traverse more than one join. Although your dataset evidently doesn't reveal this, each of your standalone queries would be prone to duplicate results, under the right conditions.

comment:2 by Saverio Trioni <strioni@…>, 13 years ago

Ok, you are right. I always try to read fully the documentation, but a simple search today showed me this:

http://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct

... (Could you mark this as RTFM???)

in reply to:  2 comment:3 by Karen Tracey, 13 years ago

Resolution: invalid
Status: newclosed

Replying to Saverio Trioni <strioni@flumotion.com>:

... (Could you mark this as RTFM???)


Invalid is as close as we have to that.

Note: See TracTickets for help on using tickets.
Back to Top