Opened 5 years ago

Closed 4 years ago

#15009 closed (invalid)

Generated sql for "or"-ing two filters over multiple joins is wrong

Reported by: Saverio Trioni <strioni@…> Owned by: nobody
Component: Database layer (models, ORM) Version: 1.2
Severity: Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

In the following data model:

FirstRelated  <- m2m -> BaseModel <- m2m -> SecondRelated <- fk <- ThirdRelated

I am doing a filter on the BaseModel with an "or" between:

  • a condition on the ThirdRelated
  • a condition on the FirstRelated

as BaseModel.objects.filter(Q(seconds__thirds__condition=True) | Q(firsts__condition=True)).

Each of the two filters behaves correctly when executed, but when "or"-ed some records are returned more than once.

The problem is related to the generated SQL, because 'or'-ing two conditions make the joins optional, but then if a condition fails a record can be returned because of the other condition.

The solution probably would be to include a DISTINCT() clause when 'or'-ing the conditions.

Here (and attached, too) you have the models and the tests for a failing example. I leave out the wrong_or/settings.py because they are all standard.

File wrong_or/wrong/models.py:

from django.db import models


class First(models.Model):
    condition = models.BooleanField()


class Second(models.Model):
    pass


class BaseModel(models.Model):
    firsts = models.ManyToManyField(First)  
    seconds = models.ManyToManyField(Second)
    

class Third(models.Model):
    second = models.ForeignKey(Second, related_name='thirds')
    condition = models.BooleanField()

File wrong_or/wrong/tests.py:

from django.db import models
from django.test import TestCase
from wrong_or.wrong.models import BaseModel, First, Second, Third

class SimpleTest(TestCase):
    def test_wrong(self):
        
        bo = BaseModel.objects.create()
        
        f = First.objects.create(condition=True)
        bo.firsts.add(f)
        
        s1 = Second.objects.create()
        t1 = Third.objects.create(second=s1, condition=False)
        s2 = Second.objects.create()
        t2 = Third.objects.create(second=s2, condition=False)
        
        bo.seconds.add(s1, s2)
        
        q1 = models.Q(firsts__condition=True) # match the only BaseModel
        q2 = models.Q(seconds__thirds__condition=True) # match nothing
        
        self.assertTrue(BaseModel.objects.filter(q1).exists())
        self.assertEqual(1, BaseModel.objects.filter(q1).count())
        self.assertFalse(BaseModel.objects.filter(q2).exists())
        
        # A filter should not duplicate records.
        # This assertion will fail, because there are two copies of 
        # the 'bo' record.
        self.assertEqual(1, BaseModel.objects.filter(q1|q2).count())
        

Attachments (2)

models.py (392 bytes) - added by Saverio Trioni <strioni@…> 5 years ago.
tests.py (1.1 KB) - added by Saverio Trioni <strioni@…> 5 years ago.

Download all attachments as: .zip

Change History (5)

Changed 5 years ago by Saverio Trioni <strioni@…>

Changed 5 years ago by Saverio Trioni <strioni@…>

comment:1 Changed 4 years ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I'm going to mark this wontfix. You're correct that results will be returned more than once, but that's a leaky abstraction born of the relational nature of SQL. There are many places and queries where you can get duplicate results -- all you need to do is traverse more than one join. Although your dataset evidently doesn't reveal this, each of your standalone queries would be prone to duplicate results, under the right conditions.

comment:2 follow-up: Changed 4 years ago by Saverio Trioni <strioni@…>

Ok, you are right. I always try to read fully the documentation, but a simple search today showed me this:

http://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct

... (Could you mark this as RTFM???)

comment:3 in reply to: ↑ 2 Changed 4 years ago by kmtracey

  • Resolution set to invalid
  • Status changed from new to closed

Replying to Saverio Trioni <strioni@flumotion.com>:

... (Could you mark this as RTFM???)


Invalid is as close as we have to that.

Note: See TracTickets for help on using tickets.
Back to Top