Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#15200 closed (wontfix)

ManyToManyField combined with "|" leads to trouble

Reported by: Klaas van Schelven Owned by: nobody
Component: Uncategorized Version: 1.2
Severity: Keywords:
Cc: Klaas van Schelven Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

ManyToManyField combined with "|" leads to trouble.

Specifically, the outer join that appears to be constructed for manytomany field lookups is not well cleaned up with where statements once an "or" ("|") is used.

$ cat theapp/models.py
from django.db import models

class AnObject(models.Model):
    nr = models.IntegerField()

class HasRef(models.Model):
    obj = models.ManyToManyField(AnObject)
    field = models.IntegerField()
>>> from theapp.models import *
>>> one = AnObject.objects.create(nr=1)
>>> two = AnObject.objects.create(nr=2)
>>> reffer = HasRef.objects.create(field=1)
>>> reffer.obj.add(one)
>>> reffer.obj.add(two)
>>> HasRef.objects.all().filter(field=1)
[<HasRef: HasRef object>]
>>> HasRef.objects.all().filter(obj=one)
[<HasRef: HasRef object>]
>>> HasRef.objects.all().filter(models.Q(field=1) | models.Q(obj=one))
[<HasRef: HasRef object>, <HasRef: HasRef object>]
>>> [o.pk for o in HasRef.objects.all().filter(models.Q(field=1) | models.Q(obj=one))]
[1L, 1L]

I don't have the time right now to come up with tests proper, or a patch. The problem is real though...

My workaround:

>>> HasRef.objects.all().filter(models.Q(field=1) | models.Q(obj=one)).annotate(models.Count("id"))

The generated query (without workaround)

>>> print HasRef.objects.all().filter(models.Q(field=1) | models.Q(obj=AnObject.objects.get(nr=1))).query
SELECT `theapp_hasref`.`id`, `theapp_hasref`.`field` FROM `theapp_hasref` LEFT OUTER JOIN `theapp_hasref_obj` ON (`theapp_hasref`.`id` = `theapp_hasref_obj`.`hasref_id`) WHERE (`theapp_hasref`.`field` = 1  OR `theapp_hasref_obj`.`anobject_id` = 1 )

Change History (5)

comment:1 by Klaas van Schelven, 13 years ago

Cc: Klaas van Schelven added

comment:2 by anonymous, 13 years ago

Resolution: invalid
Status: newclosed

Are you referring to the fact that there is a duplicate in the results? If this is the case, then this is a case of abstraction leakage, overcome with the use of distinct()

comment:3 by Klaas van Schelven, 13 years ago

Resolution: invalid
Status: closedreopened

Yes I'm indeed referring to the fact that there's a duplicate in the results. IMHO "abstraction leakage" is a bug and though it may be overcome with distinct() I think this is a little more fundamental.

comment:4 by Russell Keith-Magee, 13 years ago

Resolution: wontfix
Status: reopenedclosed

Yes, but the issue is that it's a *documented* leaky abstraction, and one that is very difficult to avoid without imposing a large computational overhead -- which is why the leaky abstraction exists.

Trust me -- we've thought about this. That's why it's a documented leaky abstraction. If you want to advocate for a different, less leaky interpretation, feel free to raise the issue on django dev. But be warned: you're not going to get much attention unless you come armed with an actual solution in the form of working code, or, at the very least, a general SQL solution to the problem.

comment:5 by Klaas van Schelven, 13 years ago

Hmmm... I guess you're talking about the following snippet:

By default, a QuerySet will not eliminate duplicate rows. In practice, this is rarely a problem, because simple queries such as Blog.objects.all() don't introduce the possibility of duplicate result rows. However, if your query spans multiple tables, it's possible to get duplicate results when a QuerySet is evaluated. That's when you'd use distinct().

IMHO this kind of leakiness is the worst kind (it happens only very rarely, and may happen intermittently depending on your data). Anyway I'm letting it rest for now.

Note: See TracTickets for help on using tickets.
Back to Top