Code

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#15200 closed (wontfix)

ManyToManyField combined with "|" leads to trouble

Reported by: vanschelven Owned by: nobody
Component: Uncategorized Version: 1.2
Severity: Keywords:
Cc: vanschelven Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

ManyToManyField combined with "|" leads to trouble.

Specifically, the outer join that appears to be constructed for manytomany field lookups is not well cleaned up with where statements once an "or" ("|") is used.

$ cat theapp/models.py
from django.db import models

class AnObject(models.Model):
    nr = models.IntegerField()

class HasRef(models.Model):
    obj = models.ManyToManyField(AnObject)
    field = models.IntegerField()
>>> from theapp.models import *
>>> one = AnObject.objects.create(nr=1)
>>> two = AnObject.objects.create(nr=2)
>>> reffer = HasRef.objects.create(field=1)
>>> reffer.obj.add(one)
>>> reffer.obj.add(two)
>>> HasRef.objects.all().filter(field=1)
[<HasRef: HasRef object>]
>>> HasRef.objects.all().filter(obj=one)
[<HasRef: HasRef object>]
>>> HasRef.objects.all().filter(models.Q(field=1) | models.Q(obj=one))
[<HasRef: HasRef object>, <HasRef: HasRef object>]
>>> [o.pk for o in HasRef.objects.all().filter(models.Q(field=1) | models.Q(obj=one))]
[1L, 1L]

I don't have the time right now to come up with tests proper, or a patch. The problem is real though...

My workaround:

>>> HasRef.objects.all().filter(models.Q(field=1) | models.Q(obj=one)).annotate(models.Count("id"))

The generated query (without workaround)

>>> print HasRef.objects.all().filter(models.Q(field=1) | models.Q(obj=AnObject.objects.get(nr=1))).query
SELECT `theapp_hasref`.`id`, `theapp_hasref`.`field` FROM `theapp_hasref` LEFT OUTER JOIN `theapp_hasref_obj` ON (`theapp_hasref`.`id` = `theapp_hasref_obj`.`hasref_id`) WHERE (`theapp_hasref`.`field` = 1  OR `theapp_hasref_obj`.`anobject_id` = 1 )

Attachments (0)

Change History (5)

comment:1 Changed 3 years ago by vanschelven

  • Cc vanschelven added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 3 years ago by anonymous

  • Resolution set to invalid
  • Status changed from new to closed

Are you referring to the fact that there is a duplicate in the results? If this is the case, then this is a case of abstraction leakage, overcome with the use of distinct()

comment:3 Changed 3 years ago by vanschelven

  • Resolution invalid deleted
  • Status changed from closed to reopened

Yes I'm indeed referring to the fact that there's a duplicate in the results. IMHO "abstraction leakage" is a bug and though it may be overcome with distinct() I think this is a little more fundamental.

comment:4 Changed 3 years ago by russellm

  • Resolution set to wontfix
  • Status changed from reopened to closed

Yes, but the issue is that it's a *documented* leaky abstraction, and one that is very difficult to avoid without imposing a large computational overhead -- which is why the leaky abstraction exists.

Trust me -- we've thought about this. That's why it's a documented leaky abstraction. If you want to advocate for a different, less leaky interpretation, feel free to raise the issue on django dev. But be warned: you're not going to get much attention unless you come armed with an actual solution in the form of working code, or, at the very least, a general SQL solution to the problem.

comment:5 Changed 3 years ago by vanschelven

Hmmm... I guess you're talking about the following snippet:

By default, a QuerySet will not eliminate duplicate rows. In practice, this is rarely a problem, because simple queries such as Blog.objects.all() don't introduce the possibility of duplicate result rows. However, if your query spans multiple tables, it's possible to get duplicate results when a QuerySet is evaluated. That's when you'd use distinct().

IMHO this kind of leakiness is the worst kind (it happens only very rarely, and may happen intermittently depending on your data). Anyway I'm letting it rest for now.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.