Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#15823 closed Bug (fixed)

incorrect join condition when combining Q objects

Reported by: Dan Watson Owned by: Luke Plant
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: query, q, join
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


Originally described the issue in but copied here for reference.


class Item (models.Model):
    title = models.CharField(max_length=100)

class PropertyValue (models.Model):
    label = models.CharField(max_length=100)

class Property (models.Model):
    item = models.ForeignKey(Item, related_name='props')
    key = models.CharField(max_length=100)
    value = models.ForeignKey(PropertyValue, null=True)


item = Item.objects.create(title='Some Item')
pv = PropertyValue.objects.create(label='Some Value')
item.props.create(key='a', value=pv)
q1 = Q(props__key='a', props__value=pv)
q2 = Q(props__key='b', props__value__isnull=True)
qs1 = Item.objects.filter(q1) & Item.objects.filter(q2)
qs2 = Item.objects.filter(q2) & Item.objects.filter(q1)

The problem is that qs1 and qs2 do not evaluate to the same thing, as they should (qs1 is empty). The SQL generated for qs1 is:

SELECT "app_item"."id", "app_item"."title" FROM "app_item" 
INNER JOIN "app_property" ON ("app_item"."id" = "app_property"."item_id") 
LEFT OUTER JOIN "app_property" T4 ON ("app_item"."id" = T4."item_id") 
LEFT OUTER JOIN "app_propertyvalue" T5 ON ("app_property"."value_id" = T5."id") 
WHERE (("app_property"."value_id" = 1  AND "app_property"."key" = 'a' ) AND (T5."id" IS NULL AND T4."key" = 'b'))

The problem is that the first app_property join corresponds to q1, and the second corresponds to q2. However, the app_propertyvalue join (corresponding to the isnull from q2) refers to app_property.value_id (i.e. from q1) instead of T4.value_id (i.e. from q2).

The attached patch checks the left side of the join when combining queries, and if it has already been re-aliased, uses the new alias. That way, join conditions should reference what they referenced before the combination. It also has a regression test, and all tests pass.

Attachments (1)

15823.diff (3.0 KB) - added by Dan Watson 8 years ago.

Download all attachments as: .zip

Change History (10)

Changed 8 years ago by Dan Watson

Attachment: 15823.diff added

comment:1 Changed 8 years ago by Dan Watson

milestone: 1.31.4

comment:2 Changed 8 years ago by Michael Radziej

Easy pickings: unset
Owner: changed from Dan Watson to Michael Radziej

comment:3 Changed 8 years ago by Michael Radziej

Owner: Michael Radziej deleted
Triage Stage: UnreviewedReady for checkin

Someone with a deep knowledge of the ORM should review this. It works perfect (tested with sqlite), but I'm not in a position to understand possible side-effects.

@dcwatson: Thanks for a perfect bug report, you're making triage a joy ;-)

comment:4 Changed 8 years ago by Luke Plant

Here's a small optimisation to the patch. This:

    if lhs in change_map:
        lhs = change_map[lhs]

could be replaced with:

    lhs = change_map.get(lhs, lhs)

The latter only needs to do one dictionary lookup.

I'm a little out of my depth in analysing this patch further than that, because I really haven't got familiar with this layer of Django's ORM since queryset-refactor was merged (yes, I know, shame on me), and it's quite a big class to try to understand (about 1800 lines!). However, it looks correct based on what is going on locally in that method.

comment:5 Changed 8 years ago by Alex Gaynor

It appears correct to me, if the entirety of the test suite passes I'm comfortable with this patch.

comment:6 Changed 8 years ago by Dan Watson

@lukeplant: Actually, my very crude tests show the if/in version edges out the get version. I don't know for sure, but I'm guessing either get has a try/except block that sucks a little time, or __contains__ is optimized, or both.

Get version:

import time
d = {'key': 'value'}
start = time.clock()
for k in xrange(100000):
    key = 'key' if k % 2 else 'asdf'
    val = d.get(key, 'default')
print time.clock() - start

If/in version:

import time
d = {'key': 'value'}
start = time.clock()
for k in xrange(100000):
    key = 'key' if k % 2 else 'asdf'
    if key in d:
        val = d[key]
print time.clock() - start

comment:7 Changed 8 years ago by Luke Plant

Hmm, it's very strange, since doing 'key in d' has to do a dictionary lookup, and 'd[key]' also has to do one, meaning 1 or 2 lookups, depending on whether it is present or not, whereas the 'd.get' method will always do exactly 1, and you'd expect them to use similar code paths (they both have to check to see if the item is in the dictionary, without throwing exceptions, and you'd have thought that once you've found an item, it's not much work to return it). But I'm getting the same result as you - with CPython at least. (I tweaked your scripts a bit, because they don't do quite the same with regards to the string 'default').

However, pypy gives the order I'd expect, with a slightly more significant margin. So I guess it might be an optimisation that CPython makes.

But either way, it's not worth sweating.

comment:8 Changed 8 years ago by Luke Plant

Owner: set to Luke Plant
Resolution: fixed
Status: newclosed

In [16159]:

Fixed #15823 - incorrect join condition when combining Q objects

Thanks to dcwatson for the excellent report and patch.

comment:9 Changed 7 years ago by Jacob

milestone: 1.4

Milestone 1.4 deleted

Note: See TracTickets for help on using tickets.
Back to Top