Opened 14 years ago

Closed 14 years ago

Last modified 13 years ago

#15823 closed Bug (fixed)

incorrect join condition when combining Q objects

Reported by: Dan Watson Owned by: Luke Plant
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: query, q, join
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Originally described the issue in https://groups.google.com/d/msg/django-developers/4wMNF61oQNM/hqspS-Jp5rwJ but copied here for reference.

Models:

class Item (models.Model):
    title = models.CharField(max_length=100)

class PropertyValue (models.Model):
    label = models.CharField(max_length=100)

class Property (models.Model):
    item = models.ForeignKey(Item, related_name='props')
    key = models.CharField(max_length=100)
    value = models.ForeignKey(PropertyValue, null=True)

Code:

item = Item.objects.create(title='Some Item')
pv = PropertyValue.objects.create(label='Some Value')
item.props.create(key='a', value=pv)
item.props.create(key='b')
q1 = Q(props__key='a', props__value=pv)
q2 = Q(props__key='b', props__value__isnull=True)
qs1 = Item.objects.filter(q1) & Item.objects.filter(q2)
qs2 = Item.objects.filter(q2) & Item.objects.filter(q1)

The problem is that qs1 and qs2 do not evaluate to the same thing, as they should (qs1 is empty). The SQL generated for qs1 is:

SELECT "app_item"."id", "app_item"."title" FROM "app_item" 
INNER JOIN "app_property" ON ("app_item"."id" = "app_property"."item_id") 
LEFT OUTER JOIN "app_property" T4 ON ("app_item"."id" = T4."item_id") 
LEFT OUTER JOIN "app_propertyvalue" T5 ON ("app_property"."value_id" = T5."id") 
WHERE (("app_property"."value_id" = 1  AND "app_property"."key" = 'a' ) AND (T5."id" IS NULL AND T4."key" = 'b'))

The problem is that the first app_property join corresponds to q1, and the second corresponds to q2. However, the app_propertyvalue join (corresponding to the isnull from q2) refers to app_property.value_id (i.e. from q1) instead of T4.value_id (i.e. from q2).

The attached patch checks the left side of the join when combining queries, and if it has already been re-aliased, uses the new alias. That way, join conditions should reference what they referenced before the combination. It also has a regression test, and all tests pass.

Attachments (1)

15823.diff (3.0 KB ) - added by Dan Watson 14 years ago.

Download all attachments as: .zip

Change History (10)

by Dan Watson, 14 years ago

Attachment: 15823.diff added

comment:1 by Dan Watson, 14 years ago

milestone: 1.31.4

comment:2 by Michael Radziej, 14 years ago

Easy pickings: unset
Owner: changed from Dan Watson to Michael Radziej

comment:3 by Michael Radziej, 14 years ago

Owner: Michael Radziej removed
Triage Stage: UnreviewedReady for checkin

Someone with a deep knowledge of the ORM should review this. It works perfect (tested with sqlite), but I'm not in a position to understand possible side-effects.

@dcwatson: Thanks for a perfect bug report, you're making triage a joy ;-)

comment:4 by Luke Plant, 14 years ago

Here's a small optimisation to the patch. This:

    if lhs in change_map:
        lhs = change_map[lhs]

could be replaced with:

    lhs = change_map.get(lhs, lhs)

The latter only needs to do one dictionary lookup.

I'm a little out of my depth in analysing this patch further than that, because I really haven't got familiar with this layer of Django's ORM since queryset-refactor was merged (yes, I know, shame on me), and it's quite a big class to try to understand (about 1800 lines!). However, it looks correct based on what is going on locally in that method.

comment:5 by Alex Gaynor, 14 years ago

It appears correct to me, if the entirety of the test suite passes I'm comfortable with this patch.

comment:6 by Dan Watson, 14 years ago

@lukeplant: Actually, my very crude tests show the if/in version edges out the get version. I don't know for sure, but I'm guessing either get has a try/except block that sucks a little time, or __contains__ is optimized, or both.

Get version:

import time
d = {'key': 'value'}
start = time.clock()
for k in xrange(100000):
    key = 'key' if k % 2 else 'asdf'
    val = d.get(key, 'default')
print time.clock() - start

If/in version:

import time
d = {'key': 'value'}
start = time.clock()
for k in xrange(100000):
    key = 'key' if k % 2 else 'asdf'
    if key in d:
        val = d[key]
print time.clock() - start

comment:7 by Luke Plant, 14 years ago

Hmm, it's very strange, since doing 'key in d' has to do a dictionary lookup, and 'd[key]' also has to do one, meaning 1 or 2 lookups, depending on whether it is present or not, whereas the 'd.get' method will always do exactly 1, and you'd expect them to use similar code paths (they both have to check to see if the item is in the dictionary, without throwing exceptions, and you'd have thought that once you've found an item, it's not much work to return it). But I'm getting the same result as you - with CPython at least. (I tweaked your scripts a bit, because they don't do quite the same with regards to the string 'default').

However, pypy gives the order I'd expect, with a slightly more significant margin. So I guess it might be an optimisation that CPython makes.

But either way, it's not worth sweating.

comment:8 by Luke Plant, 14 years ago

Owner: set to Luke Plant
Resolution: fixed
Status: newclosed

In [16159]:

Fixed #15823 - incorrect join condition when combining Q objects

Thanks to dcwatson for the excellent report and patch.

comment:9 by Jacob, 13 years ago

milestone: 1.4

Milestone 1.4 deleted

Note: See TracTickets for help on using tickets.
Back to Top