#15823 closed Bug (fixed)
incorrect join condition when combining Q objects
Reported by: | Dan Watson | Owned by: | Luke Plant |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | query, q, join |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Originally described the issue in https://groups.google.com/d/msg/django-developers/4wMNF61oQNM/hqspS-Jp5rwJ but copied here for reference.
Models:
class Item (models.Model): title = models.CharField(max_length=100) class PropertyValue (models.Model): label = models.CharField(max_length=100) class Property (models.Model): item = models.ForeignKey(Item, related_name='props') key = models.CharField(max_length=100) value = models.ForeignKey(PropertyValue, null=True)
Code:
item = Item.objects.create(title='Some Item') pv = PropertyValue.objects.create(label='Some Value') item.props.create(key='a', value=pv) item.props.create(key='b') q1 = Q(props__key='a', props__value=pv) q2 = Q(props__key='b', props__value__isnull=True) qs1 = Item.objects.filter(q1) & Item.objects.filter(q2) qs2 = Item.objects.filter(q2) & Item.objects.filter(q1)
The problem is that qs1
and qs2
do not evaluate to the same thing, as they should (qs1
is empty). The SQL generated for qs1
is:
SELECT "app_item"."id", "app_item"."title" FROM "app_item" INNER JOIN "app_property" ON ("app_item"."id" = "app_property"."item_id") LEFT OUTER JOIN "app_property" T4 ON ("app_item"."id" = T4."item_id") LEFT OUTER JOIN "app_propertyvalue" T5 ON ("app_property"."value_id" = T5."id") WHERE (("app_property"."value_id" = 1 AND "app_property"."key" = 'a' ) AND (T5."id" IS NULL AND T4."key" = 'b'))
The problem is that the first app_property
join corresponds to q1
, and the second corresponds to q2
. However, the app_propertyvalue
join (corresponding to the isnull
from q2
) refers to app_property.value_id
(i.e. from q1
) instead of T4.value_id
(i.e. from q2
).
The attached patch checks the left side of the join when combining queries, and if it has already been re-aliased, uses the new alias. That way, join conditions should reference what they referenced before the combination. It also has a regression test, and all tests pass.
Attachments (1)
Change History (10)
by , 14 years ago
Attachment: | 15823.diff added |
---|
comment:1 by , 14 years ago
milestone: | 1.3 → 1.4 |
---|
comment:2 by , 14 years ago
Easy pickings: | unset |
---|---|
Owner: | changed from | to
comment:3 by , 14 years ago
Owner: | removed |
---|---|
Triage Stage: | Unreviewed → Ready for checkin |
comment:4 by , 14 years ago
Here's a small optimisation to the patch. This:
if lhs in change_map: lhs = change_map[lhs]
could be replaced with:
lhs = change_map.get(lhs, lhs)
The latter only needs to do one dictionary lookup.
I'm a little out of my depth in analysing this patch further than that, because I really haven't got familiar with this layer of Django's ORM since queryset-refactor was merged (yes, I know, shame on me), and it's quite a big class to try to understand (about 1800 lines!). However, it looks correct based on what is going on locally in that method.
comment:5 by , 14 years ago
It appears correct to me, if the entirety of the test suite passes I'm comfortable with this patch.
comment:6 by , 14 years ago
@lukeplant: Actually, my very crude tests show the if/in version edges out the get version. I don't know for sure, but I'm guessing either get
has a try/except block that sucks a little time, or __contains__
is optimized, or both.
Get version:
import time d = {'key': 'value'} start = time.clock() for k in xrange(100000): key = 'key' if k % 2 else 'asdf' val = d.get(key, 'default') print time.clock() - start
If/in version:
import time d = {'key': 'value'} start = time.clock() for k in xrange(100000): key = 'key' if k % 2 else 'asdf' if key in d: val = d[key] print time.clock() - start
comment:7 by , 14 years ago
Hmm, it's very strange, since doing 'key in d' has to do a dictionary lookup, and 'd[key]' also has to do one, meaning 1 or 2 lookups, depending on whether it is present or not, whereas the 'd.get' method will always do exactly 1, and you'd expect them to use similar code paths (they both have to check to see if the item is in the dictionary, without throwing exceptions, and you'd have thought that once you've found an item, it's not much work to return it). But I'm getting the same result as you - with CPython at least. (I tweaked your scripts a bit, because they don't do quite the same with regards to the string 'default').
However, pypy gives the order I'd expect, with a slightly more significant margin. So I guess it might be an optimisation that CPython makes.
But either way, it's not worth sweating.
Someone with a deep knowledge of the ORM should review this. It works perfect (tested with sqlite), but I'm not in a position to understand possible side-effects.
@dcwatson: Thanks for a perfect bug report, you're making triage a joy ;-)