Code

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#15823 closed Bug (fixed)

incorrect join condition when combining Q objects

Reported by: dcwatson Owned by: lukeplant
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: query, q, join
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX:

Description

Originally described the issue in https://groups.google.com/d/msg/django-developers/4wMNF61oQNM/hqspS-Jp5rwJ but copied here for reference.

Models:

class Item (models.Model):
    title = models.CharField(max_length=100)

class PropertyValue (models.Model):
    label = models.CharField(max_length=100)

class Property (models.Model):
    item = models.ForeignKey(Item, related_name='props')
    key = models.CharField(max_length=100)
    value = models.ForeignKey(PropertyValue, null=True)

Code:

item = Item.objects.create(title='Some Item')
pv = PropertyValue.objects.create(label='Some Value')
item.props.create(key='a', value=pv)
item.props.create(key='b')
q1 = Q(props__key='a', props__value=pv)
q2 = Q(props__key='b', props__value__isnull=True)
qs1 = Item.objects.filter(q1) & Item.objects.filter(q2)
qs2 = Item.objects.filter(q2) & Item.objects.filter(q1)

The problem is that qs1 and qs2 do not evaluate to the same thing, as they should (qs1 is empty). The SQL generated for qs1 is:

SELECT "app_item"."id", "app_item"."title" FROM "app_item" 
INNER JOIN "app_property" ON ("app_item"."id" = "app_property"."item_id") 
LEFT OUTER JOIN "app_property" T4 ON ("app_item"."id" = T4."item_id") 
LEFT OUTER JOIN "app_propertyvalue" T5 ON ("app_property"."value_id" = T5."id") 
WHERE (("app_property"."value_id" = 1  AND "app_property"."key" = 'a' ) AND (T5."id" IS NULL AND T4."key" = 'b'))

The problem is that the first app_property join corresponds to q1, and the second corresponds to q2. However, the app_propertyvalue join (corresponding to the isnull from q2) refers to app_property.value_id (i.e. from q1) instead of T4.value_id (i.e. from q2).

The attached patch checks the left side of the join when combining queries, and if it has already been re-aliased, uses the new alias. That way, join conditions should reference what they referenced before the combination. It also has a regression test, and all tests pass.

Attachments (1)

15823.diff (3.0 KB) - added by dcwatson 3 years ago.

Download all attachments as: .zip

Change History (10)

Changed 3 years ago by dcwatson

comment:1 Changed 3 years ago by dcwatson

  • milestone changed from 1.3 to 1.4
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 3 years ago by mir

  • Easy pickings unset
  • Owner changed from dcwatson to mir

comment:3 Changed 3 years ago by mir

  • Owner mir deleted
  • Triage Stage changed from Unreviewed to Ready for checkin

Someone with a deep knowledge of the ORM should review this. It works perfect (tested with sqlite), but I'm not in a position to understand possible side-effects.

@dcwatson: Thanks for a perfect bug report, you're making triage a joy ;-)

comment:4 Changed 3 years ago by lukeplant

Here's a small optimisation to the patch. This:

    if lhs in change_map:
        lhs = change_map[lhs]

could be replaced with:

    lhs = change_map.get(lhs, lhs)

The latter only needs to do one dictionary lookup.

I'm a little out of my depth in analysing this patch further than that, because I really haven't got familiar with this layer of Django's ORM since queryset-refactor was merged (yes, I know, shame on me), and it's quite a big class to try to understand (about 1800 lines!). However, it looks correct based on what is going on locally in that method.

comment:5 Changed 3 years ago by Alex

It appears correct to me, if the entirety of the test suite passes I'm comfortable with this patch.

comment:6 Changed 3 years ago by dcwatson

@lukeplant: Actually, my very crude tests show the if/in version edges out the get version. I don't know for sure, but I'm guessing either get has a try/except block that sucks a little time, or __contains__ is optimized, or both.

Get version:

import time
d = {'key': 'value'}
start = time.clock()
for k in xrange(100000):
    key = 'key' if k % 2 else 'asdf'
    val = d.get(key, 'default')
print time.clock() - start

If/in version:

import time
d = {'key': 'value'}
start = time.clock()
for k in xrange(100000):
    key = 'key' if k % 2 else 'asdf'
    if key in d:
        val = d[key]
print time.clock() - start

comment:7 Changed 3 years ago by lukeplant

Hmm, it's very strange, since doing 'key in d' has to do a dictionary lookup, and 'd[key]' also has to do one, meaning 1 or 2 lookups, depending on whether it is present or not, whereas the 'd.get' method will always do exactly 1, and you'd expect them to use similar code paths (they both have to check to see if the item is in the dictionary, without throwing exceptions, and you'd have thought that once you've found an item, it's not much work to return it). But I'm getting the same result as you - with CPython at least. (I tweaked your scripts a bit, because they don't do quite the same with regards to the string 'default').

However, pypy gives the order I'd expect, with a slightly more significant margin. So I guess it might be an optimisation that CPython makes.

But either way, it's not worth sweating.

comment:8 Changed 3 years ago by lukeplant

  • Owner set to lukeplant
  • Resolution set to fixed
  • Status changed from new to closed

In [16159]:

Fixed #15823 - incorrect join condition when combining Q objects

Thanks to dcwatson for the excellent report and patch.

comment:9 Changed 3 years ago by jacob

  • milestone 1.4 deleted

Milestone 1.4 deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.