Opened 12 years ago

Closed 11 years ago

#17459 closed Bug (wontfix)

Subquery fails across multiple databases

Reported by: David Gouldin Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: dgouldin@…, lrekucki@… Triage Stage: Design decision needed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Given the following databases:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'db1',
    },
    'db2': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'db2',
    },
}

database router:

class FooBarRouter(object):
    def db_for_read(self, model, **hints):
        if model._meta.app_label == 'bar':
            return 'db2'
        return None

    def db_for_write(self, model, **hints):
        if model._meta.app_label == 'bar':
            return 'db2'
        return None

    def allow_syncdb(self, db, model):
        if model._meta.app_label == 'bar':
            return db == 'db2'
        else:
            return db == 'default'

and models:

'''foo/models.py'''
class FooModel(models.Model):
    name = models.CharField(max_length=100)
'''bar/models.py'''
class BarModel(models.Model):
    name = models.CharField(max_length=100)
    foo_id = models.IntegerField()

The following will produce "DatabaseError: no such table: foo_foomodel":

from foo.models import FooModel
from bar.models import BarModel

foo1 = FooModel.objects.create(name='foo1')
bar1 = BarModel.objects.create(name='bar1', foo_id=foo1.id)

BarModel.objects.filter(foo_id__in=FooModel.objects.values_list('id', flat=True))

Instead of trying to merge the 2 querysets into a single sql statement, the inner queryset should be found to use a different database and be evaluated separately from the outer one.

Change History (5)

comment:1 by Łukasz Rekucki, 12 years ago

Resolution: wontfix
Status: newclosed

After thinking about this for a while, I think it's not something Django should do implicitly.

First of all, because this results in some serious performance drawback that's hidden in a place you normally wouldn't expect it to be and it gives a false impression of Django supporting cross-database queries.

Second, because it's not always the right thing to do. Taking the (naive) example of master-slave configuration from the docs, the fact that the router returns different slaves for Bar and Foo, shouldn't prevent Django from generating the subquery. As Django has no other way of knowing, which dbs contain which tables, we can't assume that the table is or isn't there.

Explicitly using list() fixes the problem and is more explicit about performance/memory overhead of the query.

Last edited 12 years ago by Łukasz Rekucki (previous) (diff)

comment:2 by Jeremy Dunck, 12 years ago

Resolution: wontfix
Status: closedreopened

Well, even if the table did exist, you couldn't assume that the result would be correct in a multi-db context because it's pretty often the case that table A on db-0 has a subset and table B on db-1 has a different subset.

I think the fact that a multi-db query could successfully run quietly on a subset of the intended data is the worse bug here. I agree that list'ing the subquery is a fair solution, but first you have to realize that you have the problem at all.

The problem is that the inlined queryset doesn't call db_for_read at all, so that the router can't advise the ORM not to use the default/parent-qs's DB.

I agree that inlining is useful, even in a multi-db context, but the child qs's router.db_for_read should still get a change to advise and not be inlined.

Reopening - please wontfix again if you disagree.

Version 0, edited 12 years ago by Jeremy Dunck (next)

comment:3 by Łukasz Rekucki, 12 years ago

Cc: lrekucki@… added
Triage Stage: UnreviewedDesign decision needed

I'm not sure calling db_for_read(Foo) will help here even if we include a hint like subquery_of=Bar. If parent query is already bound to some db, the only thing the router can do for subquery is either return the same DB (thus allowing the subquery to perform) or throw an exception because it's impossible to perform the query (i'm still -1 on doing list() implicitly).

How about adding a new method to the router allow_subquery (which would basically work the same as allow_relation, but for models instead of instances) ?

comment:4 by Aymeric Augustin, 11 years ago

Status: reopenednew

comment:5 by Jacob, 11 years ago

Resolution: wontfix
Status: newclosed

Yeah I agree with lrekucki's initial analysis: doing some magic cross-database join emulation isn't something I want Django to do. It's possible the error message could be improved -- feel free to open a new ticket for that -- but I don't want this sort of silent join emulation happening.

Note: See TracTickets for help on using tickets.
Back to Top