Opened 13 years ago
Closed 12 years ago
#17459 closed Bug (wontfix)
Subquery fails across multiple databases
Reported by: | David Gouldin | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | dgouldin@…, lrekucki@… | Triage Stage: | Design decision needed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Given the following databases:
DATABASES = { 'default': { 'ENGINE': 'django.db.backends.sqlite3', 'NAME': 'db1', }, 'db2': { 'ENGINE': 'django.db.backends.sqlite3', 'NAME': 'db2', }, }
database router:
class FooBarRouter(object): def db_for_read(self, model, **hints): if model._meta.app_label == 'bar': return 'db2' return None def db_for_write(self, model, **hints): if model._meta.app_label == 'bar': return 'db2' return None def allow_syncdb(self, db, model): if model._meta.app_label == 'bar': return db == 'db2' else: return db == 'default'
and models:
'''foo/models.py''' class FooModel(models.Model): name = models.CharField(max_length=100)
'''bar/models.py''' class BarModel(models.Model): name = models.CharField(max_length=100) foo_id = models.IntegerField()
The following will produce "DatabaseError: no such table: foo_foomodel":
from foo.models import FooModel from bar.models import BarModel foo1 = FooModel.objects.create(name='foo1') bar1 = BarModel.objects.create(name='bar1', foo_id=foo1.id) BarModel.objects.filter(foo_id__in=FooModel.objects.values_list('id', flat=True))
Instead of trying to merge the 2 querysets into a single sql statement, the inner queryset should be found to use a different database and be evaluated separately from the outer one.
Change History (5)
comment:1 by , 13 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:2 by , 13 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
Well, even if the table did exist, you couldn't assume that the result would be correct in a multi-db context because it's pretty often the case that table A on db-0 has a subset and table B on db-1 has a different subset.
I think the fact that a multi-db query could successfully run quietly on a subset of the intended data is the worse bug here. I agree that list'ing the subquery is a fair solution, but first you have to realize that you have the problem at all.
The problem is that the inlined queryset doesn't call db_for_read at all, so that the router can't advise the ORM not to use the default/parent-qs's DB.
I agree that inlining is useful, even in a multi-db context, but the child qs's router.db_for_read should still get a chance to advise and not be inlined.
Reopening - please wontfix again if you disagree.
comment:3 by , 13 years ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Design decision needed |
I'm not sure calling db_for_read(Foo)
will help here even if we include a hint like subquery_of=Bar
. If parent query is already bound to some db, the only thing the router can do for subquery is either return the same DB (thus allowing the subquery to perform) or throw an exception because it's impossible to perform the query (i'm still -1 on doing list() implicitly).
How about adding a new method to the router allow_subquery
(which would basically work the same as allow_relation
, but for models instead of instances) ?
comment:4 by , 12 years ago
Status: | reopened → new |
---|
comment:5 by , 12 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Yeah I agree with lrekucki's initial analysis: doing some magic cross-database join emulation isn't something I want Django to do. It's possible the error message could be improved -- feel free to open a new ticket for that -- but I don't want this sort of silent join emulation happening.
After thinking about this for a while, I think it's not something Django should do implicitly.
First of all, because this results in some serious performance drawback that's hidden in a place you normally wouldn't expect it to be and it gives a false impression of Django supporting cross-database queries.
Second, because it's not always the right thing to do. Taking the (naive) example of master-slave configuration from the docs, the fact that the router returns different slaves for Bar and Foo, shouldn't prevent Django from generating the subquery. As Django has no other way of knowing, which dbs contain which tables, we can't assume that the table is or isn't there.
Explicitly using
list()
fixes the problem and is more explicit about performance/memory overhead of the query.