Opened 18 years ago

Closed 14 years ago

Last modified 14 years ago

#1142 closed enhancement (fixed)

Support for multiple database connections

Reported by: Simon Willison Owned by: Alex Gaynor
Component: Database layer (models, ORM) Version:
Severity: normal Keywords: multiple database feature
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Django currently assumes that all models will be stored in a single database, and that only one database connection will be used for the duration of a request. This assumption does not scale to really large applications, where it is common for multiple database connections to be used in non-obvious ways.

Three examples include:

  1. Traditional replication, where all writes go to a single master database while reads are distributed across a number of slave databases.
  2. Sharding, where (for example) user accounts 1-1000 live on db1, 1001-2000 live on db2 etc.
  3. Different types of data live on different servers / clusters. Frequently accessed user profile data might be stored on a seperate database/cluster from log data which is frequently written but very rarely accessed.

At the very least, Django needs to allow more than one database connection to be maintained by the DB wrapper. The default should still be a single connection as this is the common case, but Django should not get in the way should multiple connections be desired.

Rather than having a single connection, how about maintaining a dictionary of connections? A "default" key could correspond to the connection that is used in most cases, but other connections can be configured in the settings file and assigned names. There would need to be a mechanism somewhere for Django model classes to be told which named connection they should use.

Simple replication may end up being a different issue entirely - it's possible that could be handled just with a custom DB backend that knows to send writes to one server and distribute reads across several others. The above change (where Django allows multiple DB connections) is still essential for more complex configurations.

More about this on the mailing list: http://groups.google.com/group/django-users/browse_thread/thread/d08f4975da831bb/75ede5e9c7f0db53

Attachments (1)

py23_set_fix_28feb07.diff (570 bytes) - added by afarnham@… 17 years ago.
Fixes mulitple-db-support branch django/db/models/manager.py for Python 2.3

Download all attachments as: .zip

Change History (81)

comment:1 Changed 18 years ago by ian@…

An Alternative to this would be to use something like CJDBC/Sequoia (https://forge.continuent.org/projects/sequoia/) and manage the clustering at a lower level to django itself.

the link is for a JDBC wrapper, but they also have a C++ library which might be useful. all that is needed is a python module ;-)

regards
Ian

comment:2 Changed 18 years ago by Simon Willison

My limited experience with really large apps suggests that handling clustering at a lower level won't provide enough flexibility. For huge sites you end up having to take different scaling approaches for different bits of functionality - you might have one web service call (or RSS feed) that is hit more than anything else and needs to be scaled in a different way for example. You end up needing to scale different parts of the application in different ways, often using different databases for different parts of the app.

There are also other use-cases for multiple database connections outside of scaling - talking to two legacy applications at once for example.

comment:3 Changed 18 years ago by Kozmo Kid

Django may benefit from the ability to define connections at the application level that can override the sitewide setting. This would make it rather trivial to pull data in from multiple platforms.

An interface to SQLrelay (http://sqlrelay.sourceforge.net/) might help the scalability issue.

comment:4 Changed 18 years ago by Morten Bagai

Simon,

Per your recommendation here I'm just adding a brief description of the scenario in which we typically encounter this current limitation of Django:

In a mixed-database environment, We're typically faced with having to model data types hosted on different database servers. For example, web content may generally live on our MySQL servers, but some applications will need to incorporate invoice data from a MS SQL database. While it's certainly a general mess to have data spread out over multiple engines like this, I think it's also a fact that many developers (especially in small-medium corporate environments) are faced with, and not having the ability to easily manage data from different source within a Django app is going to be a serious limitation to these people.

The solutions of either being able to define a dictionary of database connections in the sitewide config file or being able to specify database connections on an application level, could both work well. I wonder, though, what exactly would be entailed in defining connections on the application level? Would applications generally be able to add their own settings file in which sitewide preferences could be overwritten, or would there be a specific module, like myapp.db.connections for specifying app-level connections? Maybe the simplest and most backward-compatible approach would be to keep a dictionary of named connections available to installed apps in the sitewide settings file as originally suggested.

Thanks. /Morten

comment:5 Changed 17 years ago by Adrian Holovaty

priority: normallow
Severity: majornormal

comment:6 Changed 17 years ago by anonymous

I would have to agree that this is a serious limitation.
It is not always possible to restructure existing databases into a single database for use and it would be of great valu to be able to use the django database model to integrate multiple databases at some level.

comment:7 Changed 17 years ago by Go

Type: enhancement

comment:8 Changed 17 years ago by anonymous

Type: defect

comment:9 Changed 17 years ago by Jacob

(In [3198]) Created branch for MultipleDatabaseSupport. Refs #1142.

comment:10 Changed 17 years ago by anonymous

Cc: gary.wilson@… added

comment:11 Changed 17 years ago by Andy Dustman <farcepest@…>

Cc: farcepest@… added

comment:12 Changed 17 years ago by anonymous

Component: Database wrapperValidators
milestone: Version 0.91
priority: lowhigh
Severity: normalblocker
Type: defectenhancement
Version: new-admin

comment:13 Changed 17 years ago by anonymous

Component: Validatorsdjango-admin.py
milestone: Version 0.91
priority: highhighest
Severity: blockertrivial
Type: enhancementtask
Version: new-adminSVN

comment:14 Changed 17 years ago by hi-world cup

Cc: hi-world cup added; gary.wilson@… farcepest@… removed
Keywords: rthml tab space editor js added
Summary: Support for multiple database connectionshi-world cup

comment:15 Changed 17 years ago by Adrian Holovaty

Summary: hi-world cupSupport for multiple database connections

comment:16 Changed 17 years ago by anonymous

Cc: gary.wilson@… farcepest@… added; hi-world cup removed

comment:17 Changed 17 years ago by anonymous

What is current status of this? When can we expect to see this in trunk?

thanks,
Forest

comment:18 Changed 17 years ago by Adrian Holovaty

priority: highestnormal
Severity: trivialnormal

comment:19 Changed 17 years ago by anonymous

Cc: eikke@… added

comment:20 Changed 17 years ago by nmccullough@…

Has there been any progress on this feature? I am in a similar situation. My project needs to be able to query different databases for different pieces of data. Another suggestion would be to be able to over ride the connection settings at the app level.

comment:21 Changed 17 years ago by Gary Wilson <gary.wilson@…>

Component: django-admin.pyDatabase wrapper
Keywords: rthml tab space editor js removed
Type: taskenhancement

reverted metadata spam.

comment:22 Changed 17 years ago by Simon G. <dev@…>

Keywords: multiple database added
Triage Stage: UnreviewedDesign decision needed
Version: SVN

comment:23 Changed 17 years ago by namiswaan@…

I hope this gets merged into the trunk soon.

comment:24 Changed 17 years ago by dleewo@…

Is multiple database support going to implemented?

At least in my case, I was looking at Django for an application for my company, but it will need to access at least 2 databases. The main one, plus a database from another product over which we have no control and can't change.

We would actually like it to be 3 databases with the 3rd one being for reporting data. We prefer to keep reporting data in a separate database for performance purposes and for scalability, but if absolutely necessary, we would put it in the dsame database as the main data. That still leaves us with at least 2 databases.

We have another app that was done in Java with Hibernate and Hibernate is currently accessing 4 databases from within the same app.

comment:25 Changed 17 years ago by Gary Wilson <gary.wilson@…>

Has patch: set
Needs documentation: set
Patch needs improvement: set
Triage Stage: Design decision neededAccepted

According to MultipleDatabaseSupport, the branch is feature complete and seems to be only lacking documentation.

If anyone is using or has tested the multi db branch, please note your experiences here.

Marking patch needs improvement since the latest commit to the branch notes that tests are still failing.

comment:26 Changed 17 years ago by mmccarty@…

Question: Does the patch for this also provide a work-around for keeping the native Django apps tables out of your database?

comment:27 Changed 17 years ago by anonymous

Cc: peter.kese@… added

Changed 17 years ago by afarnham@…

Attachment: py23_set_fix_28feb07.diff added

Fixes mulitple-db-support branch django/db/models/manager.py for Python 2.3

comment:28 Changed 17 years ago by namiswaan@…

Has patch: unset

It's been a long time since something has happened to this branch.. will it ever get included in the main trunk?

I wish to use this, but i want to stay with the current Django code. Is it possible to check-out the trunk and merge the changes from this branch?

comment:29 Changed 17 years ago by namiswaan@…

Has patch: set

oops, reverting the hash patch flag i removed

comment:30 Changed 17 years ago by Malcolm Tredinnick

Please post question like comment 28 to the users list, not this ticket. Ticket comments are for resolving the issue at hand, not seeking genera. information.

comment:31 Changed 17 years ago by robertmlamarca@…

Hi there, I have been following this conversation. I am working with some friends on a research project that requires one, maybe several postgres dbs. It is a bit off-label, but were hoping to make use of some of the Django features.

Multiple Dbs is something long-term we might need. Can someone tell me how to install this patch? I am running latest Django code with Python 2.4

So far, I see some patch online with about four lines of code... My Python skills are somewhere around early intermediate.. so I am not sure if :

  1. if just placing that code in my latest subversion checkout (where the surrounding code looks a bit different than what is there).. constitutes an installation of this patch.

and

  1. how can I use it?

Appreciate your patience and assistance.

Robert

comment:32 Changed 16 years ago by anonymous

Cc: chris.mcavoy@… added

comment:33 Changed 16 years ago by (removed)

Cc: ferringb@… added

comment:34 Changed 16 years ago by Forte

Cc: lindemann@… added; gary.wilson@… farcepest@… eikke@… peter.kese@… chris.mcavoy@… ferringb@… removed
Component: Database wrapperUncategorized
Owner: changed from Adrian Holovaty to Jacob

comment:35 Changed 16 years ago by Malcolm Tredinnick

Cc: gary.wilson@… farcepest@… eikke@… peter.kese@… chris.mcavoy@… ferringb@… added; lindemann@… removed
Component: UncategorizedDatabase wrapper
Owner: changed from Jacob to Adrian Holovaty

Reverted spam.

comment:36 Changed 16 years ago by Philippe Raoult

Keywords: feature added

comment:37 Changed 16 years ago by Jacob

Triage Stage: Design decision neededSomeday/Maybe

comment:38 Changed 16 years ago by anonymous

Cc: iacobs@… added

comment:39 Changed 16 years ago by anonymous

Cc: andrewjwilkinson@… added

comment:40 Changed 16 years ago by anonymous

Cc: email@… added

comment:41 Changed 16 years ago by darylspitzer

Cc: daryl.spitzer@… added

comment:42 Changed 16 years ago by darylspitzer

Owner: changed from nobody to darylspitzer
Status: newassigned

I reviewed an API design with Jacob and Adrian (and others) and will now work on a prototype. Much of the credit for the API goes to Ben Ford (and his code should give me a big headstart).

comment:43 Changed 16 years ago by cmcavoy

I'm glad to see someone taking this ticket, it's something I've wanted for a long time. Any chance you can write up your planned API?

comment:44 in reply to:  43 Changed 16 years ago by darylspitzer

Replying to cmcavoy:

I'm glad to see someone taking this ticket, it's something I've wanted for a long time. Any chance you can write up your planned API?

I'll answer your question on django-developers.

comment:45 Changed 16 years ago by mzdaniel

Cc: remco@… mzdaniel@… added

comment:46 Changed 16 years ago by anonymous

Cc: rik24d@… added

comment:47 Changed 16 years ago by anonymous

Cc: stuartw@… added

comment:48 Changed 16 years ago by Joe Holloway <jholloway7+django@…>

Cc: jholloway7+django@… added

comment:49 Changed 15 years ago by simeon

Cc: simeonf+django@… added

comment:50 Changed 15 years ago by Tom Carrick <knyght@…>

Cc: knyght+django@… added

comment:51 Changed 15 years ago by anonymous

Cc: spoof@… added

comment:52 Changed 15 years ago by Koen Biermans <koen.biermans@…>

Ben Ford set up a mercurial repository for new work on multiple databases at http://hg.woe-beti.de. A track setup is available at http://trac.woe-beti.de.

comment:53 Changed 15 years ago by darylspitzer

See this django-developers thread for more discussion (including my API proposal): http://tinyurl.com/5q7oy5

comment:54 Changed 15 years ago by mrts

milestone: post-1.0

Not in scope for 1.0.

comment:55 Changed 15 years ago by paulforrester

Cc: django@… added

comment:56 Changed 15 years ago by flosch

Cc: flori@… added

comment:57 Changed 15 years ago by Joey Wilhelm

Cc: tarkatronic@… added

What is the current status on this issue? There doesn't appear to have been any visible movement in several months now, and this is something which would be very useful in a project I'm working on. I have a slightly different use case than the previously mentioned ones, however... and I'm sure it's not an entirely common one, but perhaps something to take into consideration.

The company I work for is sitting on a fairly large data warehouse for a number of clients. Each client has their own database, with a common schema, on one of two MS SQL 2k servers. In the near future, we will hopefully be moving some of these clients onto a PostgreSQL server (and in the distant future, it would be nice to move towards Greenplum; although that should be transparent with psycopg2, afaik). In addition, there is a separate database already running on the PostgreSQL server, which stores configuration parameters and which I am using for the Django system tables.

On every page load, I need Django to connect to database 'A' on the PostgreSQL server for any Django specific operations, then connect to an arbitrary database on either of the MS SQL servers, depending on which of our clients' data is currently being viewed. From what I've seen so far in the discussion in this ticket and on MultipleDatabaseSupport, this doesn't sound like it would likely be a supported scenario, as the connection to be used is determined dynamically, rather than a static connection chosen per model. In my case, all models apply across all database connections.

Now, I have made this work thus far with my own connection manager (http://dpaste.com/hold/84458/) but using it is somewhat kludgy and I'm sure it's far from a full implementation. I also had to apply a hack (http://code.google.com/p/django-pyodbc/issues/detail?id=18) to django-pyodbc and apply a pending patch (#6710) to Django in order to get it to work. Here is an example of why it in use: http://dpaste.com/hold/84465/

comment:58 Changed 15 years ago by Malcolm Tredinnick

There has, in fact, been lots of activity. Perhaps you might wish to peruse the archives of django-developers, which is where design work takes place. This ticket is really just a placeholder that will be closed when we commit some kind of final solution (and possibly for patches as we get closer if the work is self-contained enough not to require a branch).

comment:59 Changed 15 years ago by anonymous

Cc: alexandru.palade@… added

comment:60 Changed 15 years ago by anonymous

Cc: Jay Hargis added

comment:61 Changed 15 years ago by (none)

milestone: post-1.0

Milestone post-1.0 deleted

comment:62 Changed 15 years ago by lao

Cc: lao added

comment:63 Changed 15 years ago by lao

(idea #10492)

My look is

# Wrapper classes
from re import compile

class Db:
    def __init__(self, cstr):
        self.link = None
        __re_dbstr = compile(r'^(?P<engine>[^:]+)://((?P<user>[^\(:]+)(\((?P<role>.+)\))?(:(?P<password>[^@]+))?(@(?P<host>[^\/:]+)(:(?P<port>\d+))?)?)?/(?P<name>.+)')
        try:
            self.__db = __re_dbstr.search(cstr).groupdict()
            # Fix for sqlite
            if self.__db['engine'].startswith('sqlite'):
                self.__db['name'] = "/%s" % self.__db['name']
        except:
            self.__db = {}
            raise Exception("#1")
        self.connect()

    def connect(self):
        self.link = 'Connection link'   # ....

    def keys(self):
       return self.__db.keys()

    def items(self):
       return self.__db.items()

    def __getitem__(self, key):
        return self.__db.get(key, None)

    def __getattr__(self, key):
        return self.__db.get(key, None)


class DbPool:
   def __init__(self):
       self.__dbs = {}
       self.__default = None

   def __getitem__(self, db_alias):
       if db_alias not in self.__dbs:
           raise Exception('#2')
       return self.__dbs[db_alias]

   def __getattr__(self, key):
       if not self.__default:
          raise Exception('#3')
       return self.__dbs[ self.__default ].__getattr__(key)

   def add(self, db_alias, db_str):
       self.__dbs[db_alias] = Db(db_str)
       if self.__default:
           return
       self.set_default(db_alias)

   def get_default(self):
       return self.__default

   def set_default(self, dbAlias):
       if dbAlias in self.__dbs:
           self.__default = dbAlias
       else:
           raise Exception("#4")

# Settings
DATABASES = {
    'alpha': 'sqlite3:///:memory:',
    'beta': 'sqlite3:///tmp/django.sqlite3',
    'gamma': 'mysql://user1/django',
    'delta': 'mysql://user1:password1/django',
    'default': 'mysql://user1:password1@host1/django',
    'reserv1': 'mysql://user1:password1@host1:1234/django',
    'reserv2': 'postgresql://user1(role1):password1@host1/django',
    'etc': 'postgresql://user1(role1):password1@host1:1234/django',
}
DATABASE_DEFAULT = 'default'
DATABASE_OPTIONS = {}
DATABASE_OPTIONS['etc'] = {
   'ssl':'...'
}
# Setting handlers
pool = DbPool()

for dbase in DATABASES.items():
    pool.add(*dbase)

pool.set_default(DATABASE_DEFAULT)
# Default database
print pool.engine, pool.host, pool.name

# Some other db
print pool['etc'].engine, pool['etc'].host, pool['etc'].name

# Models 
class SomeModel():
    # If not defined
    # meta_connections = [DATABASE_DEFAULT] 
    meta_connections = ['alpha', 'beta']

# API
# instead
# from django.db import connection
# connection.cursor()
#
# use
# from django.db import pool
# pool.link.cursor()
# pool['etc'].link.cursor() 

comment:64 Changed 14 years ago by Alex Gaynor

Owner: changed from darylspitzer to Alex Gaynor
Status: assignednew
Triage Stage: Someday/MaybeAccepted

This ticket is accepted as a part of the 2009 GSOC.

comment:65 Changed 14 years ago by kimus

I made a Multiple database manager for django-blocks see: http://django-blocks.googlecode.com/svn/trunk/src/python/blocks/apps/core/managers.py

just need to do something like this to your model:

    from django.db import models from blocks.apps.core.managers import MultiDBManager

    class SomeModel(models.Model):
        code = models.IntegerField(primary_key=True)
        name = models.CharField(max_length=250)

        objects = MultiDBManager()

        class Meta:
            db_name = 'oracle'

comment:66 Changed 14 years ago by Alexander Koshelev

Cc: Alexander Koshelev added

comment:67 Changed 14 years ago by _wilane

Cc: _wilane added

comment:68 Changed 14 years ago by anonymous

Cc: daemondazz added

comment:69 Changed 14 years ago by eengbrec

Cc: eengbrec added

comment:70 Changed 14 years ago by eengbrec

Cc: erik.engbrecht@… added; eengbrec removed

comment:71 Changed 14 years ago by anonymous

Cc: S2nek added

comment:72 in reply to:  66 ; Changed 14 years ago by nicktank

Replying to alexkoshelev:

I've implemented this on a development version of django 1.1. It appears as though it appends the table name of the app in the database established in the project settings instead of adding the model's tables to the declared database

from django.db import models 
from django.blocks.apps.core.managers import MultiDBManager

class FireMap(models.Model):
    address = models.CharField(max_length=100)
    city = models.CharField(max_length=50)    
    zip = models.IntegerField()
    latitude = models.DecimalField(max_digits=17, decimal_places=14)
    longitude = models.DecimalField(max_digits=17, decimal_places=14)
    neighborhood = models.CharField(max_length=30)
    assesorID = models.IntegerField()
    assessorURL = models.URLField()
    homeValue = models.CommaSeparatedIntegerField(max_length=8)
    homeOwner = models.CharField (max_length=30)
    mainIMGURL = models.URLField()
    photos = models.CommaSeparatedIntegerField(max_length=40)    
    articleID1 = models.IntegerField()
    articleID2 = models.IntegerField()
    articleID3 = models.IntegerField()
    extrainfo = models.TextField()
    
    objects = MultiDBManager()
    
    def __str__(self):
        return self.name

    class Meta:
        db_name = 'firemap'

the setting for the Db in my project settings file is set to:

DATABASE_ENGINE = 'postgresql_psycopg2'           # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
DATABASE_NAME = 'newsok'             # Or path to database file if using sqlite3.

When the tables are created I see in the newsok Db a firemap_firemap table (firemap is the name of the app as well) but there is nothing in the actual firemap Db.

I apologize if this isn't the place to report this issue. I am new around here and to django in general.

comment:73 in reply to:  72 Changed 14 years ago by Russell Keith-Magee

Replying to nicktank:

Replying to alexkoshelev:

I've implemented this on a development version of django 1.1.

...

I apologize if this isn't the place to report this issue. I am new around here and to django in general.

Nick - as noted in comment 64, Multiple database connections was the subject of a 2009 Google Summer of Code project. Alex Gaynor has been developing the code to implement this. It's not a minor change - it requires lots of changes throughout Django. Check out Alex's GitHub repository if you want to see the progress he has made. It isn't quite ready for trunk yet, but it's getting close, and it's sufficiently functional that you can try it out.

comment:74 Changed 14 years ago by Alex Gaynor

If you want to see the current state of work please use my branch in soc2009/multidb in the Django repository, right now my GitHub repo is undergoing severe code alterations that make it high unstable.

comment:75 Changed 14 years ago by xdrum

Cc: swapon@… added

comment:76 Changed 14 years ago by bioform

Cc: andrew.krasnoff@… added

comment:77 Changed 14 years ago by anonymous

Cc: david@… added

comment:78 Changed 14 years ago by Robin Breathe

Cc: Robin Breathe added

comment:79 Changed 14 years ago by Alex Gaynor

Resolution: fixed
Status: newclosed

Fixed in r11952 :D

comment:80 Changed 14 years ago by Jay Hargis

Cc: erik.engbrecht@… gary.wilson@… farcepest@… eikke@… peter.kese@… chris.mcavoy@… ferringb@… iacobs@… andrewjwilkinson@… email@… daryl.spitzer@… remco@… mzdaniel@… rik24d@… stuartw@… jholloway7+django@… simeonf+django@… knyght+django@… spoof@… django@… flori@… tarkatronic@… alexandru.palade@… Jay Hargis lao Alexander Koshelev _wilane daemondazz S2nek swapon@… andrew.krasnoff@… david@… Robin Breathe removed

I know posts should be focused on ticket problems or solutions, but I can't help myself.

thank you so much! what a great Christmas gift!

Note: See TracTickets for help on using tickets.
Back to Top