Opened 10 years ago

Closed 6 years ago

Last modified 6 years ago

#1142 closed enhancement (fixed)

Support for multiple database connections

Reported by: Simon Willison Owned by: Alex
Component: Database layer (models, ORM) Version:
Severity: normal Keywords: multiple database feature
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

Django currently assumes that all models will be stored in a single database, and that only one database connection will be used for the duration of a request. This assumption does not scale to really large applications, where it is common for multiple database connections to be used in non-obvious ways.

Three examples include:

  1. Traditional replication, where all writes go to a single master database while reads are distributed across a number of slave databases.
  2. Sharding, where (for example) user accounts 1-1000 live on db1, 1001-2000 live on db2 etc.
  3. Different types of data live on different servers / clusters. Frequently accessed user profile data might be stored on a seperate database/cluster from log data which is frequently written but very rarely accessed.

At the very least, Django needs to allow more than one database connection to be maintained by the DB wrapper. The default should still be a single connection as this is the common case, but Django should not get in the way should multiple connections be desired.

Rather than having a single connection, how about maintaining a dictionary of connections? A "default" key could correspond to the connection that is used in most cases, but other connections can be configured in the settings file and assigned names. There would need to be a mechanism somewhere for Django model classes to be told which named connection they should use.

Simple replication may end up being a different issue entirely - it's possible that could be handled just with a custom DB backend that knows to send writes to one server and distribute reads across several others. The above change (where Django allows multiple DB connections) is still essential for more complex configurations.

More about this on the mailing list: http://groups.google.com/group/django-users/browse_thread/thread/d08f4975da831bb/75ede5e9c7f0db53

Attachments (1)

py23_set_fix_28feb07.diff (570 bytes) - added by afarnham@… 9 years ago.
Fixes mulitple-db-support branch django/db/models/manager.py for Python 2.3

Download all attachments as: .zip

Change History (81)

comment:1 Changed 10 years ago by ian@…

An Alternative to this would be to use something like CJDBC/Sequoia (https://forge.continuent.org/projects/sequoia/) and manage the clustering at a lower level to django itself.

the link is for a JDBC wrapper, but they also have a C++ library which might be useful. all that is needed is a python module ;-)

regards
Ian

comment:2 Changed 10 years ago by Simon Willison

My limited experience with really large apps suggests that handling clustering at a lower level won't provide enough flexibility. For huge sites you end up having to take different scaling approaches for different bits of functionality - you might have one web service call (or RSS feed) that is hit more than anything else and needs to be scaled in a different way for example. You end up needing to scale different parts of the application in different ways, often using different databases for different parts of the app.

There are also other use-cases for multiple database connections outside of scaling - talking to two legacy applications at once for example.

comment:3 Changed 10 years ago by Kozmo Kid

Django may benefit from the ability to define connections at the application level that can override the sitewide setting. This would make it rather trivial to pull data in from multiple platforms.

An interface to SQLrelay (http://sqlrelay.sourceforge.net/) might help the scalability issue.

comment:4 Changed 10 years ago by Morten Bagai

Simon,

Per your recommendation here I'm just adding a brief description of the scenario in which we typically encounter this current limitation of Django:

In a mixed-database environment, We're typically faced with having to model data types hosted on different database servers. For example, web content may generally live on our MySQL servers, but some applications will need to incorporate invoice data from a MS SQL database. While it's certainly a general mess to have data spread out over multiple engines like this, I think it's also a fact that many developers (especially in small-medium corporate environments) are faced with, and not having the ability to easily manage data from different source within a Django app is going to be a serious limitation to these people.

The solutions of either being able to define a dictionary of database connections in the sitewide config file or being able to specify database connections on an application level, could both work well. I wonder, though, what exactly would be entailed in defining connections on the application level? Would applications generally be able to add their own settings file in which sitewide preferences could be overwritten, or would there be a specific module, like myapp.db.connections for specifying app-level connections? Maybe the simplest and most backward-compatible approach would be to keep a dictionary of named connections available to installed apps in the sitewide settings file as originally suggested.

Thanks. /Morten

comment:5 Changed 9 years ago by adrian

  • priority changed from normal to low
  • Severity changed from major to normal

comment:6 Changed 9 years ago by anonymous

I would have to agree that this is a serious limitation.
It is not always possible to restructure existing databases into a single database for use and it would be of great valu to be able to use the django database model to integrate multiple databases at some level.

comment:7 Changed 9 years ago by Go

  • Type enhancement deleted

comment:8 Changed 9 years ago by anonymous

  • Type set to defect

comment:9 Changed 9 years ago by jacob

(In [3198]) Created branch for MultipleDatabaseSupport. Refs #1142.

comment:10 Changed 9 years ago by anonymous

  • Cc gary.wilson@… added

comment:11 Changed 9 years ago by Andy Dustman <farcepest@…>

  • Cc farcepest@… added

comment:12 Changed 9 years ago by anonymous

  • Component changed from Database wrapper to Validators
  • milestone set to Version 0.91
  • priority changed from low to high
  • Severity changed from normal to blocker
  • Type changed from defect to enhancement
  • Version set to new-admin

comment:13 Changed 9 years ago by anonymous

  • Component changed from Validators to django-admin.py
  • milestone Version 0.91 deleted
  • priority changed from high to highest
  • Severity changed from blocker to trivial
  • Type changed from enhancement to task
  • Version changed from new-admin to SVN

comment:14 Changed 9 years ago by hi-world cup

  • Cc hi-world cup added; gary.wilson@… farcepest@… removed
  • Keywords rthml tab space editor js added
  • Summary changed from Support for multiple database connections to hi-world cup

comment:15 Changed 9 years ago by adrian

  • Summary changed from hi-world cup to Support for multiple database connections

comment:16 Changed 9 years ago by anonymous

  • Cc gary.wilson@… farcepest@… added; hi-world cup removed

comment:17 Changed 9 years ago by anonymous

What is current status of this? When can we expect to see this in trunk?

thanks,
Forest

comment:18 Changed 9 years ago by adrian

  • priority changed from highest to normal
  • Severity changed from trivial to normal

comment:19 Changed 9 years ago by anonymous

  • Cc eikke@… added

comment:20 Changed 9 years ago by nmccullough@…

Has there been any progress on this feature? I am in a similar situation. My project needs to be able to query different databases for different pieces of data. Another suggestion would be to be able to over ride the connection settings at the app level.

comment:21 Changed 9 years ago by Gary Wilson <gary.wilson@…>

  • Component changed from django-admin.py to Database wrapper
  • Keywords rthml tab space editor js removed
  • Type changed from task to enhancement

reverted metadata spam.

comment:22 Changed 9 years ago by Simon G. <dev@…>

  • Keywords multiple database added
  • Triage Stage changed from Unreviewed to Design decision needed
  • Version SVN deleted

comment:23 Changed 9 years ago by namiswaan@…

I hope this gets merged into the trunk soon.

comment:24 Changed 9 years ago by dleewo@…

Is multiple database support going to implemented?

At least in my case, I was looking at Django for an application for my company, but it will need to access at least 2 databases. The main one, plus a database from another product over which we have no control and can't change.

We would actually like it to be 3 databases with the 3rd one being for reporting data. We prefer to keep reporting data in a separate database for performance purposes and for scalability, but if absolutely necessary, we would put it in the dsame database as the main data. That still leaves us with at least 2 databases.

We have another app that was done in Java with Hibernate and Hibernate is currently accessing 4 databases from within the same app.

comment:25 Changed 9 years ago by Gary Wilson <gary.wilson@…>

  • Has patch set
  • Needs documentation set
  • Patch needs improvement set
  • Triage Stage changed from Design decision needed to Accepted

According to MultipleDatabaseSupport, the branch is feature complete and seems to be only lacking documentation.

If anyone is using or has tested the multi db branch, please note your experiences here.

Marking patch needs improvement since the latest commit to the branch notes that tests are still failing.

comment:26 Changed 9 years ago by mmccarty@…

Question: Does the patch for this also provide a work-around for keeping the native Django apps tables out of your database?

comment:27 Changed 9 years ago by anonymous

  • Cc peter.kese@… added

Changed 9 years ago by afarnham@…

Fixes mulitple-db-support branch django/db/models/manager.py for Python 2.3

comment:28 Changed 8 years ago by namiswaan@…

  • Has patch unset

It's been a long time since something has happened to this branch.. will it ever get included in the main trunk?

I wish to use this, but i want to stay with the current Django code. Is it possible to check-out the trunk and merge the changes from this branch?

comment:29 Changed 8 years ago by namiswaan@…

  • Has patch set

oops, reverting the hash patch flag i removed

comment:30 Changed 8 years ago by mtredinnick

Please post question like comment 28 to the users list, not this ticket. Ticket comments are for resolving the issue at hand, not seeking genera. information.

comment:31 Changed 8 years ago by robertmlamarca@…

Hi there, I have been following this conversation. I am working with some friends on a research project that requires one, maybe several postgres dbs. It is a bit off-label, but were hoping to make use of some of the Django features.

Multiple Dbs is something long-term we might need. Can someone tell me how to install this patch? I am running latest Django code with Python 2.4

So far, I see some patch online with about four lines of code... My Python skills are somewhere around early intermediate.. so I am not sure if :

  1. if just placing that code in my latest subversion checkout (where the surrounding code looks a bit different than what is there).. constitutes an installation of this patch.

and

  1. how can I use it?

Appreciate your patience and assistance.

Robert

comment:32 Changed 8 years ago by anonymous

  • Cc chris.mcavoy@… added

comment:33 Changed 8 years ago by (removed)

  • Cc ferringb@… added

comment:34 Changed 8 years ago by Forte

  • Cc lindemann@… added; gary.wilson@… farcepest@… eikke@… peter.kese@… chris.mcavoy@… ferringb@… removed
  • Component changed from Database wrapper to Uncategorized
  • Owner changed from adrian to jacob

comment:35 Changed 8 years ago by mtredinnick

  • Cc gary.wilson@… farcepest@… eikke@… peter.kese@… chris.mcavoy@… ferringb@… added; lindemann@… removed
  • Component changed from Uncategorized to Database wrapper
  • Owner changed from jacob to adrian

Reverted spam.

comment:36 Changed 8 years ago by PhiR

  • Keywords feature added

comment:37 Changed 8 years ago by jacob

  • Triage Stage changed from Design decision needed to Someday/Maybe

comment:38 Changed 8 years ago by anonymous

  • Cc iacobs@… added

comment:39 Changed 8 years ago by anonymous

  • Cc andrewjwilkinson@… added

comment:40 Changed 8 years ago by anonymous

  • Cc email@… added

comment:41 Changed 7 years ago by darylspitzer

  • Cc daryl.spitzer@… added

comment:42 Changed 7 years ago by darylspitzer

  • Owner changed from nobody to darylspitzer
  • Status changed from new to assigned

I reviewed an API design with Jacob and Adrian (and others) and will now work on a prototype. Much of the credit for the API goes to Ben Ford (and his code should give me a big headstart).

comment:43 follow-up: Changed 7 years ago by cmcavoy

I'm glad to see someone taking this ticket, it's something I've wanted for a long time. Any chance you can write up your planned API?

comment:44 in reply to: ↑ 43 Changed 7 years ago by darylspitzer

Replying to cmcavoy:

I'm glad to see someone taking this ticket, it's something I've wanted for a long time. Any chance you can write up your planned API?

I'll answer your question on django-developers.

comment:45 Changed 7 years ago by mzdaniel

  • Cc remco@… mzdaniel@… added

comment:46 Changed 7 years ago by anonymous

  • Cc rik24d@… added

comment:47 Changed 7 years ago by anonymous

  • Cc stuartw@… added

comment:48 Changed 7 years ago by Joe Holloway <jholloway7+django@…>

  • Cc jholloway7+django@… added

comment:49 Changed 7 years ago by simeon

  • Cc simeonf+django@… added

comment:50 Changed 7 years ago by Tom Carrick <knyght@…>

  • Cc knyght+django@… added

comment:51 Changed 7 years ago by anonymous

  • Cc spoof@… added

comment:52 Changed 7 years ago by Koen Biermans <koen.biermans@…>

Ben Ford set up a mercurial repository for new work on multiple databases at http://hg.woe-beti.de. A track setup is available at http://trac.woe-beti.de.

comment:53 Changed 7 years ago by darylspitzer

See this django-developers thread for more discussion (including my API proposal): http://tinyurl.com/5q7oy5

comment:54 Changed 7 years ago by mrts

  • milestone set to post-1.0

Not in scope for 1.0.

comment:55 Changed 7 years ago by paulforrester

  • Cc django@… added

comment:56 Changed 7 years ago by flosch

  • Cc flori@… added

comment:57 Changed 7 years ago by Tarken

  • Cc tarkatronic@… added

What is the current status on this issue? There doesn't appear to have been any visible movement in several months now, and this is something which would be very useful in a project I'm working on. I have a slightly different use case than the previously mentioned ones, however... and I'm sure it's not an entirely common one, but perhaps something to take into consideration.

The company I work for is sitting on a fairly large data warehouse for a number of clients. Each client has their own database, with a common schema, on one of two MS SQL 2k servers. In the near future, we will hopefully be moving some of these clients onto a PostgreSQL server (and in the distant future, it would be nice to move towards Greenplum; although that should be transparent with psycopg2, afaik). In addition, there is a separate database already running on the PostgreSQL server, which stores configuration parameters and which I am using for the Django system tables.

On every page load, I need Django to connect to database 'A' on the PostgreSQL server for any Django specific operations, then connect to an arbitrary database on either of the MS SQL servers, depending on which of our clients' data is currently being viewed. From what I've seen so far in the discussion in this ticket and on MultipleDatabaseSupport, this doesn't sound like it would likely be a supported scenario, as the connection to be used is determined dynamically, rather than a static connection chosen per model. In my case, all models apply across all database connections.

Now, I have made this work thus far with my own connection manager (http://dpaste.com/hold/84458/) but using it is somewhat kludgy and I'm sure it's far from a full implementation. I also had to apply a hack (http://code.google.com/p/django-pyodbc/issues/detail?id=18) to django-pyodbc and apply a pending patch (#6710) to Django in order to get it to work. Here is an example of why it in use: http://dpaste.com/hold/84465/

comment:58 Changed 7 years ago by mtredinnick

There has, in fact, been lots of activity. Perhaps you might wish to peruse the archives of django-developers, which is where design work takes place. This ticket is really just a placeholder that will be closed when we commit some kind of final solution (and possibly for patches as we get closer if the work is self-contained enough not to require a branch).

comment:59 Changed 7 years ago by anonymous

  • Cc alexandru.palade@… added

comment:60 Changed 7 years ago by anonymous

  • Cc jb0t added

comment:61 Changed 7 years ago by anonymous

  • milestone post-1.0 deleted

Milestone post-1.0 deleted

comment:62 Changed 6 years ago by lao

  • Cc lao added

comment:63 Changed 6 years ago by lao

(idea #10492)

My look is

# Wrapper classes
from re import compile

class Db:
    def __init__(self, cstr):
        self.link = None
        __re_dbstr = compile(r'^(?P<engine>[^:]+)://((?P<user>[^\(:]+)(\((?P<role>.+)\))?(:(?P<password>[^@]+))?(@(?P<host>[^\/:]+)(:(?P<port>\d+))?)?)?/(?P<name>.+)')
        try:
            self.__db = __re_dbstr.search(cstr).groupdict()
            # Fix for sqlite
            if self.__db['engine'].startswith('sqlite'):
                self.__db['name'] = "/%s" % self.__db['name']
        except:
            self.__db = {}
            raise Exception("#1")
        self.connect()

    def connect(self):
        self.link = 'Connection link'   # ....

    def keys(self):
       return self.__db.keys()

    def items(self):
       return self.__db.items()

    def __getitem__(self, key):
        return self.__db.get(key, None)

    def __getattr__(self, key):
        return self.__db.get(key, None)


class DbPool:
   def __init__(self):
       self.__dbs = {}
       self.__default = None

   def __getitem__(self, db_alias):
       if db_alias not in self.__dbs:
           raise Exception('#2')
       return self.__dbs[db_alias]

   def __getattr__(self, key):
       if not self.__default:
          raise Exception('#3')
       return self.__dbs[ self.__default ].__getattr__(key)

   def add(self, db_alias, db_str):
       self.__dbs[db_alias] = Db(db_str)
       if self.__default:
           return
       self.set_default(db_alias)

   def get_default(self):
       return self.__default

   def set_default(self, dbAlias):
       if dbAlias in self.__dbs:
           self.__default = dbAlias
       else:
           raise Exception("#4")

# Settings
DATABASES = {
    'alpha': 'sqlite3:///:memory:',
    'beta': 'sqlite3:///tmp/django.sqlite3',
    'gamma': 'mysql://user1/django',
    'delta': 'mysql://user1:password1/django',
    'default': 'mysql://user1:password1@host1/django',
    'reserv1': 'mysql://user1:password1@host1:1234/django',
    'reserv2': 'postgresql://user1(role1):password1@host1/django',
    'etc': 'postgresql://user1(role1):password1@host1:1234/django',
}
DATABASE_DEFAULT = 'default'
DATABASE_OPTIONS = {}
DATABASE_OPTIONS['etc'] = {
   'ssl':'...'
}
# Setting handlers
pool = DbPool()

for dbase in DATABASES.items():
    pool.add(*dbase)

pool.set_default(DATABASE_DEFAULT)
# Default database
print pool.engine, pool.host, pool.name

# Some other db
print pool['etc'].engine, pool['etc'].host, pool['etc'].name

# Models 
class SomeModel():
    # If not defined
    # meta_connections = [DATABASE_DEFAULT] 
    meta_connections = ['alpha', 'beta']

# API
# instead
# from django.db import connection
# connection.cursor()
#
# use
# from django.db import pool
# pool.link.cursor()
# pool['etc'].link.cursor() 

comment:64 Changed 6 years ago by Alex

  • Owner changed from darylspitzer to Alex
  • Status changed from assigned to new
  • Triage Stage changed from Someday/Maybe to Accepted

This ticket is accepted as a part of the 2009 GSOC.

comment:65 Changed 6 years ago by kimus

I made a Multiple database manager for django-blocks see: http://django-blocks.googlecode.com/svn/trunk/src/python/blocks/apps/core/managers.py

just need to do something like this to your model:

    from django.db import models from blocks.apps.core.managers import MultiDBManager

    class SomeModel(models.Model):
        code = models.IntegerField(primary_key=True)
        name = models.CharField(max_length=250)

        objects = MultiDBManager()

        class Meta:
            db_name = 'oracle'

comment:66 follow-up: Changed 6 years ago by alexkoshelev

  • Cc alexkoshelev added

comment:67 Changed 6 years ago by _wilane

  • Cc _wilane added

comment:68 Changed 6 years ago by anonymous

  • Cc daemondazz added

comment:69 Changed 6 years ago by eengbrec

  • Cc eengbrec added

comment:70 Changed 6 years ago by eengbrec

  • Cc erik.engbrecht@… added; eengbrec removed

comment:71 Changed 6 years ago by anonymous

  • Cc S2nek added

comment:72 in reply to: ↑ 66 ; follow-up: Changed 6 years ago by nicktank

Replying to alexkoshelev:

I've implemented this on a development version of django 1.1. It appears as though it appends the table name of the app in the database established in the project settings instead of adding the model's tables to the declared database

from django.db import models 
from django.blocks.apps.core.managers import MultiDBManager

class FireMap(models.Model):
    address = models.CharField(max_length=100)
    city = models.CharField(max_length=50)    
    zip = models.IntegerField()
    latitude = models.DecimalField(max_digits=17, decimal_places=14)
    longitude = models.DecimalField(max_digits=17, decimal_places=14)
    neighborhood = models.CharField(max_length=30)
    assesorID = models.IntegerField()
    assessorURL = models.URLField()
    homeValue = models.CommaSeparatedIntegerField(max_length=8)
    homeOwner = models.CharField (max_length=30)
    mainIMGURL = models.URLField()
    photos = models.CommaSeparatedIntegerField(max_length=40)    
    articleID1 = models.IntegerField()
    articleID2 = models.IntegerField()
    articleID3 = models.IntegerField()
    extrainfo = models.TextField()
    
    objects = MultiDBManager()
    
    def __str__(self):
        return self.name

    class Meta:
        db_name = 'firemap'

the setting for the Db in my project settings file is set to:

DATABASE_ENGINE = 'postgresql_psycopg2'           # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
DATABASE_NAME = 'newsok'             # Or path to database file if using sqlite3.

When the tables are created I see in the newsok Db a firemap_firemap table (firemap is the name of the app as well) but there is nothing in the actual firemap Db.

I apologize if this isn't the place to report this issue. I am new around here and to django in general.

comment:73 in reply to: ↑ 72 Changed 6 years ago by russellm

Replying to nicktank:

Replying to alexkoshelev:

I've implemented this on a development version of django 1.1.

...

I apologize if this isn't the place to report this issue. I am new around here and to django in general.

Nick - as noted in comment 64, Multiple database connections was the subject of a 2009 Google Summer of Code project. Alex Gaynor has been developing the code to implement this. It's not a minor change - it requires lots of changes throughout Django. Check out Alex's GitHub repository if you want to see the progress he has made. It isn't quite ready for trunk yet, but it's getting close, and it's sufficiently functional that you can try it out.

comment:74 Changed 6 years ago by Alex

If you want to see the current state of work please use my branch in soc2009/multidb in the Django repository, right now my GitHub repo is undergoing severe code alterations that make it high unstable.

comment:75 Changed 6 years ago by xdrum

  • Cc swapon@… added

comment:76 Changed 6 years ago by bioform

  • Cc andrew.krasnoff@… added

comment:77 Changed 6 years ago by anonymous

  • Cc david@… added

comment:78 Changed 6 years ago by isometry

  • Cc isometry added

comment:79 Changed 6 years ago by Alex

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in r11952 :D

comment:80 Changed 6 years ago by jb0t

  • Cc erik.engbrecht@… gary.wilson@… farcepest@… eikke@… peter.kese@… chris.mcavoy@… ferringb@… iacobs@… andrewjwilkinson@… email@… daryl.spitzer@… remco@… mzdaniel@… rik24d@… stuartw@… jholloway7+django@… simeonf+django@… knyght+django@… spoof@… django@… flori@… tarkatronic@… alexandru.palade@… jb0t lao alexkoshelev _wilane daemondazz S2nek swapon@… andrew.krasnoff@… david@… isometry removed

I know posts should be focused on ticket problems or solutions, but I can't help myself.

thank you so much! what a great Christmas gift!

Note: See TracTickets for help on using tickets.
Back to Top