Version 7 (modified by Matt Boersma, 17 years ago) ( diff )

Added Oracle limitations of TextFields

The boulder-oracle-sprint branch

This branch began November 4, 2006 at a code sprint in Boulder, Colorado, whence it takes its name. It aims to implement full support for the Oracle RDBMS.

How to get the branch

svn co http://code.djangoproject.com/svn/django/branches/boulder-oracle-sprint/

See our branch policy for full information on how to use a branch.

Goals

The main goals of this branch are:

  • Improve the oracle backend to the point where it can be used interchangeably with the existing supported backends.

Status

A patch against the current trunk is ready and attached to this wiki page. We can always use more help testing. Here's how:

  • Install an Oracle 9i or 10g database. The free Oracle Express Edition is a simple .deb or .rpm package install on most Linuxes, or an .exe installer on Windows.
  • To run "python manage.py syncdb," you'll need to create a user who has CREATE TABLE, CREATE SEQUENCE, and CREATE PROCEDURE privileges. To run Django's test suite, your user also needs CREATE|DROP DATABASE and CREATE|DROP TABLESPACE privileges.
  • Install the cx_Oracle driver so python can talk to your new database. Version 4.3.1 or later of cx_Oracle is preferred.

Then set up your Django settings.py file to include something like this:

DATABASE_ENGINE = 'oracle'
DATABASE_NAME = 'xe'
DATABASE_USER = 'a_user' 
DATABASE_PASSWORD = 'a_password'
DATABASE_HOST = ''
DATABASE_PORT = ''

If you don't have a tnsnames.ora file or a similar Oracle naming method that recognizes the SID ('xe' in this example), then fill in both DATABASE_HOST and DATABASE_PORT like so:

DATABASE_ENGINE = 'oracle'
DATABASE_NAME = 'xe'
DATABASE_USER = 'a_user' 
DATABASE_PASSWORD = 'a_password'
DATABASE_HOST = 'dbprod01ned.mycompany.com'
DATABASE_PORT = '1540'

You should supply both DATABASE_HOST and DATABASE_PORT, or leave both as empty strings.

Tablespace options

A common paradigm for optimizing performance in Oracle-based systems is the use of tablespaces to organize disk layout. The Oracle branch supports this use case by adding db_tablespace options to Meta and Field classes. When using a backend that lacks support for tablespaces, these options are ignored.

A tablespace can be specified for the table(s) generated by a model by supplying the "db_tablespace" option inside the model's Meta class. Additionally, the "db_tablespace" option can be passed to a Field constructor to specify an alternate tablespace for the Field's column index. If no index would be created for the column, the "db_tablespace" option is ignored.

class TablespaceExample(models.Model):
    name = models.CharField(maxlength=30, db_index=True, db_tablespace="indexes")
    data = models.CharField(maxlength=255, db_index=True)
    edges = models.ManyToManyField(to="self", db_tablespace="indexes")

    class Meta:
        db_tablespace = "tables"

In this example, the tables generated by the TablespaceExample model (i.e., the model table and the many-to-many table) would be stored in the "tables" tablespace. The index for the name field and the indexes on the many-to-many table would be stored in the "indexes" tablespace. The "data" field would also generate an index, but no tablespace for it is specified, so it would be stored in the model tablespace "tables" by default.

Naming issues

Oracle imposes a name length limit of 30 characters. To accommodate this, the django.db.backends.util.truncate_name function has been added to truncate names to backend.get_max_name_length(). To prevent name collisions, the final 4 characters of the truncated name are generated from an MD5 hash.

Backend QuerySet classes

A new mechanism is supplied by which backends can supply their own QuerySet subclasses that will be used in place of QuerySet, and that core QuerySet subclasses (such as DateQuerySet) will subclass. The oracle backend uses this to override the iterator and _get_sql_clause methods in order to construct proper SQL for Oracle. It is also used to provide a new resolve_columns method that the iterator method is expected to call in order to fetch corrected values.

Empty strings

Django generally prefers to use the empty string ('') rather than NULL, but Oracle treats the empty string as NULL. To get around this, the oracle backend silently changes '' to ' ' when saving data. The inverse operation is automatically performed by OracleQuerySet.resolve_columns when fetching from the database.

Datetime conversion

cx_Oracle always returns either datetime.datetime objects or its own Timestamp objects when fetching date or time-related fields. The OracleQuerySet.resolve_columns method is used to cast values from the database to the proper type for the Field.

TextField limitations

The oracle backend stores TextFields as NCLOB columns, for which cx_Oracle requires some extra manipulation to fetch. This is automatically handled by OracleQuerySet.resolve_columns. Oracle also poses some limitations on the usage of LOB columns:

  • LOB columns may not be used as primary keys.
  • LOB columns may not be used in indexes.
  • LOB columns may not be used in a "SELECT DISTINCT" list. This means that attempting to use the QuerySet.distinct method on a model that includes TextField columns will result in an error. A workaround to this is to keep TextFields out of any models that you foresee performing .distinct queries on, and to include the TextFields in a related model instead.

To-do list

(Updated April 19, 2007)

  • The branch could use additional testing.
  • Additional code refactoring for the QuerySet.iterator and QuerySet._get_sql_clause methods would be nice.

Attachments (4)

Download all attachments as: .zip

Note: See TracWiki for help on using the wiki.
Back to Top