Changes between Version 3 and Version 4 of NoSqlSupport


Timestamp:
Dec 10, 2010, 1:30:09 PM (13 years ago)
Author:
Waldemar Kornewald
Comment:

--

  • NoSqlSupport

    v3 v4  
    1 This page documents the requirements for supporting NoSQL (or non-relational) databases with Django.
     1This wiki page documents the requirements for supporting NoSQL (or non-relational) databases with Django.
    22
    3 The [http://www.allbuttonspressed.com/projects/django-nonrel Django-nonrel] branch of Django already provides support for NoSQL and it requires only minimal changes to Django's ORM. However, for the more interesting features like `select_related()`, Django's ORM needs to be refactored and simplified in several areas. This wiki page describes the required changes and the current limitations of Django-nonrel.
     3The [http://www.allbuttonspressed.com/projects/django-nonrel Django-nonrel] branch of Django already provides support for NoSQL and it requires only minimal changes to Django's ORM. However, for the more interesting features like `select_related()`, Django's ORM needs to be refactored and simplified in several areas. Many of the sections on this page are written from the point of view of Django-nonrel, since much of the experience required for official NoSQL support has already gone into the Django-nonrel project.
    44
    5 For the record, Django-nonrel has several backends:
     5For the record, Django-nonrel already has quite a few backends:
    66
    77 * App Engine: [http://www.allbuttonspressed.com/projects/djangoappengine djangoappengine]
     
    99 * !ElasticSearch: [https://github.com/aparo/django-elasticsearch django-elasticsearch]
    1010 * Cassandra: [https://github.com/vaterlaus/django_cassandra_backend django_cassandra_backend]
     11
     12Also take a look at the [http://djangopackages.com/grids/g/nosql/ feature comparison matrix] for an overview of what is supported and what is missing. Database-specific features are sometimes provided by an automatically added manager. For example, the MongoDB backend adds a manager which provides map-reduce and other MongoDB-specific features.
     13
     14= Minor issues =
     15
     16The default ordering on permissions requires JOINs. This makes them unusable on NoSQL DBs.
     17
     18The permission creation code uses an `__in` lookup with too many values. App Engine can only handle 30 values (except for the primary key which can handle 500). This could be worked around, but the limitation was added for efficiency reasons (`__in` lookups are converted into a set of queries that are executed in parallel and then de-duplicated). Thus, simply running several of those queries is not really a solution. Instead, the permission creation code should just fetch all permissions at once. Maybe in a later App Engine release this limitation will be removed when App Engine's new query mechanism goes live (which supports `OR` queries and gets rid of several other limitations).
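
The "fetch all permissions at once" idea can be sketched roughly as follows. This is plain Python, not Django's actual permission creation code: `missing_permissions` and `fetch_all` are hypothetical names, and permissions are modelled as simple `(content_type, codename)` tuples.

```python
APP_ENGINE_IN_LIMIT = 30  # limit quoted above for non-key fields


def missing_permissions(wanted, fetch_all):
    """Return the wanted permissions that are not stored yet.

    `fetch_all` is a hypothetical callable standing in for a single query
    that returns every stored permission, so no `__in` filter is needed.
    """
    existing = set(fetch_all())
    return [perm for perm in wanted if perm not in existing]


stored = [("auth.user", "add_user"), ("auth.user", "change_user")]
wanted = [("auth.user", "add_user"), ("auth.user", "delete_user")] + [
    ("app.model%d" % i, "add") for i in range(40)
]

# More values than an `__in` lookup could take, but only one fetch is used.
todo = missing_permissions(wanted, lambda: stored)
```

The filtering happens in memory, so the number of wanted permissions no longer matters to the datastore.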
    1119
    1220= Representing result rows =
     
    3644Django implements this in a way that requires JOINs, so this doesn't work on non-relational DBs. Still, this feature should be supported by NoSQL backends. Django needs to provide an easier format for NoSQL backends and the result value should also be simplified, as described above in "Representing result rows".
    3745
    38 Django-nonrel merely provides a `connection.features.supports_select_related` flag which tells `QuerySet` that the backend won't return additional data for the related data in the result rows (otherwise `select_related()` causes bad results full of `None` values).
     46Django-nonrel merely provides a `connection.features.supports_select_related` flag which tells `QuerySet` that the backend won't return additional data for the related data in the result rows (otherwise `select_related()` causes bad results full of `None` values). All NoSQL backends set this flag to `False`.
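
How such a flag can gate the behaviour is sketched below. Only the flag name `supports_select_related` comes from the text above. The `Features` class and `effective_select_related` are hypothetical stand-ins, not the code used in `QuerySet`.

```python
class Features:
    """Hypothetical stand-in for a backend's feature flags."""

    def __init__(self, supports_select_related):
        self.supports_select_related = supports_select_related


def effective_select_related(requested, features):
    """Ignore a select_related() request when the backend cannot honour it."""
    return requested and features.supports_select_related


nosql = Features(supports_select_related=False)
sql = Features(supports_select_related=True)
```

With the flag set to `False` the request is silently dropped, so the caller never tries to read related data that the backend did not return.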
    3947
    40 = AutoField =
     48= Query refactoring =
     49
     50The following is non-critical in that even without the changes it's possible to write NoSQL backends. It's mentioned here in case the Django team wants to clean the ORM up before adding NoSQL support.
     51
     52Currently, `sql.Query` stores data in a format that is too SQL-specific. This is not a show-stopper. It's possible to read the data and handle it somehow. It's just not very convenient. The data should be stored in a more abstract way, probably like Alex Gaynor originally suggested for his Google Summer of Code project.
     53
     54For example, JOIN aliases can be simple integers. There's also no need for all of the JOIN-related data structures. Also, instead of storing table and column names it's easier to deal with higher-level information like models and fields in these structures.
     55
     56Another example is the way aggregates are represented. The data structures rely too heavily on SQL.
     57
     58= !AutoField =
    4159
    4260In some DB systems the primary key is a string. Currently, `AutoField` assumes that it's always an integer.
     
    4462Implementing an auto-increment field in SimpleDB would be extremely difficult.  I would say impossible, actually.  The eventual consistency model just doesn't support it.  For the persistence layers I have written on top of SimpleDB, I use a UUID (type 4) as the ID of the object.  --garnaat
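
For illustration, generating a type 4 (random) UUID as a string key takes only Python's standard `uuid` module. The helper name `new_pk` is made up for this sketch and is not part of any backend.

```python
import uuid


def new_pk():
    """Return a random (version 4) UUID as a string primary key."""
    return str(uuid.uuid4())


pk = new_pk()
```

No counter has to be coordinated between nodes, which is why this approach fits an eventually consistent store.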
    4563
    46 Conclusion: Portable code should never assume that the "pk" field is a number. If an entity uses a string pk the application should continue to work.
     64Conclusion: Portable code should never assume that the "pk" field is a number. If an entity uses a string pk the application should continue to work. This is currently a problem in Django's auth app in 1.3 trunk (see #14881).
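
A small example of what "not assuming a number" can mean in application code: the pattern below accepts any non-empty key segment instead of digits only. The URL layout and the name `parse_pk` are assumptions made for this sketch, not something taken from Django or Django-nonrel.

```python
import re

# Accepts integer ids as well as string keys such as UUIDs.
PK_PATTERN = re.compile(r"^/items/(?P<pk>[^/]+)/$")


def parse_pk(path):
    """Return the key segment as a string, or None if the path doesn't match."""
    match = PK_PATTERN.match(path)
    return match.group("pk") if match else None
```

The key is kept as a string and never passed through `int()`, so the same code works for both kinds of backend.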
    4765
    4866This is already implemented in Django-nonrel.
    4967
    50 = ListField =
     68= !ListField =
    5169
    5270NoSQL DBs use `ListField` in a lot of places. They are basically a replacement for `ManyToManyField`. BTW, some SQL DBs have a special array type which could also be supported via `ListField`.
     
    5472This is already implemented in Django-nonrel.
    5573
    56 = SetField =
     74= !SetField =
    5775
    5876Another useful type is `SetField` which stores a set instead of a list. On DBs that don't support sets this field can be emulated by storing a list, instead. This is the approach taken by Django-nonrel's App Engine backend.
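
The emulation described above boils down to two conversions. The sketch below is not Django-nonrel's implementation and the function names are hypothetical. Sorting is only done here to get a stable list and assumes the values are comparable.

```python
def set_to_db(value):
    """Convert a set into a list for DBs without a native set type."""
    return sorted(value)


def set_from_db(stored):
    """Convert the stored list back into a set, dropping any duplicates."""
    return set(stored)
```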
     
    6078This is already implemented in Django-nonrel.
    6179
    62 = DictField =
     80= !DictField =
    6381
    6482MongoDB and other databases use `ListField` in combination with `DictField` to completely replace `ManyToManyField` in a lot of cases. Django currently doesn't provide an API for querying the data within a `DictField` (especially if it's embedded in a `ListField`). Ideally, the query API would just use the `foo__bar` JOIN syntax.
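
To make the idea concrete, here is how a `foo__bar` path could be resolved against nested dict data in plain Python. This is only a sketch of the lookup syntax. No such API exists in Django, and the function name `lookup` is hypothetical.

```python
def lookup(data, path):
    """Follow a `foo__bar` style path through nested dicts."""
    value = data
    for part in path.split("__"):
        value = value[part]
    return value


entry = {"author": {"name": "Ann", "address": {"city": "Berlin"}}}
```

A real backend would translate the path into its own query format rather than walk the data in Python, and it would also need a rule for lists of dicts.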
     
    6684The field is already implemented in Django-nonrel, but lookups aren't supported, yet.
    6785
    68 = EmbeddedModelField =
     86= !EmbeddedModelField =
    6987
    7088This is a field which stores model instances like a "sub-table within a field". Internally, it's just a `DictField` which converts model instances to/from dicts. In addition to the `DictField` issues this field also has to call the embedded fields' conversion functions, which again requires special support if the JOIN syntax should be supported.
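
The to/from dict conversion can be illustrated with a plain dataclass standing in for a model. `Address`, `to_db` and `from_db` are hypothetical names, and the per-field conversion functions mentioned above are left out of this sketch.

```python
from dataclasses import asdict, dataclass


@dataclass
class Address:
    """Stand-in for an embedded model."""
    city: str
    zip_code: str


def to_db(instance):
    """Convert the embedded instance into a dict for storage."""
    return asdict(instance)


def from_db(cls, data):
    """Rebuild the embedded instance from the stored dict."""
    return cls(**data)
```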
    7189
    7290The field is already implemented in Django-nonrel, but lookups aren't supported, yet.
     91
     92= !BlobField =
     93
     94Many databases provide support for a raw binary data type. Many App Engine developers depend on this field to store file-like data because App Engine doesn't provide write access to the file system (there is a new Blobstore API, but that doesn't yet allow direct write access).
     95
     96This is already implemented in Django-nonrel.
     97
     98= !ImageField =
     99
     100Currently, !ImageField depends on PIL. It might be necessary to provide a backend API for sandboxed platforms (like App Engine) that don't provide PIL support.
     101
     102This is not implemented in Django-nonrel.
     103
     104= Batch operations =
     105
     106For optimization purposes it's very important to allow batch-saving and batch-deleting a list of model instances (which, in the case of batch-deletion, is not exactly the same as `QuerySet.delete()` which first has to fetch the entities from the DB in order to delete them).
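
What a batch API buys can be sketched with an in-memory store that counts round trips. `FakeBackend`, `put_many` and `delete_many` are hypothetical names and do not exist in Django or Django-nonrel.

```python
class FakeBackend:
    """Hypothetical in-memory store that counts round trips."""

    def __init__(self):
        self.rows = {}
        self.round_trips = 0

    def put_many(self, items):
        """Save several entities in one round trip."""
        self.round_trips += 1
        self.rows.update(items)

    def delete_many(self, keys):
        """Delete by key in one round trip, without fetching first."""
        self.round_trips += 1
        for key in keys:
            self.rows.pop(key, None)


backend = FakeBackend()
backend.put_many({"a": 1, "b": 2, "c": 3})
backend.delete_many(["a", "b"])
```

Three saves and two deletes cost two round trips here, and the deletion works from keys alone.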
     107
     108This is not implemented in Django-nonrel.
    73109
    74110= Multi-table inheritance =