Changes between Version 3 and Version 4 of NoSqlSupport
- Timestamp: Dec 10, 2010, 1:30:09 PM
NoSqlSupport
This wiki page documents the requirements for supporting NoSQL (or non-relational) databases with Django.

The [http://www.allbuttonspressed.com/projects/django-nonrel Django-nonrel] branch of Django already provides support for NoSQL and it requires only minimal changes to Django's ORM. However, for the more interesting features like `select_related()` Django's ORM needs to be refactored and simplified in several areas. Many of the sections on this page are described from the point of view of Django-nonrel, since a lot of the experience required for official NoSQL support has been integrated into the Django-nonrel project.

For the record, Django-nonrel already has quite a few backends:

 * App Engine: [http://www.allbuttonspressed.com/projects/djangoappengine djangoappengine]
…
 * !ElasticSearch: [https://github.com/aparo/django-elasticsearch django-elasticsearch]
 * Cassandra: [https://github.com/vaterlaus/django_cassandra_backend django_cassandra_backend]

Also take a look at the [http://djangopackages.com/grids/g/nosql/ feature comparison matrix] for an overview of what is supported and what is missing. Database-specific features are sometimes provided by an automatically added manager. For example, MongoDB adds a manager which adds map-reduce and other MongoDB-specific features.

= Minor issues =

The default ordering on permissions requires JOINs.
This makes them unusable on NoSQL DBs.

The permission creation code uses an `__in` lookup with too many values. App Engine can only handle 30 values (except for the primary key which can handle 500). This could be worked around, but the limitation was added for efficiency reasons (`__in` lookups are converted into a set of queries that are executed in parallel and then de-duplicated). Thus, it's not really a solution to just run several of those queries. Instead, the permission creation code should just fetch all permissions at once. Maybe a later App Engine release will remove this limitation when App Engine's new query mechanism goes live (which supports `OR` queries and gets rid of several other limitations).

= Representing result rows =

…

Django implements this in a way that requires JOINs, so this doesn't work on non-relational DBs. Still, this feature should be supported by NoSQL backends. Django needs to provide an easier format for NoSQL backends and the result value should also be simplified, as described above in "Representing result rows".

Django-nonrel merely provides a `connection.feature.supports_select_related` flag which tells `QuerySet` that the backend won't return additional data for the related data in the result rows (otherwise `select_related()` causes bad results full of `None` values). All NoSQL backends set this flag to `False`.

= Query refactoring =

The following is non-critical in that even without the changes it's possible to write NoSQL backends. It's mentioned here in case the Django team wants to clean the ORM up before adding NoSQL support.

Currently, `sql.Query` stores data in a format that is too SQL-specific. This is not a show-stopper. It's possible to read the data and handle it somehow. It's just not very convenient. The data should be stored in a more abstract way, probably like Alex Gaynor originally suggested for his Google Summer of Code project.

For example, JOIN aliases can be simple integers. There's also no need for all of the JOIN-related data structures. Also, instead of storing table and column names it's easier to deal with higher-level information like models and fields in these structures.

Another example is the way aggregates are represented. The data structures rely too heavily on SQL.

= !AutoField =

In some DB systems the primary key is a string. Currently, `AutoField` assumes that it's always an integer.

…

Implementing an auto-increment field in SimpleDB would be extremely difficult. I would say impossible, actually. The eventual consistency model just doesn't support it. For the persistence layers I have written on top of SimpleDB, I use a UUID (type 4) as the ID of the object. --garnaat

Conclusion: Portable code should never assume that the "pk" field is a number. If an entity uses a string pk the application should continue to work. This is currently a problem in Django's auth app in 1.3 trunk (see #14881).

This is already implemented in Django-nonrel.

= !ListField =

NoSQL DBs use `ListField` in a lot of places. They are basically a replacement for `ManyToManyField`. BTW, some SQL DBs have a special array type which could also be supported via `ListField`.

…

This is already implemented in Django-nonrel.
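
To make the "replacement for `ManyToManyField`" idea more concrete, here is a minimal sketch in plain Python. It does not use Django or Django-nonrel; the dicts `posts` and `tags` and the helpers `tags_for()` and `posts_with_tag()` are hypothetical names invented for illustration. The assumption is that each entity stores a list of related primary keys directly, so neither direction of the relation needs a join table. The keys are strings on purpose, in line with the `AutoField` conclusion above.

```python
# Hypothetical in-memory "entities"; plain dicts stand in for stored documents.
# Each post carries a list of tag keys instead of rows in a join table.
posts = {
    "post-1": {"title": "Hello", "tag_ids": ["django", "nosql"]},
    "post-2": {"title": "World", "tag_ids": ["nosql"]},
}
tags = {
    "django": {"name": "Django"},
    "nosql": {"name": "NoSQL"},
}


def tags_for(post_id):
    """Forward lookup: read the list stored on the post, then fetch each tag by key."""
    return [tags[tag_id]["name"] for tag_id in posts[post_id]["tag_ids"]]


def posts_with_tag(tag_id):
    """Reverse lookup: a list-membership filter takes the place of a JOIN."""
    return sorted(pid for pid, post in posts.items() if tag_id in post["tag_ids"])
```

In this sketch the forward direction is a read of the stored list followed by key lookups, and the reverse direction is a filter on list membership. How a real backend indexes such a filter depends on the database and is not covered here.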

= !SetField =

Another useful type is `SetField` which stores a set instead of a list. On DBs that don't support sets this field can be emulated by storing a list instead. This is the approach taken by Django-nonrel's App Engine backend.

…

This is already implemented in Django-nonrel.

= !DictField =

MongoDB and other databases use `ListField` in combination with `DictField` to completely replace `ManyToManyField` in a lot of cases. Django currently doesn't provide an API for querying the data within a `DictField` (especially if it's embedded in a `ListField`). Ideally, the query API would just use the `foo__bar` JOIN syntax.

…

The field is already implemented in Django-nonrel, but lookups aren't supported yet.

= !EmbeddedModelField =

This is a field which stores model instances like a "sub-table within a field". Internally, it's just a `DictField` which converts model instances to/from dicts. In addition to the `DictField` issues this field also has to call the embedded fields' conversion functions, which again requires special support if the JOIN syntax should be supported.

The field is already implemented in Django-nonrel, but lookups aren't supported yet.

= !BlobField =

Many databases provide support for a raw binary data type. Many App Engine developers depend on this field to store file-like data because App Engine doesn't provide write access to the file system (there is a new Blobstore API, but that doesn't yet allow direct write access).

This is already implemented in Django-nonrel.

= !ImageField =

Currently, !ImageField depends on PIL. It might be necessary to provide a backend API for sandboxed platforms (like App Engine) that don't provide PIL support.

This is not implemented in Django-nonrel.

= Batch operations =

For optimization purposes it's very important to allow batch-saving and batch-deleting a list of model instances (which, in the case of batch-deletion, is not exactly the same as `QuerySet.delete()` which first has to fetch the entities from the DB in order to delete them).

This is not implemented in Django-nonrel.

= Multi-table inheritance =