Changes between Version 17 and Version 18 of NoSqlSupport

Timestamp:: May 5, 2011, 4:30:54 AM (14 years ago)
Author:: Waldemar Kornewald
Comment:: started going into more detail on the required changes

NoSqlSupport

-              v17
+              v18
 = Representing result rows =
 `SQLCompiler.results_iter()` currently returns results as simple lists which represent rows. This adds unnecessary complexity to NoSQL backends, especially since they have to map their results (which are dicts) to a specifically ordered list and then Django takes that list and converts it back to a dict which gets passed to the model constructor. The row format is especially inconvenient when combined with `select_related()` because then NoSQL backends have to collect all fields in the correct order and also take deferred fields into account.
+ Problem:: `SQLCompiler.results_iter()` currently returns results as simple lists which represent rows, related selections, annotations, and extra selections. Since the ordering of the lists' entries matters, NoSQL backends need complex code to map their results (which are dicts) to a specifically ordered list. In the next step, Django converts each list back to a dict which is passed to the model constructor. The row format is especially inconvenient when combined with `select_related()` because NoSQL backends then have to collect all fields in the correct order and also take deferred fields into account. In short, the current results format is too SQL-specific.
 Instead of returning lists `results_iter()` should return more structured data. For example, each result could be wrapped as a dict like this
+ Solution:: Instead of returning lists, `results_iter()` should return more structured data. For example, each result could be wrapped as a dict like this
 {{{
 …
 }}}
 This is not implemented in Django-nonrel.
+ Implementation status:: This is not implemented in Django-nonrel.
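
The round trip described in the problem statement can be illustrated with a minimal sketch. It does not use Django; `FIELD_ORDER`, `dict_to_row()` and `row_to_kwargs()` are hypothetical names chosen for illustration, and the field names are made up.

```python
# Hypothetical positional field order that an SQL-style row would follow.
FIELD_ORDER = ["id", "title", "author_id"]


def dict_to_row(entity, field_order):
    """Flatten a backend result (a dict) into a positionally ordered list."""
    return [entity.get(name) for name in field_order]


def row_to_kwargs(row, field_order):
    """Rebuild the keyword arguments that a model constructor would receive."""
    return dict(zip(field_order, row))


# A NoSQL backend naturally produces a dict per entity ...
entity = {"title": "NoSQL notes", "id": "abc123", "author_id": "u1"}
# ... which has to be flattened into an ordered row ...
row = dict_to_row(entity, FIELD_ORDER)
# ... only to be turned back into a dict one step later.
kwargs = row_to_kwargs(row, FIELD_ORDER)
```

Both conversions depend on the same field order, which is the bookkeeping a more structured result format would make unnecessary.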
 = select_related() =
 Django implements this in a way that requires JOINs, so this doesn't work on non-relational DBs. Still, this feature should be supported by NoSQL backends. Django needs to provide an easier format for NoSQL backends and the result value should also be simplified, as described above in "Representing result rows".
+ Problem:: Django's internal representation of `select_related()` depends on JOINs, which aren't supported on NoSQL DBs.
+ Solution:: Django needs to provide a simpler internal representation of `select_related()` which allows the backend to easily retrieve the related models and their selected fields (so deferred fields aren't loaded unnecessarily).
+ Implementation status:: Django-nonrel merely provides a `connection.feature.supports_select_related` flag which tells `QuerySet` that the backend won't return additional data for the related data in the result rows (otherwise `select_related()` causes bad results full of `None` values). All NoSQL backends set this flag to `False`.
 = !AutoField =
 In some DB systems the primary key is a string. Currently, `AutoField` assumes that it's always an Integer. There are two ways to support string-based primary keys. Either we can add a `StringAutoField` and require developers to explicitly use that. The disadvantage of this solution is that it becomes impossible to reuse existing Django models and NoSQL models become less portable even across NoSQL databases. The better alternative is to change `AutoField` to support both integers and strings. Since some existing code assumes that an exception is raised when assigning a string to an `AutoField` we could try to detect the installed backends and keep the old behavior (but additionally show a deprecation warning) when only SQL backends are in use. When using a NoSQL backend the new behavior would be activated and `AutoField` would accept both integers and strings without raising an exception.
+ Problem:: Currently, `AutoField` assumes that it's always an integer. However, in several NoSQL DBs (MongoDB, SimpleDB, etc.) the primary key is a string.
 Portable code should never assume that the "pk" field is a number. If an entity uses a string pk the application should continue to work. This is currently a problem in Django's auth app (see #14881).
+ Solution 1:: Add a `StringAutoField` and require developers to explicitly use that. The disadvantage of this solution is that it becomes impossible to reuse existing Django models and NoSQL models become less portable even across NoSQL databases.
+ Solution 2 (preferred):: Change `AutoField` to support both integers and strings. Since some existing code assumes that an exception is raised when assigning a string to an `AutoField` we could detect the installed backends and keep the old behavior (but additionally show a deprecation warning) when only SQL backends are in use. When using a NoSQL backend the new behavior would be activated and `AutoField` would accept both integers and strings without raising an exception.
+ Additional notes:: Portable code should never assume that the "pk" field is a number. If an entity uses a string pk the application should continue to work. This is currently a problem in Django's auth app (see further below).
+ Implementation status:: This is already implemented in Django-nonrel, but it's missing the deprecation warning and backwards-compatible mode when using only SQL backends.
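
A rough sketch of the behavior proposed in Solution 2, written without Django. `FlexibleAutoField` and its `nosql_backend_in_use` argument are hypothetical stand-ins, not Django's or Django-nonrel's actual API. How the installed backends would be detected is left out, and the point at which the deprecation warning fires is an assumption.

```python
import warnings


class FlexibleAutoField:
    """Hypothetical stand-in for AutoField; not Django's real class."""

    def __init__(self, nosql_backend_in_use):
        # Assumption: the caller has already detected the installed backends.
        self.nosql_backend_in_use = nosql_backend_in_use

    def to_python(self, value):
        if value is None or isinstance(value, int):
            return value
        if isinstance(value, str):
            if self.nosql_backend_in_use:
                # New behavior: string primary keys are accepted unchanged.
                return value
            # Old behavior, kept for SQL-only setups, plus a warning.
            warnings.warn(
                "String values for AutoField will be accepted in the future.",
                DeprecationWarning,
                stacklevel=2,
            )
            return int(value)  # raises ValueError for non-numeric strings
        raise TypeError("Unsupported primary key type: %r" % type(value))
```

With `nosql_backend_in_use=True` a key such as `"4d2a"` passes through unchanged; with `False` the same value still raises an exception, as existing code expects.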
 = INSERT vs UPDATE =
 Currently, `Model.save_base()` runs a check whether the pk already exists in the database. This check is necessary for SQL, but it's unnecessary and inefficient on many NoSQL DBs and it also conflicts with App Engine's optimistic transactions. Thus, Django should not distinguish between insert and update operations on DBs that don't require it.
+ Problem:: Currently, `Model.save_base()` checks whether the pk already exists in the database. This check is necessary for SQL, but it's unnecessary and inefficient on many NoSQL DBs which have an "upsert" operation that inserts or overwrites the entry in the DB. App Engine also doesn't allow running queries within (optimistic) transactions, so the current `save()` method doesn't work on App Engine.
 This comes with a minor problem: Without that check model instances have to track whether they were instantiated from the DB and thus exist in the DB or not. Otherwise the `Field.pre_save()` `add` parameter won't work correctly and the `post_save` signal won't report correctly whether this is a new entity or not.
+ Solution:: Django shouldn't distinguish between insert and update operations on DBs that don't require such a distinction. Instead of checking the DB, each model instance could get a constructor parameter that tells it whether it represents an existing entity or not.
 This is already implemented in Django-nonrel.
+ Implementation status:: This is already implemented in Django-nonrel, but some of Django's unit tests fail because they assume that the model constructor won't get additional parameters. Maybe an alternative solution is required.
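
The idea of tracking existence on the instance instead of querying the DB could look roughly like this. It is only a sketch: `Entity`, `_exists_in_db` and `load()` are hypothetical names, and a plain dict stands in for a datastore with an upsert operation.

```python
class Entity:
    """Hypothetical model-like object; `_exists_in_db` is an illustrative name."""

    def __init__(self, pk, data, _exists_in_db=False):
        self.pk = pk
        self.data = data
        # Set to True only by code that instantiates the object from the DB.
        self._exists_in_db = _exists_in_db

    def save(self, store):
        """Upsert into `store` and report whether the entity was new."""
        created = not self._exists_in_db
        # Insert or overwrite in one step, with no existence check beforehand.
        store[self.pk] = dict(self.data)
        self._exists_in_db = True
        return created


def load(store, pk):
    """Instantiate an entity from the store, marking it as existing."""
    return Entity(pk, dict(store[pk]), _exists_in_db=True)
```

The `created` value is what would feed the `add` parameter of `Field.pre_save()` and the `post_save` signal. The extra constructor parameter is also the part that clashes with unit tests expecting an unchanged constructor signature.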
 = count() =
+= Counting =
 `Query.count()` is problematic since a scalable `count()` method doesn't exist at least on App Engine. It would be nice to be able to pass an upper limit like `count(100)`, so if there are more than 100 results it will still return just 100.
+ Problem:: Counting is not a scalable operation on some DBs (esp. App Engine). The more entities you try to count, the longer the operation takes. In the worst case it times out. Django's `Query.count()` always tries to count everything instead of just a subset of the results, which is very inefficient. For example, sometimes you might only want to know whether there are more than 10 results, without caring about the exact number unless it's less than or equal to 10. In order to work around timeouts for overly large counting operations, some backends might even artificially limit the maximum count to 1000 (e.g. on App Engine).
 This also affects the results count in the admin interface.
+ Solution:: Allow passing an upper limit to the count operation. For instance, `queryset.count(100)` would never return a larger number than 100 even if there are more results.
+ Additional notes:: A related problem is that it might be impossible to retrieve a large number of results (i.e., not just the count, but the actual entities) or even results beyond a certain offset (esp. on App Engine). Since it's impossible to count the whole result set in advance and possibly even iterate through the whole result set this feature affects all apps that do pagination (e.g., the admin interface). In order to allow paginating through the whole result set so-called cursors must be supported (see below).
+ Implementation status:: Django-nonrel's App Engine backend currently just limits the maximum count to 1000. Other backends don't have a `count()` limit, but that might lead to inefficient queries.
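
The semantics of a count with an upper limit can be sketched on a plain iterator. `capped_count()` is a hypothetical helper, not part of Django's `QuerySet` API; it only shows that counting can stop as soon as the limit is reached.

```python
from itertools import islice


def capped_count(results, limit):
    """Count at most `limit` items from `results`, then stop consuming."""
    return sum(1 for _ in islice(results, limit))


# "Are there more than 10 results?" needs at most 11 items to be fetched.
has_more_than_ten = capped_count(iter(range(5000)), 11) > 10
```

A real backend would pass the limit down to the datastore query instead of iterating in Python, but the returned value would behave the same way.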
 = !ListField =
 …
 Multi-table inheritance requires JOIN support, so this feature can't be fully supported. For convenience it would be nice to allow subclassing a non-abstract model, but only copying its fields as if it were abstract.
+= Auth password reset URLs =
+#14881, [http://code.djangoproject.com/attachment/ticket/14881/django-auth-string-pk-support.patch Patch]
+ Problem:: `django.contrib.auth`'s password reset URLs contain a base36-encoded user ID (`/reset/<user-id>/<token>/`). Several NoSQL backends (MongoDB, SimpleDB, etc.) use string-based primary keys. The password reset feature breaks if the user ID (the primary key) is not an integer (because base36 can only express integers).
+ Solution:: Encode the user ID in a URL-safe variant of base64. This is a backwards-incompatible change that breaks "old-style" password reset URLs, but backwards compatibility should be very easy to implement if required.
+ Implementation status:: This is already implemented in Django-nonrel, but it's not yet backwards-compatible.
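
To illustrate the proposed encoding, here is a small sketch using Python's standard `base64` module. `encode_user_id()` and `decode_user_id()` are hypothetical helper names and are not taken from the linked patch. Stripping the `=` padding is an assumption made here to keep the URL segment short.

```python
import base64


def encode_user_id(user_id):
    """Encode an integer or string primary key as a URL-safe token."""
    raw = str(user_id).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_user_id(token):
    """Reverse encode_user_id(); the result is always a string."""
    padding = "=" * (-len(token) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


token = encode_user_id("user:42/x")
```

Since decoding always yields a string, a backend with integer keys would still have to convert the value itself, for example with `int()`.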
 = Minor issues =