Version 14 (modified by twanschik, 9 years ago) (diff)


Here we collect the high-level refactorings required for non-sql (or non-relational) backends. Also see the specific backend docs for details: AppEngine

Database query operations (get, count, ... entities) are provided to Django models via managers. Managers basically wrap around QuerySet methods. Internally Django's QuerySet class uses sql.Query whose default is sql.BaseQuery. sql.Query can be overridden to use a custom Query class (the backend). So we end up writting a Query class in order to provide a specific backend. So anything done in Django's Model methods or QuerySet methods which cannot be done for a specific backend should be moved into the Query in order to allow backends to specify what to do in such cases.


The query classes defined in this module are used by QuerySet and Model, but it's not possible to override the subquery classes.

TODO: Is this necessary, at all? Probably yes if you want a clean port.



In some DB systems the primary key is a string. Currently, AutoField assumes that it's always an Integer.


Multi-table inheritance

In the current implementation, multi-table inheritance requires reads and writes across multiple tables. On non-relational DBs this has to be solved differently. For example, with a ListField type you could store all models which you derived from (inspired by App Engine's PolyModel). On App Engine this adds deeper composite indexes which is a problem when filtering against multiple ListFields combining that with inequality filters or results ordering (exploding indexes). Thus, this should only be used at the second inheritance level (seen from Model base class).

Problem: If model B derives from A and you get an A instance and modify it you mustn't lose B's data. Either we always keep all data (which means you never free up data after schema changes unless you use a lower-level API) or we keep track of all derived models' fields and keep them while removing all unused fields (e.g., A would know about B's fields and preserve them when saving). Probably the first solution is the safest.

TODO: How do we store field data that doesn't have a corresponding field in the model definition?


The distinction between insert and update should be done by sql.Query because not all backends make that distinction, at all. The check whether the pk already exists (which is part of the distinction) should be moved out, too, because it would be unnecessarily inefficient on those backends.


model.delete() collects all referenced objects using _collect_sub_objects(). For non-sql backends this is not always possible, for example, when running in a transaction on App Engine (only entities in the same entity group can be fetched from the datastore). This means that we can't guarantee referential integrity and we can't efficiently emulate SQL in this case.


Not all backends support transactions, at all (e.g., SimpleDB). Some (e.g., App Engine) only support transactions similar to "SELECT ... FOR UPDATE" (which isn't exactly the same as @commit_on_success because it really locks items for read/write access). Not all backends support a BEGIN/END TRANSACTION operation, but only provide an interface for calling a function transactionally (like the @commit_on_success decorator).


TODO: Let's wait for App Engine to support its new bookmarks mechanism before implementing this (key-based pagination doesn't work in all situations). (Is there a link to info about app engine bookmarks? I only found one sentence about cursors on the official roadmap.)

On App Engine you can only retrieve the first 1000 query results. There needs to be support for "bookmarks" which mark the next starting point.

On SimpleDB you can directly retrieve the bookmark of the Nth item and run the query from there.

TODO(mitch?): Find out if this is efficient (even for millions of items) or if it's better to provide bookmarks at a higher level.


query.count() will be problematic since a scalable count() method doesn't exist in app engine (does it exist in SimpleDB?). Perhaps this will be alleviated by cursors/bookmarks, but I suspect we'll have to address this some other way. An automatic sharded counter in the manager would allow for an "almost accurate" count in most situations. It would certainly be good enough for the most popular use case, "how many pages of results do I have?"

The sharded counter can only be used for counting a very specific query, so you'd either have to specify all possible queries upfront or manage the counter manually. --wkornewald

Back to Top