Changes between Initial Version and Version 1 of AppEngine


Timestamp: Feb 6, 2009, 8:24:18 AM
Author: Waldemar Kornewald
Comment: copied from app-engine-patch wiki

     1#summary This is an initial proposal for the Django team, in case they consider porting Django so that it is better suited for App Engine.
     2
     3Also see [http://code.djangoproject.com/ticket/10192 ticket 10192] in Django's ticket tracker and the [http://groups.google.com/group/django-developers/browse_thread/thread/516626d18f6f3fca discussion] in django-developers.
     4
     5= Porting Django to App Engine: What's needed/different? =
     6
     7To understand this proposal you should have read the [http://code.google.com/appengine/docs/ App Engine documentation]. To simplify the port, it might be possible to reuse a few parts of [http://code.google.com/p/app-engine-patch/ app-engine-patch].
     8
     9The following might also apply to other cloud hosts which provide special database and communication interfaces.
     10
     11== Reminder: Datastore and request limitations ==
     12
     13A request can't take longer than 10 seconds, and you can't retrieve more than 1000 model instances at once from the datastore. It's also impossible to run more than 30 queries without hitting the 10-second request limit.
     14
     15A single entity (actually: a whole entity group) can't handle more than 5 writes per second (writes = save or delete).
     16
     17Unique properties can only be emulated via keys, but this means they can't be changed afterwards, so this is only useful for primary keys.
     18
     19An entity may not have more than 5000 index entries.
     20
     21All query filter rules are connected via the AND operator. The OR operator is not supported.
     22
     23Transactions can only run on a single entity group and you can't run queries within a transaction.
     24
     25Also not supported:
     26 * JOINs (could be done manually for small datasets)
     27 * sub-queries (ditto)
     28 * DISTINCT queries (i.e., no queryset.dates(), etc.)
     29 * referential integrity
     30
     31== Schemas ==
     32
     33Since tables are flexible and don't have schema definitions, running "manage.py syncdb" shouldn't be necessary.
     34
     35== Indexes ==
     36
     37Queries with inequality filters or sort orders need special index rules. Django features like the admin interface should therefore have a fall-back mode in which query results can't be sorted, because the developer can hardly define all possible index rules. This is especially true if the searched property is a list property, in which case you need multiple index rules covering all possible numbers of search terms.
     38
     39Possibly, when App Engine gets full-text search support there could be a fall-back to (or preference for?) running complex queries on the full-text index.
     40
     41== Keys, key_name, key id, parents ==
     42
     43Django should always assign a key_name to each newly created entity instead of letting App Engine choose a key id, so data can be exported from and imported into the datastore more easily and migrations to other providers become less problematic. A transaction can be used to ensure that no existing entity with the generated key_name gets overwritten.
     44
     45An interface to the underlying key and key id should be provided, too, but it shouldn't be recommended due to its problems.
     46
     47The key_name and parent could be emulated with a !CharField(primary_key=True) that internally prefixes the given string with a safety character. If you only need a key_name, it's sufficient to specify a string. If you also want to specify a parent, you could create a specially encoded string by passing the parent and (optionally) the desired key_name to a helper function and then passing the result to the !CharField. This API would reduce the key_name and parent to a single pk property and stay compatible with existing Django code, which wouldn't work if we had separate pk and parent properties.
     48
     49The pk property should return a URL-safe string that contains the key_name without the safety prefix (i.e., the value of the !CharField(primary_key=True)) and the parent pk. This is more portable than using str(Key) because it doesn't contain the model and app name. Moreover, if you specified a pk manually, the URLs will be much nicer. Even if you don't specify a key_name, the URL is still shorter than the str(Key) version.
     50
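The single-pk scheme described above could look roughly like the following sketch. It is only an illustration: `encode_pk`, `decode_pk`, `to_internal_key_name`, the `k` safety prefix and the `/` separator are hypothetical choices, not part of Django or App Engine.

```python
from urllib.parse import quote, unquote

SAFETY_PREFIX = "k"  # hypothetical prefix that is only stored internally
SEPARATOR = "/"      # hypothetical separator between parent pk and key_name


def encode_pk(key_name, parent_pk=None):
    """Combine an optional parent pk and a key_name into one URL-safe pk string."""
    # safe="" also escapes "/", so a key_name can never clash with SEPARATOR.
    part = quote(key_name, safe="")
    if parent_pk is None:
        return part
    return parent_pk + SEPARATOR + part


def decode_pk(pk):
    """Split a pk string back into (parent_pk or None, key_name)."""
    parent_pk, sep, part = pk.rpartition(SEPARATOR)
    return (parent_pk if sep else None), unquote(part)


def to_internal_key_name(key_name):
    """Add the safety prefix before the key_name is handed to the datastore."""
    return SAFETY_PREFIX + key_name
```

With these helpers, `encode_pk("post 1", parent_pk=encode_pk("john"))` gives `"john/post%201"`, which contains neither the model nor the app name.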
     51In order to optimize code it's useful to be able to get the pk value of a !ForeignKey without dereferencing its entity.
     52
     53Queries should support ancestor conditions.
     54
     55Every model should provide these properties: key, key_name, key_id, parent, parent_key
     56
     57== Transactions ==
     58
     59Django could emulate transactions with the commit_on_success decorator. Manual transaction handling and checkpoints can't be implemented on App Engine, though.
     60
     61== Datastore batch operations ==
     62
     63Datastore writes are very expensive. App Engine provides batch operations for saving and deleting lots of model instances at once (no more than 500 entries, though). Django should provide such an API, too, so code can be optimized.
     64
     65The API would be most flexible if it worked like a transaction handler where all save() calls within a function call are collected and then committed afterwards. The implementation wouldn't be trivial, though. It requires maintaining a list of saved but not yet committed instances, so that filter() calls can also check this list.
     66
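A minimal sketch of such a transaction-like collector, assuming a hypothetical `put_many` callable that performs one batch write (a stand-in for the datastore's batch save). `BatchSaver` is an invented name, and the sketch deliberately leaves out the harder part of making filter() calls see pending instances.

```python
BATCH_LIMIT = 500  # per-call limit mentioned above


class BatchSaver:
    """Collects instances inside a `with` block and writes them in chunks on exit."""

    def __init__(self, put_many, limit=BATCH_LIMIT):
        self.put_many = put_many  # hypothetical callable doing one batch write
        self.limit = limit
        self.pending = []

    def save(self, instance):
        # Nothing is written yet; the instance is only remembered.
        self.pending.append(instance)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:  # commit only if the block finished without error
            for start in range(0, len(self.pending), self.limit):
                self.put_many(self.pending[start:start + self.limit])
        self.pending = []
        return False  # never swallow exceptions


# Usage: 1200 saves end up as three batch writes.
calls = []
with BatchSaver(calls.append) as batch:
    for i in range(1200):
        batch.save({"id": i})
```

Here `calls` ends up holding three chunks, none larger than the batch limit.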
     67There are also batch operations for getting lots of model instances by key. This could be emulated with:
     68
     69{{{
     70MyModel.objects.all().filter(pk__in=[key1, key2, ...])
     71}}}
     72
     73== Model relations and JOINs ==
     74
     75Since JOINs don't work, Django should fall back to client-side JOIN emulation by issuing multiple queries. Of course, this only works with small datasets and it's inefficient, but that can be documented. It can still be a useful feature.
     76
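The client-side JOIN emulation could be sketched as two steps: one query for the referencing entities and one batch lookup for everything they point to. Entities are plain dicts here, and `emulate_join` and `fetch_by_pks` are hypothetical names used only for illustration.

```python
def emulate_join(children, fetch_by_pks, fk_attr):
    """Pair each child with its referenced entity using one extra batch lookup."""
    # Collect the distinct referenced pks so each one is fetched only once.
    pks = {child[fk_attr] for child in children if child[fk_attr] is not None}
    parents = fetch_by_pks(pks)  # hypothetical: returns {pk: entity}
    return [(child, parents.get(child[fk_attr])) for child in children]


# Usage with in-memory stand-ins for the two "tables".
authors = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
books = [
    {"title": "A", "author_id": 1},
    {"title": "B", "author_id": 2},
    {"title": "C", "author_id": 1},
]


def fetch_authors(pks):
    return {pk: authors[pk] for pk in pks if pk in authors}


joined = emulate_join(books, fetch_authors, "author_id")
```

Because all referenced entities are held in memory at once, this only makes sense for small result sets, as noted above.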
     77Many-to-many relations could be emulated with something like a !ListProperty(db.Key), so you can at least issue simple queries, but this can quickly hit the 5000 index entries limit. The alternative of having an intermediate table is useless if you have to issue queries on the data. Anyway, for efficiency it should be possible to retrieve only the pk values without loading the actual entities from the db.
     78
     79== Special field types ==
     80
     81The following field types have to be ported to Django:
     82 * !ListProperty
     83 * !BlobProperty
     84
     85== Zipimport ==
     86
     87Django should work from within a zip package. This means at least extending find_commands(), so manage.py commands can work (app-engine-patch [http://bitbucket.org/wkornewald/django-app-engine/src/tip/core/management/__init__.py already does this]). The media files and templates could be exported from the zip file (like it's currently done in app-engine-patch) if that is more efficient.
     88
     89== manage.py commands ==
     90
     91Not all manage.py commands should be available on App Engine (e.g., the SQL-related commands). This could probably be detected at runtime based on the DB backend's capabilities. Some commands like "runserver" have to be replaced. This could possibly be done by adding an app to INSTALLED_APPS which redefines a few commands.
     92
     93We also need an "official" deployment command to emulate "appcfg.py update" and similar commands for other cloud hosts.
     94
     95== Email support ==
     96
     97In order to support email functionality it must be possible to provide email backends which handle the actual sending process. App Engine has a special [http://code.google.com/appengine/docs/python/mail/ Mail API].
     98
     99== File uploads ==
     100
     101The file upload handling code should never assume that it has access to the file system. Instead, it should assume that the file gets uploaded directly into the datastore or indirectly (e.g., via POST to S3 and then Django just gets notified when the upload is finished). This means that imports of file system functions should be deferred as much as possible.
     102
     103== Permissions and content types ==
     104
     105Since we shouldn't depend on manage.py syncdb, the Permission and !ContentType models should be replaced with dynamically generated fake model instances (which is also an optimization). Since we can retrieve the list of defined models at runtime we can easily generate those two models at runtime, too. Internally, they could be stored as a simple string (e.g., 'user.can_add') and converted into fake models when the field is accessed. This might require creating a !FakeModelField for holding this kind of model.
     106
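The string-to-fake-model conversion could be sketched like this. `FakePermission` and both helper functions are hypothetical, and the field names are assumptions made for the example rather than Django's actual Permission schema.

```python
from collections import namedtuple

# Hypothetical stand-in for a dynamically generated Permission instance.
FakePermission = namedtuple("FakePermission", ["app_label", "codename"])


def permission_from_string(value):
    """Turn a stored string such as 'user.can_add' into a FakePermission."""
    app_label, codename = value.split(".", 1)
    return FakePermission(app_label, codename)


def permission_to_string(permission):
    """Produce the string form that would be stored in the datastore."""
    return "%s.%s" % (permission.app_label, permission.codename)
```

A field type like the proposed !FakeModelField would call the first function when the field is accessed and the second one before saving.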
     107== Future: denormalization ==
     108
     109As an alternative to JOIN emulation, denormalization could be provided via a !ForeignKey that gets told which attributes of the referenced entity have to be copied. The query would then be formulated as if it crossed a relation, but internally the copied data would be used. Of course, with denormalization, Django must update all affected entities referencing an attribute whenever that attribute changes.
     110
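A rough sketch of the idea, with entities modelled as plain dicts. The function names and the `author__name` naming of the copied attribute are assumptions made for this example only.

```python
def denormalize(referencing, source, attrs, prefix):
    """Copy selected attributes of `source` onto a referencing entity."""
    for attr in attrs:
        referencing[prefix + "__" + attr] = source[attr]


def propagate_change(source, referencing_entities, attrs, prefix):
    """After `source` changed, refresh the copies on every referencing entity."""
    for entity in referencing_entities:
        denormalize(entity, source, attrs, prefix)


# Usage: books carry a copy of their author's name.
author = {"name": "Ann"}
books = [{"title": "A"}, {"title": "B"}]
propagate_change(author, books, ["name"], "author")

# A filter that looks like it crosses the relation only reads the copy.
by_ann = [b["title"] for b in books if b["author__name"] == "Ann"]

# When the author is renamed, every referencing book must be updated too.
author["name"] = "Anna"
propagate_change(author, books, ["name"], "author")
```

The second `propagate_change` call is the expensive part: it touches every referencing entity, which is why the number of writes can exceed what a single request allows.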
     111Restoring data integrity could require modifying more model instances than a single request allows. A background process (or cron job) could be used to automatically clean up large amounts of inconsistent data. This would require creating a cleanup task (maybe as a model), which could at the same time be used to correct inconsistent data on the fly. The cache backend could optimize this process.