Version 17 (modified by Philippe Raoult, 16 years ago) ( diff )

--

Part of DjangoSpecifications

Singleton Instances

This page describes the issues and proposals pertaining to the fact that django's ORM will create as many instances of the same DB object as there are requests.

Issues

Inconsistency

#5514 describes what I think is the main issue with the current behavior of the ORM. Having multiple instances can result in silent data losses. Even worse the actual result would depend on whether one used select_related or not:

class Operand(models.Model):
    value = models.IntField()

class Operation(models.Model):
    arg = models.ForeignKey(Operand)

for operation in Operation.objects.all():
    operation.arg.value += 1
    operation.arg.save()

# so far so good, but I want to make it tighter ! Let's use select_related to save a DB query per iteration
for operation in Operation.objects.all().select_related():
    operation.arg.value += 1
    operation.arg.save()

Here we have a problem if two operations point to the same operand. The first version works well but the second preloads all operands from the DB and the same DB operand will result in multiple operand instances in memory. So we're basically modifying and saving the original operand every time instead of cumulating the changes. Here's a band-aid for your foot.

Models with self FKs will exhibit similar issues:

class Directory(models.Model):
   name = models.CharField()
   parent = models.ForeignKey('self')

for dir in Directory.objects.all():
    if condition(dir):
        dir.modify()
        dir.parent.modify()
        dir.parent.save()

Now let's see, what if condition returns True for a directory and its parent ? If the parent comes first in the main query, it will be modified, saved, reloaded when its child comes up later. If the child comes up first the parent is loaded, modified, saved, and then later on the original value from the outer query will be modified and saved, thus erasing the first change.

Performance

The original goal of #17 was reducing memory use and query count by reusing the same instance instead of having a new one. Let's see a typical example of that:

class ArticleType(models.Model):
    name = models.CharField(maxlength=200)
    categories = models.CharField(maxlength=200)

class Article(models.Model):
    title = models.CharField(maxlength=200)
    type_of_article = models.ForeignKey(ArticleType)

    def __unicode__(self):
        return u"%s (%s)" % (self.title, self.type_of_article.name)


Context({'articles':  Article.objects.filter(type=1)})

{% for article in articles %}
{{ article }}
{% endfor %}

In the above example, type_of_article should only be queried for once, because it's already in memory. This is a very common use case for many developers.

If you have a great number of articles and a smaller number of articletypes the performance/memory hit is staggering because:

  • you generate a DB query per article to get the type
  • you instantiate the type once per article instead of once per type
  • you have as many articletype instances in memory as there are articles

Proposals

Overview

The basic idea is to simply reuse existing instances. This works by keeping track of instantiated objects on a per model basis. Whenever we try to instantiate a model, we check if there is already an instance in memory and reuse it if so. This would solve both issues mentioned above.

Please note that the proposal is absolutely NOT a caching system: whenever an instance goes out of scope it is discarded and will have to be reloaded from the DB next time.

This feature would be turned off by default, and only operate for Models which have this feature explicitly enabled. Good candidates for this feature are Models like the articletype above, or models with self references.

Threads

No sharing would occur between threads, as is the case currently. Instances will be unique within each thread.

API

The public API for this feature will be very simple. It will be a single optional parameter in the Meta class:

class ArticleType(models.Model):
    name = models.CharField(maxlength=200)
    categories = models.CharField(maxlength=200)

    class Meta:
        singleton_instances = True

The default value for the parameter is False, which means that the feature has to be explicitly enabled.

Implementation

#17 currently has a working patch with the following:

  • Per-model WeakValueDictionary with dict[pk_val] = instance.
  • call classmethod which either gets a instance from the dict or return a newly created one.
  • a simple interface for enabling or disabling the feature for a given Model (default is disabled).
  • a slightly modified related manager which will first try to get the instance from the Model dict instead of blindly querying the DB.
  • slightly modified serializers to force the instantiation when reading from an outside source.
  • a slightly modified query to force flushing instances when they are delete'd from the DB.
  • tests!

but some things are missing:

  • In order to properly handle shared instances, partial models, or modified models (such as those with extra select clauses) need to be wrapped in a class which will handle those modifications.
  • there is no doc as to how to enable/disable this feature and what to expect from it.
  • the API for enabling/disabling the feature is very crude and consists only of two class methods. It should at the very least be possible to set the value as a member of the Meta class.
  • the Model dict should be set threadlocally.
  • it should be possible to force instantiation, because the serializers want that.
  • For the sake of completeness, it should be possible to do Model.objects.get_from_db(id=something) to bypass the singleton.

Discussion

The current patch design overrides the call method for the model base which means it automatically returns singleton instances when available. This is convenient because XXX (David or Brian: feel free to fill this out).

Another approach would be to make this explicit and have a simple API along the lines of:

class ModelBase:
    def get_singleton(pk, kwargs = None):
        """ returns the current singleton for the instance corresponding to pk. If there isn't one in memory and kwargs is specified, it will be built from kwargs  """
    

That means writing Model(kwargs) would return a freshly instantiated object (as is the case now) and bypass the singleton instance. Since Django's ORM would use the API internally, doing Model.objects.get would (by default, overridable in the manager) return the singleton. The default related manager would also have a new get_from_db method to force retrieving and instantiating from the DB only.

(The Original Proposal)

This would always be enabled. You cannot turn it off. If stuff is broken by having this on then it should be broken.

This can cause problems with serializers, and things that modify querysets and shouldn't (e.g. .extra). That is the only reason it should be turned off. If those problems are addressed there is no reason to use #17 and to have it enabled 100% of the time.

A simple workaround for querysets issues, and this would branch into being able to store .values() as partials as well (this code is untested, but should give a general idea):

class SphinxObjectInstance(object):
    """
    Acts exactly like a normal instance of an object except that
    it will handle any special sphinx attributes in a __sphinx class.
    """
    _reserved_attributes = ('_sphinx', '__instance')
    
    def __init__(self, instance, attributes={}):
        object.__setattr__(self, '__instance', instance)
        object.__setattr__(self, '_sphinx', SphinxAttributes(attributes))
    
    def __setattr__(self, name, value):
        if name in _reserved_attributes:
            return object.__setattr__(self, name, value)
        return setattr(self.instance, name, value)

    def __getattr__(self, name, value=None):
        if name in _reserved_attributes:
            return object.__getattr__(self, name, value)
        return getattr(self.instance, name, value)

The proposal would also change how extra is used, it should probably be instance._extra or something similar.

(The code above is taken from an unpublished update to the django-sphinx package.)

As for serializers, these should probably bypass the instance caching, as they can have stale data available, better solutions are up to you.

Note: See TracWiki for help on using the wiki.
Back to Top