id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
14697	Speeding up queryset model instance creation	Anssi Kääriäinen	nobody	"The attached patch does some easy optimizations to speed up iteration and thus model instance creation when using querysets.

The tests are run using the following code:

{{{
# models.py

from django.db import models

class Test1(models.Model):
    pass

# Create your models here.
class Test2(models.Model):
    field1 = models.CharField(max_length=20)
    field2 = models.ForeignKey(Test1)
    field3 = models.CharField(max_length=20)
    field4 = models.CharField(max_length=20)
    field5 = models.CharField(max_length=20)
    field6 = models.CharField(max_length=20)
    field7 = models.CharField(max_length=20)
    field8 = models.CharField(max_length=20)
    field9 = models.CharField(max_length=20)
    field10 = models.CharField(max_length=20)
    field11 = models.CharField(max_length=20)
    field12 = models.CharField(max_length=20)
    field13 = models.CharField(max_length=20)

# test.py
from test_.models import *
""""""
Uncomment for first run to create objects...
t2 = Test1(pk=1)
t2.save()
for i in range(0, 1000):
    t = Test2(pk=i, field1='value', field2=t2)
    t.save()
for i in range(0, 1000):
    t = Test1(pk=i)
    t.save()
""""""
from datetime import datetime
from django.conf import settings
# dummy read of settings to avoid weird results in timing:
# first read of settings changes timezone...
t = settings.INSTALLED_APPS

def fetch_objs():
    for i in range(0, 10):
        # For the Test1 run, use this loop instead of the Test2 loop below:
        # for obj in Test1.objects.all():
        for obj in Test2.objects.all():
            pass

import hotshot, hotshot.stats
prof = hotshot.Profile(""test.prof"")
prof.runcall(fetch_objs)
prof.close()
stats = hotshot.stats.load(""test.prof"")
# stats.strip_dirs()
stats.sort_stats('time', 'calls')
stats.print_stats(50)
start = datetime.now()
fetch_objs()
print '%s' % (datetime.now() - start)
# What is the absolute maximum that can be achieved?
from django.db import connection
cursor = connection.cursor()
start = datetime.now()
for i in range(0, 10):
    cursor.execute('select * from test__test2')
    for obj in cursor.fetchall():
        pass
print '%s' % (datetime.now() - start)
}}}
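
The script above targets Python 2, and the `hotshot` module was removed in Python 3. As a rough sketch only, the same profile-then-time pattern can be written with `cProfile`, `pstats` and `time.perf_counter`. The helper name `profile_and_time` and the stand-in workload are assumptions made for illustration; in a real run `fetch_objs` would iterate over the queryset as in test.py.

{{{
#!python
import cProfile
import io
import pstats
import time


def fetch_objs():
    # Stand-in workload (assumption): replace the inner list with
    # Test2.objects.all() to reproduce the measurement from test.py.
    for _ in range(10):
        for obj in [object() for _ in range(1000)]:
            pass


def profile_and_time(func, top=50):
    # Hypothetical helper: profile one call, then time a second, unprofiled call.
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats('time', 'calls').print_stats(top)
    start = time.perf_counter()
    func()
    elapsed = time.perf_counter() - start
    return out.getvalue(), elapsed


report, elapsed = profile_and_time(fetch_objs)
print(report)
print('%.3f seconds' % elapsed)
}}}

Timing the unprofiled call separately, as the original script does, keeps profiler overhead out of the reported wall-clock figure.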

The results on my computer are as follows:

When fetching 10000 Test1 objects (1000 rows, 10 passes):
0.085 seconds with patch, 0.145 seconds without patch

When fetching 10000 Test2 objects (1000 rows, 10 passes):
0.200 seconds with patch, 0.270 seconds without patch

So, this should result in a 20-40% speedup for these simple cases.
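
For reference, the arithmetic behind that estimate can be checked with a few lines of Python. The function names `reduction` and `speedup` are made up for this sketch; the timings are the ones reported above.

{{{
#!python
def reduction(before, after):
    # Fraction of the original run time that the patch removes.
    return (before - after) / before


def speedup(before, after):
    # How many times faster the patched run is.
    return before / after


# (without patch, with patch) in seconds, as reported in this ticket.
timings = {'Test1': (0.145, 0.085), 'Test2': (0.27, 0.200)}

for name, (before, after) in sorted(timings.items()):
    print('%s: %.0f%% less time, %.2fx faster' % (
        name, 100 * reduction(before, after), speedup(before, after)))
}}}

With these numbers the run time drops by about 41% for Test1 and about 26% for Test2.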

The absolute maximum that can be achieved is somewhere around 0.015 seconds for the Test1 case (0.007 for fetching from the DB, and 0.07 for creating a Python object and setting its attributes). Add in signals and ModelState creation, and you land somewhere between 0.02 and 0.03 seconds. So there is still some room for optimization, but going further doesn't seem too easy.

Possible optimizations: pass the model `__init__` in base.py a dict of attr_name: val pairs, so that it can update the model `__dict__` directly. This results in around 20% speedup but is backwards incompatible (either the init `*args` or `**kwargs` would need to contain that dict, and existing code does not expect that), and for that reason it is not included here. Another possibility is to add a separate method (qs.as_list()) that fetches the list without any caching (just fetch all the results from the cursor and create a list from them). I think that would result in around 20% more speedup, but it would require maintaining two different implementations for fetching objects.
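
To illustrate the `__dict__` idea outside Django, here is a minimal sketch using plain Python classes. `SetattrInit` and `DictInit` are hypothetical stand-ins, not Django code, and the timings it prints say nothing definite about the real model `__init__`; it only shows the two construction styles side by side.

{{{
#!python
import timeit

FIELDS = ['field%d' % i for i in range(1, 14)]
ROW = ['value'] * len(FIELDS)


class SetattrInit(object):
    # Positional values, assigned one attribute at a time.
    def __init__(self, *args):
        for name, val in zip(FIELDS, args):
            setattr(self, name, val)


class DictInit(object):
    # One dict of attr_name: val, merged straight into __dict__.
    def __init__(self, attrs):
        self.__dict__.update(attrs)


def build_with_setattr():
    return SetattrInit(*ROW)


def build_with_dict():
    return DictInit(dict(zip(FIELDS, ROW)))


# Both styles end up with the same instance attributes.
a = build_with_setattr()
b = build_with_dict()
assert a.__dict__ == b.__dict__

for fn in (build_with_setattr, build_with_dict):
    print('%s: %.3f s' % (fn.__name__, timeit.timeit(fn, number=20000)))
}}}

Note that writing to `__dict__` bypasses any descriptors or properties defined on the class, which is one more reason such a change would need care beyond the signature problem described above.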

Just as a datapoint: doing the same using Test2.objects.raw(""select * from test`__`test2"") takes about 1 second. I am going to look into that next, as that is _really_ bad.
"		closed	Database layer (models, ORM)	dev		fixed	performance, queryset, iterator		Unreviewed	1	0	0	0	0	0