Backwards-compat code in db.fields.subclassing is a bottleneck
| Reported by:  | Ole Laursen                  | Owned by:                | nobody |
| Component:    | Database layer (models, ORM) | Version:                 | 1.2    |
| Has patch:    | no                           | Needs documentation:     | no     |
| Needs tests:  | no                           | Patch needs improvement: | no     |
I have a project where we need to import data from mobile devices from time to time, up to about 20,000 rows per batch. With the regular .create()/.save() methods this takes more than 10 minutes, even after fiddling with transactions. So I wrote a quick hack that combines the model fields with raw SQL and executemany() (code below).
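For context, the straightforward version looks roughly like this (Reading is a hypothetical model standing in for ours, and commit_on_success is the transaction decorator available in 1.2):

    from django.db import transaction

    @transaction.commit_on_success
    def import_rows(rows):
        # One INSERT per row: even inside a single transaction this
        # still pushes every row through the full save() machinery
        # and issues ~20,000 separate queries.
        for row in rows:
            Reading.objects.create(**row)  # Reading is hypothetical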
This was about 20 times faster, bringing the total down to about a minute. I ran the result through hotshot (the Python profiler) and, to my surprise, a large fraction of the time was spent inside inner() in django.db.models.fields.subclassing. The culprits appear to be call_with_connection and call_with_connection_and_prepared. Inserting a "return func" at the top of both speeds up the insertion by at least 25% (much of the remaining time is spent in the DB).
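In case anyone wants to reproduce the measurement, the profiling boiled down to roughly this (the filename is arbitrary):

    import hotshot
    import hotshot.stats

    prof = hotshot.Profile("insert.prof")
    prof.runcall(insert_many, objects)  # insert_many is the hack below
    prof.close()

    # hotshot.stats.load() returns a pstats.Stats object.
    stats = hotshot.stats.load("insert.prof")
    stats.sort_stats("cumulative").print_stats(25)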
I realize the backwards-compatibility code is necessary, but presumably it's still possible to fix this somehow?
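To be concrete about the hack: short-circuiting the two decorators looks roughly like this (a sketch, not the actual 1.2 source). It just disables the compatibility shim, so it only demonstrates where the overhead is; it is not a fix:

    # django/db/models/fields/subclassing.py (sketch)
    def call_with_connection(func):
        return func  # short-circuit: skip the compatibility wrapper
        # ...the original body follows; it wraps func in the inner()
        # closure that dominates the profile...

    def call_with_connection_and_prepared(func):
        return func  # same short-circuit
        # ...original body...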
Here's the code that exercises it (specifically the line calling get_db_prep_save). The function takes a list of Django objects not yet inserted in the DB and inserts them:
    def insert_many(objects, using="default"):
        # Note: save() is never called, so model signals are not sent.
        if not objects:
            return

        import django.db.models
        from django.db import connections

        con = connections[using]
        model = objects[0].__class__

        # Skip the auto primary key; the database fills it in.
        fields = [f for f in model._meta.fields
                  if not isinstance(f, django.db.models.AutoField)]

        # Prepare each row the same way save() would: pre_save() applies
        # defaults/auto_now, get_db_prep_save() converts to DB values.
        parameters = []
        for o in objects:
            parameters.append(tuple(
                f.get_db_prep_save(f.pre_save(o, True), connection=con)
                for f in fields))

        table = model._meta.db_table
        column_names = ",".join(f.column for f in fields)
        placeholders = ",".join(("%s",) * len(fields))

        # One multi-row INSERT through the DB-API instead of one
        # query per object.
        con.cursor().executemany(
            "insert into %s (%s) values (%s)"
            % (table, column_names, placeholders),
            parameters)
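Usage is then just (Reading being the same hypothetical model as above):

    readings = [Reading(device_id=d, value=v) for d, v in incoming_rows]
    insert_many(readings, using="default")

Note that since the AutoField is skipped and save() never runs, the inserted instances don't get their primary keys set and no signals fire.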