Opened 14 years ago
Closed 14 years ago
#14220 closed Cleanup/optimization (wontfix)
Backwards-compat code in db.fields.subclassing is a bottleneck
Reported by: | Ole Laursen | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.2 |
Severity: | Normal | Keywords: | |
Cc: | graham@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I have a project where we need to import data from mobile devices from time to time, up to about 20,000 rows at a go. With the regular .create()/save() methods this takes more than 10 minutes, even after fiddling with transactions. So I did a quick hack that combines the model fields with raw SQL and executemany() (code below).
This was about 20 times faster, bringing the total down to about a minute. I ran the result through hotshot (the Python profiler) and, to my surprise, a large fraction of the time was spent inside inner() in django.db.models.fields.subclassing. The culprits appear to be call_with_connection and call_with_connection_and_prepared. Inserting a "return func" at the top of both speeds up the insertion by at least 25% (much of the remaining time is spent in the DB).
I realize the backwards-compatibility code is necessary, but I guess it should still be possible to fix this somehow?
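For readers who haven't looked at that module: subclassing wraps field methods such as get_db_prep_save in decorator-built closures so that subclass methods written against the old signature (without a connection argument) keep working. The sketch below only illustrates that shape; it is not the actual Django 1.2 source. The commented-out return func line is the short-circuit described above.

```python
from functools import wraps

def call_with_connection(func):
    # Simplified illustration of the compat shim in
    # django.db.models.fields.subclassing -- not the real source.

    # The hack from the report: returning the undecorated function
    # here removes the per-call wrapper (and its overhead) entirely.
    # return func

    @wraps(func)
    def inner(*args, **kwargs):
        # This closure runs on every call to the wrapped field method.
        # During a bulk insert that is once per field per row, so the
        # extra Python-level call and kwarg handling add up quickly.
        if 'connection' not in kwargs:
            kwargs['connection'] = None  # legacy callers get a default
        return func(*args, **kwargs)
    return inner
```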
Here's the code that exercises it (specifically the line calling get_db_prep_save). The function takes a list of Django objects not yet inserted into the database and inserts them:
```python
def insert_many(objects, using="default"):
    if not objects:
        return

    import django.db.models
    from django.db import connections

    con = connections[using]
    model = objects[0].__class__

    # Skip AutoFields: the database assigns those itself.
    fields = [f for f in model._meta.fields
              if not isinstance(f, django.db.models.AutoField)]

    # Build one parameter tuple per object, running each value through
    # pre_save()/get_db_prep_save() just as a normal save() would.
    parameters = []
    for o in objects:
        parameters.append(tuple(
            f.get_db_prep_save(f.pre_save(o, True), connection=con)
            for f in fields))

    table = model._meta.db_table
    column_names = ",".join(f.column for f in fields)
    placeholders = ",".join(("%s",) * len(fields))
    con.cursor().executemany(
        "insert into %s (%s) values (%s)"
        % (table, column_names, placeholders),
        parameters)
```
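For illustration, usage might look like the following; the Measurement model and the sample data are hypothetical stand-ins, not part of the ticket:

```python
from django.db import models

class Measurement(models.Model):
    # Hypothetical model, for illustration only -- not from the ticket.
    device_id = models.IntegerField()
    value = models.FloatField()

samples = [(1, 0.5), (2, 1.25)]  # stand-in for the mobile-device data

# Build unsaved instances, then insert them all in a single
# executemany() round trip instead of one INSERT per row.
rows = [Measurement(device_id=d, value=v) for d, v in samples]
insert_many(rows, using="default")
```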
Change History (4)
comment:1 by , 14 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|
comment:2 by , 14 years ago

I realize this is a pretty weak bug report. There are many factors involved in timing this, the profiler output is not easy to read, and the test case I'm using is built on real data, so I can't easily post it here. I found another problem that seems to stem from an auto_now field, which causes datetime.now() to be evaluated for each and every row, but that is something I can fix from the outside by setting the time manually.

I don't know, maybe it's better to close the report until I or somebody else comes up with a test case with some easy-to-reproduce evidence.
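The "setting the time manually" fix could look like the sketch below. It assumes the auto_now option is removed from the field (otherwise pre_save() would overwrite the value anyway), and updated_at is a hypothetical field name:

```python
from datetime import datetime

# Evaluate the timestamp once, rather than once per row via pre_save().
now = datetime.now()
for obj in objects:        # ``objects`` is the list passed to insert_many()
    obj.updated_at = now   # hypothetical plain DateTimeField (auto_now removed)
```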
comment:3 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Performance improvements are always welcome; if anyone has any bright ideas on how to speed this up, I'm all ears.
However, keep in mind that this particular performance bottleneck will be removed in 1.4 -- which means the code in question will be deleted in about 3 months' time.
comment:4 by , 14 years ago
Cc: | added |
---|---|
Resolution: | → wontfix |
Severity: | → Normal |
Status: | new → closed |
Type: | → Cleanup/optimization |
russelm mentioned this won't be needed after 1.4, and any patch here probably wouldn't be released before then, so I'm closing. Please re-open if the comment about 1.4 no longer applies.