Opened 14 years ago

Closed 13 years ago

#14220 closed Cleanup/optimization (wontfix)

Backwards-compat code in db.fields.subclassing is a bottleneck

Reported by: Ole Laursen Owned by: nobody
Component: Database layer (models, ORM) Version: 1.2
Severity: Normal Keywords:
Cc: graham@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I have a project where we need to import data from mobile devices from time to time, up to about 20,000 rows. With the regular .create()/save() methods, this takes more than 10 minutes, even after fiddling with transactions. So I did a quick hack that combines the model fields with raw SQL and executemany() (code below).

This was about 20 times faster, so I was now down to about a minute in total. I ran the result through hotshot (the Python profiler) and, to my surprise, a large fraction of the time was spent inside inner() in django.db.models.fields.subclassing. The culprits appear to be call_with_connection and call_with_connection_and_prepared. Inserting a "return func" at the top of both speeds up the insertion by at least 25% (much of the remaining time is spent in the DB).
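For context, the shim in django.db.models.fields.subclassing wrapped field methods so that legacy method signatures (without a `connection` argument) kept working. A minimal sketch of that kind of wrapper follows; it is illustrative only, not Django's actual code, and the per-call signature check is simplified:

```python
import inspect
import timeit

def call_with_connection(func):
    # Decide per *call* whether the wrapped method accepts a
    # `connection` kwarg. Doing this work on every invocation is the
    # kind of overhead the profiler attributes to inner().
    def inner(*args, **kwargs):
        params = inspect.signature(func).parameters
        if "connection" not in params:
            kwargs.pop("connection", None)
        return func(*args, **kwargs)
    return inner

# A new-style method that accepts `connection`:
def get_db_prep_save(value, connection=None):
    return value

wrapped = call_with_connection(get_db_prep_save)

# The wrapper adds per-call cost compared to calling directly:
direct = timeit.timeit(lambda: get_db_prep_save(1, connection=None), number=10000)
shimmed = timeit.timeit(lambda: wrapped(1, connection=None), number=10000)
```

Adding "return func" at the top of such a decorator skips the wrapper entirely, which is why it removes the overhead (at the cost of breaking legacy signatures).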

I realize backwards compatibility code is necessary, but I guess it's possible to fix this somehow still?

Here's the code that exercises it (more specifically, the line with get_db_prep_save; the function takes a list of not-yet-saved Django objects and inserts them):

def insert_many(objects, using="default"):
    """Insert a list of unsaved model instances with a single executemany()."""
    if not objects:
        return

    import django.db.models
    from django.db import connections
    con = connections[using]

    model = objects[0].__class__
    # Skip the AutoField so the database assigns primary keys itself.
    fields = [f for f in model._meta.fields
              if not isinstance(f, django.db.models.AutoField)]
    # One tuple of prepared values per object, in field order.
    parameters = [
        tuple(f.get_db_prep_save(f.pre_save(o, True), connection=con)
              for f in fields)
        for o in objects
    ]

    table = model._meta.db_table
    column_names = ",".join(f.column for f in fields)
    placeholders = ",".join(("%s",) * len(fields))
    con.cursor().executemany(
        "insert into %s (%s) values (%s)" % (table, column_names, placeholders),
        parameters)
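The executemany() pattern above can be exercised standalone, independent of Django. A sketch using sqlite3 and a hypothetical `reading` table (note sqlite3 uses `?` placeholders instead of `%s`):

```python
import sqlite3

# Hypothetical table standing in for model._meta.db_table.
con = sqlite3.connect(":memory:")
con.execute("create table reading (device text, value real)")

rows = [("phone-1", 1.5), ("phone-2", 2.5), ("phone-3", 3.5)]
columns = ("device", "value")
placeholders = ",".join("?" * len(columns))
sql = "insert into reading (%s) values (%s)" % (",".join(columns), placeholders)

# One call into the driver instead of one statement per row.
con.executemany(sql, rows)
con.commit()
```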

Change History (4)

comment:1 by Ole Laursen, 14 years ago

Component: Uncategorized → Database layer (models, ORM)

comment:2 by Ole Laursen, 14 years ago

I realize this is a pretty weak bug report. There are many factors involved in timing this, the profiler output is not super easy to read, and the test case I'm using is with real data, so I can't easily post it here. I also found another problem, which seems to stem from an auto_now field causing datetime.now() to be evaluated for each and every row, but this is something I can fix from the outside by setting the time manually.
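The workaround mentioned above amounts to computing the timestamp once and reusing it for every row. A trivial sketch (`created` is a hypothetical field name used for illustration):

```python
from datetime import datetime

rows = [{"device": "phone-1"}, {"device": "phone-2"}]

# Evaluate now() once, instead of once per row as auto_now would.
now = datetime.now()
for row in rows:
    row["created"] = now
```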

I don't know; maybe it's better to close the report until I or somebody else comes up with a test case with some easy-to-reproduce evidence.

comment:3 by Russell Keith-Magee, 13 years ago

Triage Stage: Unreviewed → Accepted

Performance improvements are always welcome; if anyone has any bright ideas on how to speed this up, I'm all ears.

However, keep in mind that this particular performance bottleneck will be removed in 1.4 -- which means the code in question will be deleted in about 3 months time.

comment:4 by Graham King, 13 years ago

Cc: graham@… added
Resolution: wontfix
Severity: Normal
Status: new → closed
Type: Cleanup/optimization

russellm mentioned this won't be needed after 1.4, and any patch here probably wouldn't be released before then, so I'm closing. Please re-open if the comment about 1.4 no longer applies.
