Code

Opened 4 years ago

Closed 3 years ago

#14220 closed Cleanup/optimization (wontfix)

Backwards-compat code in db.fields.subclassing is a bottleneck

Reported by: olau Owned by: nobody
Component: Database layer (models, ORM) Version: 1.2
Severity: Normal Keywords:
Cc: graham@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

Have a project where we need to import some data from mobile devices from time to time, up to about 20.000 rows. With the regular .create()/save() methods, this takes more than 10 minutes, even after fiddling with transactions. So I did a quick hack to combine the model fields with raw SQL and executemany() (code below).

This was about 20 times faster, so I was now down to about a minute in total. I ran the result through Hotspot (Python profiler) and to my surprise, a large fraction of the time was spent inside inner() in django.db.models.fields.subclassing. The culprit appears to be call_with_connection and call_with_connection_and_prepared. Inserting a "return func" in the top of both speeds up the insertion by at least 25% (much of the remaining time is spent in the DB).

I realize backwards compatibility code is necessary, but I guess it's possible to fix this somehow still?

Here's the code that's exercising it (more specifically the line with get_db_prep-save, the function takes a list of not-yet-inserted-in-db Django objects and inserts them):

def insert_many(objects, using="default"):
    if not objects:
        return

    import django.db.models
    from django.db import connections
    con = connections[using]
    
    model = objects[0].__class__
    fields = [f for f in model._meta.fields if not isinstance(f, django.db.models.AutoField)]
    parameters = []
    for o in objects:
        parameters.append(tuple(f.get_db_prep_save(f.pre_save(o, True), connection=con) for f in fields))

    table = model._meta.db_table
    column_names = ",".join(f.column for f in fields)
    placeholders = ",".join(("%s",) * len(fields))
    con.cursor().executemany(
        "insert into %s (%s) values (%s)" % (table, column_names, placeholders),
        parameters)

Attachments (0)

Change History (4)

comment:1 Changed 4 years ago by olau

  • Component changed from Uncategorized to Database layer (models, ORM)
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 4 years ago by olau

I realize this is a pretty weak bug report. There are many factors involved in timing this, the profiler output is not super easy to read and the test case I'm using is with real data so I can't easily post it here. I found another problem which seems to stem from an auto_now field which causes datetime.now() to be evaluated for each and every row, but this is something I can fix from the outside by setting the time manually.

I don't know, maybe it's better to close the report until I or somebody else come up with a test case with some easy-to-reproduce evidence.

comment:3 Changed 4 years ago by russellm

  • Triage Stage changed from Unreviewed to Accepted

Performance improvements are always welcome; if anyone has any bright ideas on how to speed this up, I'm all ears.

However, keep in mind that this particular performance bottleneck will be removed in 1.4 -- which means the code in question will be deleted in about 3 months time.

comment:4 Changed 3 years ago by graham_king

  • Cc graham@… added
  • Resolution set to wontfix
  • Severity set to Normal
  • Status changed from new to closed
  • Type set to Cleanup/optimization

russelm mentioned this won't be needed after 1.4, and any patch here probably wouldn't be released before that, so am closing. Please re-open if the comment about 1.4 no longer applies.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.