Opened 8 years ago

Closed 8 years ago

#5203 closed (wontfix)

Performance improvement for sqlite3 database flush

Reported by: jdetaeye@… Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Keywords: sqlite performance
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

The attached patch switches the synchronous pragma of the sqlite engine to 'off' during a database flush.

In my application I see a performance improvment with a factor 3 with this change.
Similar speed improvements are reported http://www.rkblog.rk.edu.pl/w/p/sqlite-performance-and-django/

Attachments (2)

sqlite_synchronous.patch (652 bytes) - added by jdetaeye@… 8 years ago.
patch
sqlite_synchronous_revised.patch (826 bytes) - added by jdetaeye 8 years ago.
patch, updated

Download all attachments as: .zip

Change History (14)

Changed 8 years ago by jdetaeye@…

patch

comment:1 Changed 8 years ago by Simon G. <dev@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

When you say "changing the PRAGMA settings may corrupt the database" does this mean "may corrupt the entire database file" or "may corrupt the information in the table, but that's ok since we're deleting it anyway"?

comment:2 Changed 8 years ago by mtredinnick

Simon: having synchronous "off" means that the disk's idea of what the database file contains and your program's idea are not necessarily identical for a short period, because the program has run on ahead. So a machine crash or disk failure in that period can lead to inconsistencies (e.g. your app has already told the user their message was saved when, actually, ... not so much).

For flushing the database, the risk doesn't look that bad to me. Although perhaps the original poster can explain why this is a particularly relevant optimisation. How often do you flush the database? Tests use :memory:, so it's not relevant there. Where does this speed-up have an effect?

Also, since it's not guaranteed that the database connection is closed after the flush, we should be restoring synchronous write behaviour at the end of that method, right? Otherwise it could affect other queries.

comment:3 Changed 8 years ago by mtredinnick

And my last comment doesn't answer Simon's much more specific question at all. Sorry about that.

/me goes off to learn how to read.... :-(

comment:4 Changed 8 years ago by jdetaeye@…

When you say "changing the PRAGMA settings may corrupt the database" does this mean "may corrupt the
entire database file" or "may corrupt the information in the table, but that's ok since we're
deleting it anyway"?

The whole database file can be corrupted, I believe.

For flushing the database, the risk doesn't look that bad to me.

Right. Especially since the flushing will be done by a system administrator and when normal usage of the application is suspended.

Although perhaps the original poster can explain why this is a particularly relevant optimisation.

In absolute terms the savings are probably small.

For "real" database engines other than sqlite the tuning of the database is to be done outside of the django application by a database administrator.
When using django with sqlite the django app is much more directly responsible for the database. And one might expect django to capture the best practices in this area...

My 2 cents: I thought a ticket was relevant since it does capture such a good and common practice.

comment:5 Changed 8 years ago by jdetaeye@…

The patch has become invalid with the latest commit.
If you judge the change is relevant, I can roll an updated one...

comment:6 Changed 8 years ago by Simon G. <dev@…>

  • Triage Stage changed from Design decision needed to Accepted

Yes please!

Changed 8 years ago by jdetaeye

patch, updated

comment:7 Changed 8 years ago by jdetaeye

Reworked patch is now attached...

comment:8 Changed 8 years ago by jdetaeye

  • Triage Stage changed from Accepted to Ready for checkin

comment:9 Changed 8 years ago by russellm

  • Triage Stage changed from Ready for checkin to Accepted

Can anyone else confirm these speed increases? I just tried applying the patch and running the Django system tests (which does plenty of flushing). I saw no improvement - certainly not a 3x speedup. Is there something else that needs to be done to make this work?

comment:10 Changed 8 years ago by Fredrik Lundh <fredrik@…>

I actually got a big slowdown when running the system tests, but that's hopefully not caused by the patch... But doesn't the tests use the :memory: database? (See mtredinnick's comment above).

comment:11 Changed 8 years ago by jdetaeye@…

I just tried applying the patch and running the Django system tests (which does plenty of flushing). I saw no improvement - certainly not a 3x speedup.

I expect the big improvements when flushing bigger/'production-size' databases: the database file will have plenty of database 'pages' and waiting for disk io will be the clear bottleneck. In smaller databases other bottlenecks can more easily pop up.

To prove the speedup, some for-loop can be used to populate the database with a sufficient number of records. Flushing that database with and without the patch will demonstrate the improvement...

comment:12 Changed 8 years ago by jacob

  • Resolution set to wontfix
  • Status changed from new to closed

After reading sqlite's documentation, it's pretty clear that setting synchronous off could cause complete database corruption and loss. So while this change might be appropriate for the test system if we weren't using :memory:, Django shouldn't be doing something this dangerous.

Note: See TracTickets for help on using tickets.
Back to Top