Opened 17 years ago

Closed 17 years ago

#5203 closed (wontfix)

Performance improvement for sqlite3 database flush

Reported by: jdetaeye@… Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Keywords: sqlite performance
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The attached patch switches the synchronous pragma of the sqlite engine to 'off' during a database flush.

In my application I see a performance improvment with a factor 3 with this change.
Similar speed improvements are reported http://www.rkblog.rk.edu.pl/w/p/sqlite-performance-and-django/

Attachments (2)

sqlite_synchronous.patch (652 bytes ) - added by jdetaeye@… 17 years ago.
patch
sqlite_synchronous_revised.patch (826 bytes ) - added by jdetaeye 17 years ago.
patch, updated

Download all attachments as: .zip

Change History (14)

by jdetaeye@…, 17 years ago

Attachment: sqlite_synchronous.patch added

patch

comment:1 by Simon G. <dev@…>, 17 years ago

Triage Stage: UnreviewedDesign decision needed

When you say "changing the PRAGMA settings may corrupt the database" does this mean "may corrupt the entire database file" or "may corrupt the information in the table, but that's ok since we're deleting it anyway"?

comment:2 by Malcolm Tredinnick, 17 years ago

Simon: having synchronous "off" means that the disk's idea of what the database file contains and your program's idea are not necessarily identical for a short period, because the program has run on ahead. So a machine crash or disk failure in that period can lead to inconsistencies (e.g. your app has already told the user their message was saved when, actually, ... not so much).

For flushing the database, the risk doesn't look that bad to me. Although perhaps the original poster can explain why this is a particularly relevant optimisation. How often do you flush the database? Tests use :memory:, so it's not relevant there. Where does this speed-up have an effect?

Also, since it's not guaranteed that the database connection is closed after the flush, we should be restoring synchronous write behaviour at the end of that method, right? Otherwise it could affect other queries.

comment:3 by Malcolm Tredinnick, 17 years ago

And my last comment doesn't answer Simon's much more specific question at all. Sorry about that.

/me goes off to learn how to read.... :-(

comment:4 by jdetaeye@…, 17 years ago

When you say "changing the PRAGMA settings may corrupt the database" does this mean "may corrupt the
entire database file" or "may corrupt the information in the table, but that's ok since we're
deleting it anyway"?

The whole database file can be corrupted, I believe.

For flushing the database, the risk doesn't look that bad to me.

Right. Especially since the flushing will be done by a system administrator and when normal usage of the application is suspended.

Although perhaps the original poster can explain why this is a particularly relevant optimisation.

In absolute terms the savings are probably small.

For "real" database engines other than sqlite the tuning of the database is to be done outside of the django application by a database administrator.
When using django with sqlite the django app is much more directly responsible for the database. And one might expect django to capture the best practices in this area...

My 2 cents: I thought a ticket was relevant since it does capture such a good and common practice.

comment:5 by jdetaeye@…, 17 years ago

The patch has become invalid with the latest commit.
If you judge the change is relevant, I can roll an updated one...

comment:6 by Simon G. <dev@…>, 17 years ago

Triage Stage: Design decision neededAccepted

Yes please!

by jdetaeye, 17 years ago

patch, updated

comment:7 by jdetaeye, 17 years ago

Reworked patch is now attached...

comment:8 by jdetaeye, 17 years ago

Triage Stage: AcceptedReady for checkin

comment:9 by Russell Keith-Magee, 17 years ago

Triage Stage: Ready for checkinAccepted

Can anyone else confirm these speed increases? I just tried applying the patch and running the Django system tests (which does plenty of flushing). I saw no improvement - certainly not a 3x speedup. Is there something else that needs to be done to make this work?

comment:10 by Fredrik Lundh <fredrik@…>, 17 years ago

I actually got a big slowdown when running the system tests, but that's hopefully not caused by the patch... But doesn't the tests use the :memory: database? (See mtredinnick's comment above).

comment:11 by jdetaeye@…, 17 years ago

I just tried applying the patch and running the Django system tests (which does plenty of flushing). I saw no improvement - certainly not a 3x speedup.

I expect the big improvements when flushing bigger/'production-size' databases: the database file will have plenty of database 'pages' and waiting for disk io will be the clear bottleneck. In smaller databases other bottlenecks can more easily pop up.

To prove the speedup, some for-loop can be used to populate the database with a sufficient number of records. Flushing that database with and without the patch will demonstrate the improvement...

comment:12 by Jacob, 17 years ago

Resolution: wontfix
Status: newclosed

After reading sqlite's documentation, it's pretty clear that setting synchronous off could cause complete database corruption and loss. So while this change might be appropriate for the test system if we weren't using :memory:, Django shouldn't be doing something this dangerous.

Note: See TracTickets for help on using tickets.
Back to Top