Opened 6 weeks ago
Last modified 6 days ago
#35904 closed New feature
Speed up fixture loading by adding options bulk insert/create — at Version 8
Reported by: | JorisBenschop | Owned by: | |
---|---|---|---|
Component: | Testing framework | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
As per this forum discussion, I have created a patch to improve load times for the loaddata command under some circumstances.
Currently the “loaddata” management command calls obj.save() for each deserialized object within a fixture. This method first tries an UPDATE statement and, if no row is updated, falls back to an INSERT statement. With the proposed --force_insert option the UPDATE is skipped, which halves the number of queries when every record in the fixture is new.
A second option is to use bulk_create to insert multiple records at once. This reduces the number of INSERT statements by (n-1)/n, or ~99% when 100 records are inserted in a single batch.
These options are not meant to cover every use case and are therefore optional.
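To illustrate where the query savings come from, here is a minimal sketch that uses only the standard-library sqlite3 module rather than Django itself. The article table and the helper names (make_db, load_update_then_insert, load_force_insert, load_bulk) are hypothetical and only mimic the statement patterns described above; this is not the code from the patch.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical article table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
    return conn


def load_update_then_insert(conn, rows):
    """Mimic the save() pattern: try UPDATE, fall back to INSERT if no row matched."""
    statements = 0
    for pk, title in rows:
        cur = conn.execute("UPDATE article SET title = ? WHERE id = ?", (title, pk))
        statements += 1
        if cur.rowcount == 0:
            conn.execute("INSERT INTO article (id, title) VALUES (?, ?)", (pk, title))
            statements += 1
    return statements


def load_force_insert(conn, rows):
    """Mimic the force-insert pattern: one INSERT per record, no UPDATE attempt."""
    statements = 0
    for pk, title in rows:
        conn.execute("INSERT INTO article (id, title) VALUES (?, ?)", (pk, title))
        statements += 1
    return statements


def load_bulk(conn, rows, batch_size=100):
    """Mimic the bulk pattern: one multi-row INSERT per batch of records."""
    statements = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(
            f"INSERT INTO article (id, title) VALUES {placeholders}", params
        )
        statements += 1
    return statements


if __name__ == "__main__":
    rows = [(i, f"title {i}") for i in range(1, 101)]
    print("update-then-insert:", load_update_then_insert(make_db(), rows))
    print("force insert:", load_force_insert(make_db(), rows))
    print("bulk insert:", load_bulk(make_db(), rows))
```

For 100 new records in an empty table the three helpers issue 200, 100 and 1 statements respectively, which matches the 50% and (n-1)/n reductions above. Real timings will not scale exactly with statement counts, as the benchmark below shows.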
Benchmark results
===============
test to insert 1000 records from a single fixture (using the Article model on Sqlite)
current: 0.116s
with --force_insert: 0.066s
with --bulk_create: 0.010s
test to insert 10000 records from a single fixture
current: 1.07s
with --force_insert: 0.39s
with --bulk_create: 0.104s
I expect larger models to show an even more significant improvement.
Change History (8)
comment:1 by , 6 weeks ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:2 by , 2 weeks ago
Summary: | Speed up fixture loading by bulk insert → Speed up fixture loading by adding options bulk insert/create |
---|---|
Type: | Uncategorized → New feature |
#35975 was a duplicate
Forum discussion: https://forum.djangoproject.com/t/feature-proposal-faster-fixture-loading-via-loaddata-command/36972
PR: https://github.com/django/django/pull/18889
comment:3 by , 2 weeks ago
Description: | modified (diff) |
---|---|
Has patch: | set |
Resolution: | wontfix |
Status: | closed → new |
comment:4 by , 2 weeks ago
Description: | modified (diff) |
---|
comment:5 by , 2 weeks ago
Description: | modified (diff) |
---|
comment:6 by , 2 weeks ago
As requested by Simon, I have re-opened the ticket and specified the expected improvements more precisely. Steps to reproduce are covered by the tests in the PR. I am open to adding code to the serde testing if there is interest.
comment:7 by , 2 weeks ago
Description: | modified (diff) |
---|
comment:8 by , 2 weeks ago
Description: | modified (diff) |
---|
Hello Joris,
This sounds interesting, particularly given that features like test case serialized rollbacks (which are quite slow) are built on top of model serialization. It would have to be a distinct option, as bulk_create doesn't fire signals, which some setups might require.
Just like any new feature request, though, it should be discussed on the forum to reach a consensus before being accepted. Given this is a performance-related new feature, I suggest your proposal come equipped with some details about what kind of improvements users should expect (profiles, benchmarks, instead of solely claiming it's fairly inefficient), backed by steps to reproduce, as well as a PoC that properly deals with other features of the serde framework such as natural keys, and a plan for how to deal with backends that don't support ignore_conflicts. It might even be a good opportunity to augment our performance tracking system with serde benchmarks.
If that's the case then sharing this code as a standalone package (e.g. django-fast-loaddata) might be a good way to get traction on the above.
Assuming there is interest in moving forward we can then re-open this issue.