Opened 3 years ago

Closed 5 months ago

Last modified 5 months ago

#20461 closed Cleanup/optimization (fixed)

Support for running Django tests in parallel

Reported by: senko
Owned by: aaugustin
Component: Testing framework
Version: master
Severity: Normal
Keywords: 1.9
Cc: cmawebsite@…, github.com@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Running the entire Django test suite is slow, so people skip it until just before they push or send a patch.

In theory, at least if one is using an in-memory SQLite database, it should be possible to parallelize the tests and make the run faster. There are other potential ways the tests can clobber each other (e.g. allocating the same ports for LiveServerTestCases, using the same memcached key prefixes, etc.), but none seem insurmountable at first glance.

Attachments (1)

paralleltests.py (3.1 KB) - added by senko 3 years ago.
Run django tests in parallel

Change History (30)

comment:1 Changed 3 years ago by senko

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement set
  • Status changed from new to assigned

I've got a proof-of-concept runner that splits the tests into N groups and runs each group independently. In completely unscientific tests on my (dual core) laptop, I got ~3x speedups.

The runner is here: https://github.com/senko/django/commit/fa6d7a5845ae7863e3bff0c571588a67b19419f0

It's not in a mergeable state yet, as it doesn't address any potential HTTP port allocation or memcached prefix allocation problems. I also couldn't find a reliable way to mimic runtests' behaviour in getting the tests to run, so I think I actually run more tests (if no labels are set) than runtests does by default.

If this entire exercise makes sense, at some point the functionality would probably need to be moved to runtests itself (which would solve the "can't get exactly the test labels we need" problem).

comment:2 Changed 3 years ago by akaariai

Seems like a good idea to me (even if complete implementation might be hard). If we want to do this for other databases than in-memory sqlite we will need separate databases for each parallel process.

comment:3 Changed 3 years ago by akaariai

  • Triage Stage changed from Unreviewed to Accepted

comment:4 Changed 3 years ago by akaariai

I discussed this on the #django-dev IRC channel, and it seems we do not want this in the Django repo. However, it would be excellent if you could write a standalone script (usable from $HOME/bin/ for example) with which you could run the in-memory tests in parallel. The script doesn't need to do more than what the current script does.

Having a fast way to run all tests, even if the output is ugly and it is usable only for in-memory sqlite would be a valuable addition.

comment:5 Changed 3 years ago by anonymous

Ahh, too bad, I just got the integration into runtests.py working (with an extra --workers=N param), avoiding the test label discovery problem I had earlier. Okay, I'll adapt it into the standalone script - actually that way it's easier to specify multiple settings files (needed if you use a database other than the in-memory sqlite) without breaking the runtests usage.

I'll attach the updated script here.

Changed 3 years ago by senko

Run django tests in parallel

comment:6 Changed 3 years ago by senko

I've attached the script. It can be located anywhere, as long as either the current working directory is django/tests or that directory is passed via the --testdir argument.

Script usage:

~/bin/paralleltests.py --testdir=/path/to/django/tests --runners=<N> --settings=test_sqlite [test_labels ...]

All arguments except --testdir=<dir> and --runners=<N> are passed as-is to runtests.py. A good heuristic for the optimal number of runners is twice the number of CPU cores you have (on the assumption that at any given time some of the tests are actually using the CPU and some are waiting for I/O; in any case, it produces good results on my laptop :)

The script discovers all the tests that you want to run (or uses the test labels provided manually), splits them into N chunks, and starts N parallel runtests processes, one for each chunk of the labels. So it will not speed up a single test label execution (these usually don't last very long anyway). Also, the test discovery in it sucks, as I try to mimic runtests' behaviour but don't implement the actual discovery logic (and can't reuse it easily from runtests), so it actually runs *more* tests than the default runtests (I've no idea why).
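
The chunk-and-spawn approach described above can be sketched roughly as follows. This is not the attached paralleltests.py; the function names (split_labels, build_commands, run_parallel) are hypothetical, and the round-robin split is just one reasonable way to form N chunks.

```python
import multiprocessing
import subprocess
import sys


def default_runner_count():
    # Heuristic from the comment above: twice the number of CPU cores.
    return 2 * multiprocessing.cpu_count()


def split_labels(labels, runners):
    """Deal labels round-robin into at most `runners` non-empty chunks."""
    chunks = [labels[i::runners] for i in range(runners)]
    return [chunk for chunk in chunks if chunk]


def build_commands(labels, runners, extra_args=()):
    """Build one runtests.py command line per chunk of labels."""
    return [
        [sys.executable, "runtests.py"] + list(extra_args) + chunk
        for chunk in split_labels(labels, runners)
    ]


def run_parallel(labels, runners, testdir, extra_args=()):
    """Start every runtests.py process, wait for all, return 0 only if all passed."""
    procs = [
        subprocess.Popen(cmd, cwd=testdir)
        for cmd in build_commands(labels, runners, extra_args)
    ]
    codes = [proc.wait() for proc in procs]
    return int(any(code != 0 for code in codes))
```

Because each chunk is a separate process writing to the same terminal, output from the workers will be mixed together, which matches the limitation mentioned later in this thread.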

In case it matters to anyone: the script is licensed under the same terms as Django itself.

Is there a good place to put the script so it's more visible/useful to people, without them having to sift through Trac?

Also: I'm going to continue poking at runtests and trying to make it work in parallel for non-sqlite databases as well, to satisfy my own curiosity. I'd appreciate if this ticket could stay open for a while so I have a place to report my findings (if any).

comment:7 Changed 3 years ago by senko

I've updated my branch experimenting with runtests for parallel run. The modified runtests can run sqlite, mysql and postgres tests in parallel with no problems (these are all that I've tested). The only special thing needed is to make sure the test database name is different in each worker (which is easily done by using a randomized name in your test settings file).
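
A test settings file with a randomized test database name might look like the sketch below. The helper unique_test_db_name, the prefix, and the engine/database names are illustrative assumptions; the nested "TEST" dictionary is the form used by newer Django versions, while older versions used a flat TEST_NAME key instead.

```python
import os
import uuid


def unique_test_db_name(prefix="test_django"):
    # Hypothetical helper: combine the process id with a random suffix so
    # that each worker importing this settings file gets its own database.
    return "{}_{}_{}".format(prefix, os.getpid(), uuid.uuid4().hex[:8])


DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",  # assumed backend
        "NAME": "django",                           # assumed database name
        "TEST": {"NAME": unique_test_db_name()},
    },
}
```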

The major remaining downside that I can see is that the test output is not nice (interleaved from N parallel workers), and could be unreadable if you have a lot of errors. In the usual case where you want to quickly run all the tests ("I'm done with this bit, let's check nothing else got broken"), at least for me it's not so much of a problem.

Code (still) at: https://github.com/senko/django/tree/ticket_20461

comment:8 Changed 12 months ago by aaugustin

That last link is a 404 these days. senko, do you have the latest code around?

comment:9 Changed 12 months ago by aaugustin

While Django is thread-safe in normal use, it isn't during tests because override_settings has process-wide effects.

That's why parallelizing tests requires processes, not threads. (Knowing this would have saved me some time.)
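
The problem can be illustrated without Django. The sketch below uses a plain module-level dictionary and a toy override() context manager as stand-ins for the settings object and override_settings (both are assumptions for illustration, not Django's implementation). An override made in one thread is visible to every other thread in the same process.

```python
import threading
from contextlib import contextmanager

# Stand-in for process-wide settings shared by all threads.
SETTINGS = {"DEBUG": False}


@contextmanager
def override(**kwargs):
    """Toy override: mutate the shared settings, restore them on exit."""
    old = dict(SETTINGS)
    SETTINGS.update(kwargs)
    try:
        yield
    finally:
        SETTINGS.clear()
        SETTINGS.update(old)


def read_debug_from_other_thread():
    """Read SETTINGS['DEBUG'] from a freshly started thread."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(SETTINGS["DEBUG"]))
    worker.start()
    worker.join()
    return seen[0]


def demo():
    with override(DEBUG=True):
        during = read_debug_from_other_thread()  # other thread sees the override
    after = read_debug_from_other_thread()       # restored once the block exits
    return during, after
```

Separate processes each hold their own copy of such module-level state, so an override in one worker cannot leak into a test running in another.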

comment:10 Changed 12 months ago by aaugustin

  • Owner changed from senko to aaugustin

comment:11 Changed 12 months ago by collinanderson

  • Cc cmawebsite@… added

comment:12 Changed 12 months ago by aaugustin

  • Has patch set
  • Patch needs improvement unset

comment:13 Changed 11 months ago by timgraham

  • Patch needs improvement set

Still a work in progress as far as I can tell.

comment:14 Changed 6 months ago by aaugustin

  • Patch needs improvement unset

comment:15 Changed 5 months ago by timgraham

  • Patch needs improvement set

comment:16 Changed 5 months ago by timgraham

  • Keywords 1.9 added

comment:17 Changed 5 months ago by timgraham

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

comment:18 Changed 5 months ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In b1a29541:

Merge pull request #4761 from aaugustin/parallelize-tests-attempt-1

Fixed #20461 -- Allowed running tests in parallel.

comment:19 Changed 5 months ago by bak1an

  • Resolution fixed deleted
  • Status changed from closed to new

b1a2954 fails for me with python 2.7 and 3.4 under fedora 22 x64 using default sqlite configuration.

Stack traces are here: https://gist.github.com/bak1an/434d884425f3354896b2

There are no old *.pyc files, git status is clean.

Tests finish successfully on previous revision (acb8330).

Am I doing something wrong, or is something broken?

comment:20 Changed 5 months ago by bak1an

  • Triage Stage changed from Ready for checkin to Accepted

comment:21 Changed 5 months ago by bak1an

Found it.

Looks like tblib is not optional here.

pip install tblib helped.

Ran 10287 tests in 137.824s

OK (skipped=735, expected failures=7)

Perhaps this should be mentioned in docs and added to the requirements file.

comment:22 Changed 5 months ago by collinanderson

Fixing the "no attribute indent" bug: https://github.com/django/django/pull/5257

comment:23 Changed 5 months ago by collinanderson

adding tblib to requirements: https://github.com/django/django/pull/5258

comment:24 Changed 5 months ago by bak1an

Perhaps someone from the core team should close this ticket now.

comment:25 Changed 5 months ago by Tim Graham <timograham@…>

In c97b755a:

Refs #20461 -- Fixed parallel test runner on Python 2.7.

textwrap.indent() is new in Python 3.3.

comment:26 Changed 5 months ago by timgraham

  • Resolution set to fixed
  • Status changed from new to closed

comment:27 Changed 5 months ago by Aymeric Augustin <aymeric.augustin@…>

In 968b02f:

Refs #20461 -- Made tblib optional for a passing test run.

This was the original intent.
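
Treating tblib as an optional dependency usually comes down to guarding the import, as in this sketch. The function name install_traceback_pickling is hypothetical and this is not the code from the commit; tblib.pickling_support.install() is the tblib call that makes tracebacks picklable so that a worker process can send them back to the parent.

```python
def install_traceback_pickling():
    """Enable pickling of tracebacks if tblib is installed.

    Returns True when tblib was found and installed, False otherwise,
    so a caller can fall back to plain-text error reporting.
    """
    try:
        import tblib.pickling_support
    except ImportError:
        return False
    tblib.pickling_support.install()
    return True
```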

comment:28 Changed 5 months ago by frankoid

Is the naming strategy for extra test databases documented anywhere? I tried to find this info in the docs but couldn't.

Based on get_test_db_clone_settings(self, number) in the code it looks like the extra databases use the test database name as defined in the DATABASES setting with _<number> appended.
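
Following that reading of the code, the names of the extra databases could be predicted with a small helper like the one below. The function names are hypothetical, and numbering the clones from 1 is an assumption; the suffix scheme is the one described in the comment above and may differ for file-based SQLite databases.

```python
def clone_db_name(test_db_name, number):
    # Append _<number> to the test database name, as described above.
    return "{}_{}".format(test_db_name, number)


def clone_db_names(test_db_name, workers):
    """Names of the clone databases for `workers` parallel processes."""
    return [clone_db_name(test_db_name, n) for n in range(1, workers + 1)]
```

For example, clone_db_names("test_mydjangodb", 3) yields test_mydjangodb_1, test_mydjangodb_2 and test_mydjangodb_3, which are the names a database user would need permission to create.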

It might also be worth documenting how to set up permissions when using MySQL. When I use Django with PostgreSQL, I grant the PostgreSQL user used by Django permission to create databases (with any name), so I won't need to change anything for parallel testing to work. However, when I use MySQL, I usually grant the MySQL user used by Django permission on specific databases, i.e. mydjangodb and mydjangodb_test, so I think I'll need to grant more permissions for parallel testing to work. I'm not aware of a way to grant permission on mydjangodb_test_* without granting permission on all databases (including existing ones), but I haven't researched this in depth.

comment:29 Changed 5 months ago by frankoid

  • Cc github.com@… added