#8138 closed (fixed)
Switch django tests to use transactions
Reported by: | Marc Remolt | Owned by: | nobody |
---|---|---|---|
Component: | Testing framework | Version: | dev |
Severity: | Keywords: | ||
Cc: | Trevor Caira, dan.fairs@…, remco@…, Colin Copeland | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | UI/UX: |
Description
Currently the Django test suite flushes the database after every test. That is a major performance drawback: the fixture loading/truncating takes much longer than the actual test. So I propose switching the Django test tools to use transactions:
- Before every test a transaction is started
- The fixtures (if existing) are loaded
- No commit is performed
- The tests run
- A rollback is performed
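The steps above can be sketched with plain sqlite3 as a minimal stand-in for the proposed TestCase behaviour (run_test_in_transaction, sample_test, and the article table are illustrative only, not part of the patch):

```python
import sqlite3

# One shared in-memory database; the schema is created once up front.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE article (headline TEXT)")

def run_test_in_transaction(test_body):
    # 1. Before the test, a transaction is started.
    conn.execute("BEGIN")
    # 2. Fixtures (if any) are loaded; 3. no commit is performed.
    conn.execute("INSERT INTO article VALUES ('fixture row')")
    try:
        # 4. The test runs.
        test_body(conn)
    finally:
        # 5. A rollback is performed instead of flushing the database.
        conn.execute("ROLLBACK")

def sample_test(c):
    c.execute("INSERT INTO article VALUES ('created by the test')")
    assert c.execute("SELECT COUNT(*) FROM article").fetchone()[0] == 2

run_test_in_transaction(sample_test)
# After the rollback the table is empty again, ready for the next test.
assert conn.execute("SELECT COUNT(*) FROM article").fetchone()[0] == 0
```

The rollback replaces the expensive flush/reload cycle, which is where the speedups below come from.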
There is a matching discussion on django-developers:
http://groups.google.com/group/django-developers/browse_thread/thread/49aa551ad41fb919
My first tests with the concept show some promise. The first try at a patch breaks some tests (1 on SQLite, 25 on PostgreSQL), but the performance improvement is significant. With the django testsuite the results on my Laptop (slow disk - IO is the bottleneck with the tests here) so far are:
- SQLite in-memory-db:
- before: 415 sec
- after: 79 sec
- PostgreSQL:
- before: 4285 sec
- after: 359 sec
As suggested by Russell Keith-Magee, I renamed the original test.TestCase to test.TransactionTestCase and implemented the new functionality in test.TestCase. If someone explicitly wants to use transactions inside a test, they can use TransactionTestCase and will only lose the performance gain.
With the current patch all doctests now use transactional behaviour. We'll see if I can implement a choice there, too, if it turns out to be necessary.
Attachments (11)
Change History (46)
comment:1 by , 16 years ago
milestone: | post-1.0 → 1.0 |
---|---|
Triage Stage: | Unreviewed → Accepted |
follow-up: 3 comment:2 by , 16 years ago
IMO it's kind of counterintuitive to call the tests that don't use transactions TransactionTestCase.
comment:3 by , 16 years ago
Replying to ElliottM:
IMO it's kind of counterintuitive to call the tests that don't use transactions TransactionTestCase.
The reason behind the naming was actually the other way around - you use TransactionTestCase if you need to test transactional behaviour. Normal 'just put it in the database' behaviour is covered by a standard test case.
However, I'm not married to either name - suggestions are welcome.
comment:4 by , 16 years ago
I updated mremolt's patch:
- Removed the public '--no-commit' option to the loaddata command (it is still available as a kwarg 'no_commit', just not documented publicly)
- Refactored _pre_setup() and _post_teardown() to avoid the duplication between TransactionTestCase and its subclass, TestCase
by , 16 years ago
Attachment: | django_transaction_tests_2.diff added |
---|
by , 16 years ago
Attachment: | test_in_transactions.diff added |
---|
patch of test framework and fixes for tests
comment:5 by , 16 years ago
One minor addition - SecureViewTest (added in [8271]) also needs to be TransactionTestCase.
Regarding the "#Test uses transactions" modifier in the doctests: there is an interesting crossover here with #5624. #5624 calls for a way to associate fixtures with doctests. It occurs to me that if we're going to start adding directives to doctests (and there are some other candidates beyond these two), we should be consistent with how we specify options like this one.
At the very least, we need to pick a consistent (and extensible) syntax; better still would be to use an existing syntax, like doctest option syntax. Using doctest.register_optionflag() isn't an option because of the need to support Python 2.3.
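For reference, the doctest option syntax being discussed would look roughly like this; register_optionflag() only exists in the Python 2.4 rewrite of doctest, which is why Python 2.3 support ruled it out. The flag name USES_TRANSACTIONS is hypothetical:

```python
import doctest

# Hypothetical flag for marking transactional doctests; register_optionflag()
# requires the Python 2.4+ doctest module (hence the 2.3 compatibility problem).
USES_TRANSACTIONS = doctest.register_optionflag("USES_TRANSACTIONS")

# A doctest example could then opt in with a standard directive:
#     >>> management.call_command('loaddata', 'f1')  # doctest: +USES_TRANSACTIONS

# The flag is a distinct bit, combinable with the built-in ones:
combined = USES_TRANSACTIONS | doctest.ELLIPSIS
```

This would give a consistent, extensible syntax rather than ad-hoc "#Test uses transactions" comments.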
comment:6 by , 16 years ago
Clarifying for Luke's benefit - I don't know if you were planning to, but don't check this in yet :-)
comment:7 by , 16 years ago
milestone: | 1.0 → post-1.0 |
---|
Ok; Speaking with Malcolm and Jacob, we've decided to defer this to post-1.0. Feature freeze is in 2 days; we won't have time to resolve the minor design issue of doctest flags, Jacob doesn't like the TransactionTestCase name, and there is the risk (however small) that we could introduce an unexpected error into the test system just days before we go into a heavy duty test phase.
The idea is definitely going into v1.1, though.
by , 16 years ago
Attachment: | django_transaction_tests_3.diff added |
---|
by , 16 years ago
Attachment: | django_transaction_tests_3.2.diff added |
---|
comment:8 by , 16 years ago
I've uploaded my current (rough) progress on the ticket (django_transaction_tests_3.2.diff).
At the moment I only need one TransactionTestCase, as the other problems are resolved by the test suite automatically, and that test really does some nasty things.
- SQLite: 1 failure (something new to do with the filestore refactor)
- Postgres: 1 failure (same as sqlite) and 2 errors (seems something inside session handling)
- MySQL: has problems loading some fixtures, the dreaded "foreign key constraint fails"
Before I continue, could I have some feedback on my way of solving the problems? As you've pushed it to post-1.0 (good idea), there is no hurry. At the moment it looks like we could get this thing fully backwards compatible, and get rid of TransactionTestCase completely.
By the way, please ignore django_transaction_tests_3.diff. Wrong patch base directory.
comment:9 by , 16 years ago
Owner: | changed from | to
---|
Russell - I wasn't planning on committing it, and I'm glad you've decided to defer it, it's too big a change to make at this point until we are really clear on the implications and have thrashed them out in some different code bases for a while.
Also, my method of flagging the transactional tests in the doctest string was a hack really, I meant to mark it as such and ask about better ways of doing it.
I'm un-assigning from myself as I'm not intending to do anything on this in the near future.
comment:10 by , 16 years ago
comment:11 by , 16 years ago
Cc: | added |
---|
comment:12 by , 16 years ago
Cc: | added |
---|
comment:13 by , 16 years ago
Component: | Tools → Testing framework |
---|
comment:14 by , 16 years ago
Cc: | added |
---|
comment:15 by , 16 years ago
Cc: | added |
---|
comment:16 by , 16 years ago
From looking at the patch: I think it could also be clever and flush the database between cases only if they differ in the fixtures used - if they are the same, there is no reason to.
Not sure how to implement this using the standard Django test framework (fairly easy with nose or functools, I guess).
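The suggestion could look something like this sketch (FixtureAwareRunner and its prepare() method are invented for illustration; nothing like this exists in the patch):

```python
class FixtureAwareRunner:
    """Sketch: skip the expensive flush/loaddata when the next test case
    uses exactly the same fixture set as the previous one."""

    def __init__(self):
        self._loaded = None  # fixture set currently in the database

    def prepare(self, fixtures):
        wanted = tuple(sorted(fixtures or ()))
        if wanted == self._loaded:
            return False            # same fixtures: no flush/load needed
        # ...a real implementation would flush and call loaddata here...
        self._loaded = wanted
        return True

runner = FixtureAwareRunner()
assert runner.prepare(["a.json", "b.json"]) is True
assert runner.prepare(["b.json", "a.json"]) is False   # same set, order ignored
assert runner.prepare(["c.json"]) is True              # different set: reload
```

The rollback approach adopted later makes this optimization mostly moot, since fixtures are re-loaded cheaply inside each test's transaction.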
comment:17 by , 16 years ago
I updated the diff to reasonably current trunk level (it no longer needs to include the loaddata change since that function was added to loaddata in r8336). I get no failures with that diff and sqlite as the backend.
Couple of others I've tried:
MySQL/InnoDB -- I get a couple of failures in addition to the ones "normally" seen in the test suite for that combo. First, the auth tests that use the login() function defined in the test fail because it seems the subsequent call doesn't see the session changes that were supposed to be made by login(). Switching these tests to use the test client login function makes the failures go away (as does using the file session backend), so that may be one way to fix these, though I'd like to understand it better. Second, some of the comment testcases are written to assume pk's starting at 1 for posted comments. However on InnoDB rolling back a transaction does not roll back the autoincrement sequence numbers (see http://bugs.mysql.com/bug.php?id=6714, closed wontfix), so when using rollback the comment testcases get a failure due to pk's not getting assigned as expected. Not sure how to fix that one.
MySQL/MyISAM -- What's supposed to be happening here? Since transactions aren't supported I think we continue to need to use the old way, but we don't seem to with the current patch (multiple failures/errors I did not look into because unless we're doing something to avoid the rollback use in this case I don't see how it would work). Am I right in thinking we need to somehow avoid trying the rollback approach for this combo?
Postgres and Oracle not tried yet. Running the old code on the machine where I have Postgres (very old tired machine) took 8 hours (!), and Oracle (on a different, but still not exactly peppy, machine) in brief tests has seemed slow as molasses, so I may not have the horsepower required to track down much on those backends.
follow-ups: 19 21 comment:18 by , 16 years ago
I've looked at this a little more and figured out a few things.
First, it never seemed well-understood from the mailing list discussion why tests that used the test Client had to continue to use the flush/reload, and even then there were failures. The problem is that django.test.client.ClientHandler's __call__ sends the request_finished signal, which is connected (in django.db.__init__) to close_connection, which severs the connection to the database. So if a test POSTed via the test Client in order to login, for example, the session changes made during the POST were rolled back when the POST's request_finished signal severed the DB connection apparently mid-transaction, and subsequent stuff done by the testcase that relied on being logged in would fail.
Disconnecting close_connection from request_finished solves this problem, so that tests that use the test Client do not automatically need to use the old way, and may take advantage of the rollback approach. In the patch I'll attach the signal is disconnected/reconnected around the sending of the signal...this is probably overkill. I think it would be OK to just disconnect close_connection from that signal once and be done with it, but I haven't had time to go back and experiment with that since I tried the first way.
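The disconnect-around-send idea can be illustrated with a minimal stand-in for Django's signal dispatcher (the Signal class here is a toy, not Django's implementation; the real patch operates on django.core.signals.request_finished):

```python
class Signal:
    """Toy signal dispatcher, just enough to show the pattern."""
    def __init__(self):
        self._receivers = []
    def connect(self, fn):
        self._receivers.append(fn)
    def disconnect(self, fn):
        self._receivers.remove(fn)
    def send(self):
        for fn in list(self._receivers):
            fn()

request_finished = Signal()
log = []

def close_connection():
    # In Django this severs the DB connection, which would roll back
    # the test's uncommitted transaction mid-flight.
    log.append("closed")

request_finished.connect(close_connection)

# The patch's approach: disconnect around the send so the test
# transaction survives the simulated request/response cycle.
request_finished.disconnect(close_connection)
request_finished.send()
request_finished.connect(close_connection)

assert log == []   # the connection was never closed during the "request"
```

Disconnecting once for the whole test run, as suggested, would do the same thing with less bookkeeping.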
Second, having the fixture teardown code attempt to detect whether a commit had been performed by the test code (despite the test case being a type that supposedly didn't use transactions) and fallback to flush/reload didn't seem to be working too well. I tried tracking down a few of the failures caused by these and eventually decided to try a different approach, which is: if it's a test case that says it doesn't need transactions to work, monkey patch the transaction commit/rollback and enter/leave transaction management routines so that even if they are called by the code being tested, they do nothing, for the duration of the test.
I don't know how the monkey patch idea will be received, but this approach seems to work better. Doing it this way, none of the existing Django testcases need to use the old flush/reload approach. That is, the tests that do actually call commit or rollback (like admin_views, which call admin views that are wrapped in commit_on_success) don't actually need those calls to do anything in order for the test to properly test what it is trying to test. Things work fine when those routines don't do anything, and the test fixture teardown is able to reliably un-do what the test has done, leaving things clean for the next test.
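The monkey-patch idea amounts to something like this (a sketch: FakeTransaction and disabled_transactions are stand-ins invented for illustration, not the patch's actual code, which patches django.db.transaction's module-level functions):

```python
import contextlib

class FakeTransaction:
    """Toy stand-in for a transaction module, just for illustration."""
    def __init__(self):
        self.commits = 0
    def commit(self):
        self.commits += 1

transaction = FakeTransaction()

@contextlib.contextmanager
def disabled_transactions(txn):
    # Replace commit (and, in the real patch, rollback and the
    # enter/leave transaction management routines) with no-ops for the
    # duration of a test, then restore the real implementations.
    real_commit = txn.commit
    txn.commit = lambda: None
    try:
        yield
    finally:
        txn.commit = real_commit

with disabled_transactions(transaction):
    transaction.commit()   # does nothing: the test's rollback stays intact
assert transaction.commits == 0

transaction.commit()       # outside the patch the real commit runs again
assert transaction.commits == 1
```

The point is that code under test may call the transaction API freely, yet nothing is ever committed, so the fixture teardown's rollback still cleans up everything.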
Which is not to say there were no remaining failures, but what was left was somewhat easier to deal with. First and easiest there were a few tests that assume objects they created would be assigned pk's starting with 1. This is an invalid assumption when rollback is being used to un-do the test effects since autoincrement sequence numbers are not necessarily reset (sqlite seems to reset them, the others don't). These were pretty easy to modify to retrieve and use the pk instead of assuming 1, 2, etc.
Second was a failure in the views test resulting from the Site's SITE_CACHE not being cleared during fixture loading. The flush command sends out the post-syncdb signal, which Sites connects to to initialize a default site and clear the cache, so this SITE_CACHE was always clear on test start using the old approach. With the rollback approach, the views test would pass if it was run by itself, but would fail if run with comment_tests, which apparently does something to cause SITE_CACHE to get an entry for site 1. The value (default since comment_tests doesn't load anything into the Site model) didn't match what was expected by views (which loads a Site via a fixture, with values different from the default). In the patch I'll attach this is fixed by calling the Site's clear_cache function during fixture loading. Not particularly elegant and it points out a subtle difference between the old and new approaches -- but this is the only problem I have found resulting from that difference.
Third and one I haven't entirely figured out is this failure, only with PostgreSQL, in the fixtures test:
======================================================================
FAIL: Doctest: modeltests.fixtures.models.__test__.API_TESTS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kmt/tmp/django/trunk/django/test/_doctest.py", line 2180, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for modeltests.fixtures.models.__test__.API_TESTS
  File "/home/kmt/tmp/django/trunk/tests/modeltests/fixtures/models.py", line unknown line number, in API_TESTS
----------------------------------------------------------------------
File "/home/kmt/tmp/django/trunk/tests/modeltests/fixtures/models.py", line ?, in modeltests.fixtures.models.__test__.API_TESTS
Failed example:
    management.call_command('dumpdata', 'fixtures', format='json')
Expected:
    [{"pk": 3, "model": "fixtures.article", "fields": {"headline": "Time to reform copyright", "pub_date": "2006-06-16 13:00:00"}}, {"pk": 2, "model": "fixtures.article", "fields": {"headline": "Poker has no place on ESPN", "pub_date": "2006-06-16 12:00:00"}}, {"pk": 1, "model": "fixtures.article", "fields": {"headline": "Python program becomes self aware", "pub_date": "2006-06-16 11:00:00"}}]
Got:
    [{"pk": 3, "model": "fixtures.article", "fields": {"headline": "Time to reform copyright", "pub_date": "2006-06-16 14:00:00"}}, {"pk": 2, "model": "fixtures.article", "fields": {"headline": "Poker has no place on ESPN", "pub_date": "2006-06-16 13:00:00"}}, {"pk": 1, "model": "fixtures.article", "fields": {"headline": "Python program becomes self aware", "pub_date": "2006-06-16 12:00:00"}}]
----------------------------------------------------------------------
Ran 2 tests in 25.922s
FAILED (failures=1)
Destroying test database...
The difference is in the times; they are an hour off. Luke also mentioned seeing this on the mailing list, only his times were off by 6 hours. I'd guess Luke's real time zone was 6 hours off from 'America/Chicago' while I am 1 hour off from 'America/Chicago'. I can make this failure go away by setting the TIME_ZONE setting to match my real time zone, but I haven't figured out why this is necessary only when using the rollback approach -- how did the times work out properly without a TIME_ZONE setting in the flush/reload approach?
I think that was it for problems encountered. With this approach the test suite runs about 6x faster on sqlite, 8-12x faster on MySQL/InnoDB, PostgreSQL, and Oracle, on my (oldish to antique) machines, with no failures other than what I see using the current trunk code (InnoDB's failures are well-known, I think, I also see a PostgreSQL failure related to #9758/#9006 with both old and new test approaches, and my Oracle install on a Windows box has a quartet of failures that may be related to Windows file and charset issues that I haven't bothered to track down since I barely know how to spell Oracle).
That leaves MySQL/MyISAM, which of course fails miserably with the rollback approach since its rollback doesn't actually do anything. I added a routine that attempts to detect whether rollback works in the test DB, and if it does not the fixture setup/teardown falls back to the old way. It occurs to me now that I didn't do anything to deal with the DocTest runner's rollback not working (nor does the DocTest runner monkey patch the transaction routines)...there's probably more somebody needs to understand about the DocTest runner with the new approach, but I've run out of time to look at this for a few days probably so if anybody else has time to look at it go ahead. I also suspect the "does rollback work" routine could be done more elegantly; what I've put in is pretty much a quick hack to try to get tests basically working with the patch on MySQL/MyISAM.
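A "does rollback work" probe along the lines described might look like this (sketched here with stdlib sqlite3; the routine in the actual patch may differ in its details):

```python
import sqlite3

def rollback_works(conn):
    """Insert a row inside a transaction, roll back, and check whether
    the row is really gone. On MyISAM-style backends the rollback is a
    no-op, the row survives, and this returns False."""
    conn.execute("CREATE TABLE IF NOT EXISTS _rb_probe (x INTEGER)")
    conn.commit()
    conn.execute("INSERT INTO _rb_probe VALUES (1)")
    conn.rollback()
    (count,) = conn.execute("SELECT COUNT(*) FROM _rb_probe").fetchone()
    conn.execute("DROP TABLE _rb_probe")
    conn.commit()
    return count == 0

conn = sqlite3.connect(":memory:")
assert rollback_works(conn)   # sqlite supports real rollback
```

When the probe returns False, the fixture setup/teardown can fall back to the old flush/reload behaviour.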
by , 16 years ago
Attachment: | 8138alternate.diff added |
---|
follow-up: 20 comment:19 by , 16 years ago
Replying to kmtracey:
I've looked at this a little more and figured out a few things.
Disconnecting close_connection from request_finished solves this problem, so that tests that use the test Client do not automatically need to use the old way, and may take advantage of the rollback approach. In the patch I'll attach the signal is disconnected/reconnected around the sending of the signal...this is probably overkill. I think it would be OK to just disconnect close_connection from that signal once and be done with it, but I haven't had time to go back and experiment with that since I tried the first way.
Great, I always wondered why the test client behaved that way and suspected threading/process problems.
Second, having the fixture teardown code attempt to detect whether a commit had been performed by the test code (despite the test case being a type that supposedly didn't use transactions) and fallback to flush/reload didn't seem to be working too well. I tried tracking down a few of the failures caused by these and eventually decided to try a different approach, which is: if it's a test case that says it doesn't need transactions to work, monkey patch the transaction commit/rollback and enter/leave transaction management routines so that even if they are called by the code being tested, they do nothing, for the duration of the test.
I don't know how the monkey patch idea will be received, but this approach seems to work better. Doing it this way, none of the existing Django testcases need to use the old flush/reload approach. That is, the tests that do actually call commit or rollback (like admin_views, which call admin views that are wrapped in commit_on_success) don't actually need those calls to do anything in order for the test to properly test what it is trying to test. Things work fine when those routines don't do anything, and the test fixture teardown is able to reliably un-do what the test has done, leaving things clean for the next test.
What about adding signals to transaction processing? It could simplify the code here and might be useful in other places. Whenever a transaction is committed, a post_commit signal is emitted. This signal is caught by the test case, and if it is a case that shouldn't need transactions, it simply throws a TransactionNotAllowedHere exception: "That's evil! Please use the TestCase that can handle transactions." That way no monkeypatching is necessary and it tells developers what they are doing wrong.
Instead of trying to guess whether a transaction happened (which was a bad idea) or deactivating transactions transparently (if a test relies on the commit, the developer will have some nice head scratching), it simply forces the developer to use the right TestCase. What do you think?
By the way, during the holidays (when that damned website is online) I can start contributing to that patch again, if my help is wanted.
comment:20 by , 16 years ago
I actually think it should be relatively easy for developers to know that what they are testing requires working transaction methods. For example, you'd need a test that performs a sequence of n updates where some succeed but a later failure causes the whole set to be rolled back, and then verifies that the initial DB updates were actually rolled back by that failure. This sort of test will not be able to work with the do-nothing transaction routines; it would have to use TransactionTestCase to run successfully, and I think the failure when run under a simple TestCase would be pretty obvious.
As for commit, I think it would be relatively hard to write a test that will notice an expected commit has not been done, since you'd have to somehow do a query from outside the context of the test case transaction to be able to tell that what would ordinarily have been committed by the code under test hasn't actually been committed yet.
I also think it's pretty nice to be able to get the speedup offered by the rollback test approach even for code under test that calls transaction routines, when in fact the tests don't need those routines to be functional in order to correctly test what they are intending to test. The admin views fall into this category. The edit view, for example, is wrapped in commit_on_success, which if running on a transaction-supporting database provides a nice all-or-nothing guarantee for, say, multiple updates to inine-edited objects. But we don't have any tests for that guarantee (they would necessarily fail on MySQL/MyISAM). So as far as the admin_views tests are concerned any commits/rollbacks done by the code under test aren't critical to the proper functioning of the test. It would seem a shame to require these tests to take the performance hit of having to use the flush/load test approach simply because admin provides the all-or-nothing behavior when run on a transaction-supporting DB.
So it seems the more I think about it the more I like the monkey-patching approach, personally. (I am perhaps being unduly influenced by being able to run the full Django test suite on one of my machines now in under 10 minutes where before near an hour was the minimum...and under an hour even on my very slow old test machines where before a day or overnight run was required.)
I'll be around over the holidays but won't have much time free since I'll have family visiting. If you (or anyone) could figure out the PostgreSQL time issue, that would be a good thing to fully understand before moving forward with this. Also if there's a more elegant way to test for "the DB rollback works" and what the consequences of it not working are w.r.t. DocTests. Is what's in the patch I attached last night equivalent to the old approach when run on a non-transaction-supporting DB, given that the fixture loading for unit tests will flush/reload, making it no big deal that the rollback after a DocTest run does nothing? The old approach didn't appear to bracket DocTests with any sort of DB flush/reload, so maybe things are OK there, but I didn't pay a whole lot of attention to the DocTest bits as I was looking at this since most of the errors I was hitting had to do with unit tests.
follow-up: 22 comment:21 by , 16 years ago
Replying to kmtracey:
Third and one I haven't entirely figured out is this failure, only with PostgreSQL, in the fixtures test:
...
The difference is in the times, they are an hour off. Luke also mentioned seeing this on the mailing list, only his times were off by 6 hours. I'd guess Luke's real time zone was 6 hours off from 'America/Chicago' while I am 1 hour off from 'American/Chicago'. I can make this failure go away by setting the TIME_ZONE setting to match my real time zone, but I haven't figured out why this is necessary only when using the rollback approach -- how did the times workout properly without a TIME_ZONE setting in the flush/reload approach?
I've been trying to wrap my head around what could be the cause of this. In my case I'm currently four hours ahead of Chicago, but back in June 2006 (the times used in the Article objects contained in the fixtures) we were just two hours ahead, and that's the time difference I was seeing in the test error.
I checked that the TIME_ZONE setting value was effectively being sent with SET TIME ZONE to the DB in postgresql_psycopg2's DatabaseWrapper._cursor method when the syncdb and loaddata management commands were being called in the same doctest.
All this seemed to confirm that for some reason, when dumpdata was being called, the data it was getting from the DB was being converted to the local time zone. Then when I saw the call to set_isolation_level(1) in the _cursor method, it occurred to me that what could be happening is that the rollback operations used in the new approach were undoing even the SET TIME ZONE clause's effects. Adding a commit() call right below it solved the problem. I don't know if it's a correct fix or if it's the right place where this should be fixed (because I don't know if it adversely affects normal usage and the flush/recreate approach), but I hope it is of some help.
diff -r e40efae5e6f9 django/db/backends/postgresql_psycopg2/base.py
--- a/django/db/backends/postgresql_psycopg2/base.py    Sat Jan 03 15:06:38 2009 -0200
+++ b/django/db/backends/postgresql_psycopg2/base.py    Sat Jan 03 23:29:17 2009 -0200
@@ -88,6 +88,7 @@
         cursor.tzinfo_factory = None
         if set_tz:
             cursor.execute("SET TIME ZONE %s", [settings.TIME_ZONE])
+            self.connection.commit()
         if not hasattr(self, '_version'):
             self.__class__._version = get_version(cursor)
         if self._version < (8, 0):
comment:22 by , 16 years ago
Replying to ramiro:
... what could be happening is that the rollback operations used in the new approach were undoing even the SET TIME ZONE clause effects. Adding a commit() call right below it solved the problem. I don't know if it's a correct fix and if it's the right place where this should be fixed (because I don't know if it does adversely affect normal usage and the flush/recreate approach) but I hope it is of some help.
#9964 seems to confirm this.
comment:23 by , 16 years ago
OK, with Ramiro's pointers as a guide, I've figured out what, exactly, causes the time zone mismatch reported by modeltests/fixtures with the current patch. (Attributing it to the rollback usage didn't quite answer the question for me completely since I couldn't see why just the time zone setting and not the actual data addition was being rolled back using the new approach.) What's happening is the result of bracketing the doctest run by the transaction stuff:
    def run(self, test, compileflags=None, out=None, clear_globs=True):
        """
        Wraps the parent run() and encloses it in a transaction.
        """
        transaction.enter_transaction_management()
        transaction.managed(True)
        result = doctest.DocTestRunner.run(self, test, compileflags, out, clear_globs)
        transaction.rollback()
        transaction.leave_transaction_management()
        return result
The sequence of tests that is failing is first:
# Load fixture 1 again, using format discovery
>>> management.call_command('loaddata', 'fixture1', verbosity=0)
That loaddata uses a connection that has had its timezone set to whatever is in settings. The loaddata works, the data is committed to the DB, and that connection is closed by this code at the very end of loaddata.py:
# Close the DB connection. This is required as a workaround for an
# edge case in MySQL: if the same connection is used to
# create tables, load data, and query, the query can return
# incorrect results. See Django #7572, MySQL #37735.
if commit:
    connection.close()
Next thing in the test is this:
>>> Article.objects.all()
[<Article: Time to reform copyright>, <Article: Poker has no place on ESPN>, <Article: Python program becomes self aware>]
which opens a new connection, sets the timezone, and retrieves these results from the DB. No commit yet, so the timezone setting is vulnerable to rollback.
Next in the test is:
# Try to load fixture 2 using format discovery; this will fail
# because there are two fixture2's in the fixtures directory
>>> management.call_command('loaddata', 'fixture2', verbosity=0) # doctest: +ELLIPSIS
Multiple fixtures named 'fixture2' in '...fixtures'. Aborting.
This loaddata is going to fail, and it is what causes the timezone setting to be rolled back, since loaddata calls rollback in the case of failure. The dumpdata is then called subsequently on the same connection, where the timezone has reverted to whatever matches the actual OS timezone. If this is different from what is in settings (or the default of America/Chicago if settings is silent on the matter of timezone) then the times in the dumpdata results will be off.
But still it wasn't clear why this rollback by loaddata was only a problem with the patch code. The answer to that is this code near the beginning of loaddata:
# Start transaction management. All fixtures are installed in a
# single transaction to ensure that all references are resolved.
if commit:
    transaction.commit_unless_managed()
    transaction.enter_transaction_management()
    transaction.managed(True)
So on entry to loaddata, the uncommitted set timezone will be committed before the failure-induced rollback if the doctest itself has not been bracketed by the enter_transaction_management, etc.
Not sure what the right fix is. A brief experiment with sqlite seems to indicate that the patch's changes to bracket DocTestRunner.run in a transaction that is rolled back are not actually necessary for successfully running the Django test suite. That is, I can remove that whole run override and all Django tests pass. (This is also consistent with not seeing any problems caused by this newly added rollback not working on MySQL/MyISAM.) So apparently nothing done to the DB by doctests interferes with subsequent tests in the Django test suite. But I'm a bit out of my depth here in knowing if that would be an OK approach -- it seems inconsistent for one type of test to use rolled-back transactions and the other not to.
Alternatively, always immediately committing the timezone setting seems harmless, since it is only done when a new connection is opened, so it's not like anything other than the timezone setting will be accidentally committed, right? And that does solve the problem. But I feel a bit uneasy with using that approach to 'fix' this problem because I really get the impression this doctest isn't expecting to be run within an externally-set-up transaction. It seems like this test is expecting to control the transaction boundaries, and enclosing it in a transaction it doesn't know about is interfering with that. I believe there was some discussion initially about doctests somehow indicating that they wanted control over transactions, but I don't see if that discussion came to a conclusion as to how that could/should be done?
comment:24 by , 16 years ago
Hey Karen and Ramiro.
I just ran the updated test suite against Ellington, and have some good (and maybe bad) news.
The good news is that I'm seeing the talked about speedup of around 11x on Postgres (from 22k seconds to 2k seconds!).
I'm getting 11 extra failures, but most of them seem to be time related. Once you post an updated patch (with one of the suggested solutions), I'll run them against Ellington again. I can't really share the results (since the code is proprietary), but I'll gladly run the tests against your patches as they come.
by , 16 years ago
Attachment: | 8138alternate-tzcommit.diff added |
---|
by , 16 years ago
Attachment: | 8138alternate-nodoctestxaction.diff added |
---|
comment:25 by , 16 years ago
Hi Eric, and thanks for the feedback! I added two more patches. -tzcommit immediately commits the timezone setting for the two postgres backends. That should prevent the time zone setting ever getting rolled back on a connection and leading to time zone mismatch errors. -nodoctestxaction does the same plus doesn't bracket running doctests in their own transaction. I'd be interested to know how this one does. Essentially it means doctest database actions aren't cleaned up after running them, but that doesn't actually cause any failures for the Django suite -- I'd be interested in knowing if Ellington is similar here.
You say "most" of the additional failures you see are time-related, so I gather some are not? I'd be interested to know if the others are resulting from testcase assumptions about pk assignment. Django had a few tests that assumed pk's were assigned starting at 1, and that assumption is not valid when rollback is used to reset the database. So any of those I'd expect need to be fixed up to not assume the pk values will be starting at one.
follow-up: 27 comment:26 by , 16 years ago
Karen,
I ran against the nodoctestxaction diff, and it is reporting the same errors, so that's a good thing. It looks like we're not depending on things magically being reset.
I believe most of the errors are silly things on my part (we patch djangosearch which I forgot to include, and some postgres specific version logic that I didn't account for). I am trying to remedy those situations.
With those out of the way, I'm down to only 3 'real' failures. I'll look over them tonight and try to get more definitive results to you by tomorrow. However, it is looking like they are issues on our side (silly tests).
comment:27 by , 16 years ago
It appears that Ellington has a custom test runner for doctests (basically the patch from #5624). I missed this earlier (and that explains why there was no difference between the above patches in my previous reply).
We are currently doing a flush and then a loaddata on fixtures found in doctests. We load fixtures in the doctests and don't clean them up (until the next flush, it seems). I believe a unit test runs in between all of these instances, probably rolling back its transaction at the end (as I understand it; my database knowledge is lacking), which is what allows the rest of the tests to pass.
The good news is that I now have the entire Ellington suite passing with the above behavior (with the 11x speed increase). We had a couple of instances of tests relying on a previous test's data sticking around, and one instance of a test depending on the pk of its contents to pass. Fixing those up allowed me to make the whole test suite pass.
Taking out the flushing threw things for a loop. With the tzcommit patch, I was getting some collisions on slug fields. This was caused by incomplete fixtures that didn't have a slug, which used to work because nothing else with that value was being set in that test. These are now fixed.
I think it will be possible to get the full suite running on the patch that does doctest transaction rollbacks. However, since we currently load fixtures outside of the doctest runner, I am getting some bleed-over between tests. I'm assuming this is because the loaddata happens before the call to the DoctestRunner. If we put the fixture-loading logic there, inside the transaction, I think all tests would pass. The fact that we're using fixtures in doctests basically means we have to run a custom test runner (unless #5624 gets accepted, where transactions would almost be necessary around doctests... that would solve the inconsistency, and give a reason to have rollbacks on doctests).
An interesting benchmark: with the flush taken out, the test suite took around 180s to run (close to a real 30x speed increase, and 3x faster than with the flush). Unit tests seem to gain even more from transactions because the database was being flushed before every test (IIRC).
Sorry about the false report before, and let me know if there's anything else I can do. I'll spend some time tomorrow trying to get our suite passing without flushing of the DB, probably by moving out fixture loading inside the transaction in doctests.
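For what it's worth, the "load fixtures inside the doctest transaction" idea can be sketched with plain sqlite3 as follows. This is a hypothetical helper, not Ellington's or Django's actual runner code; the key point is that fixtures are loaded only after the transaction starts, so the final rollback removes them along with anything the doctest changed.

```python
import sqlite3

def run_doctest_in_transaction(conn, fixtures, doctest_fn):
    """Run a doctest-style callable inside its own transaction, loading
    fixtures after BEGIN so the final rollback cleans them up too."""
    conn.execute("BEGIN")
    try:
        for table, rows in fixtures.items():
            marks = ",".join("?" * len(rows[0]))
            conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
        return doctest_fn(conn)
    finally:
        conn.execute("ROLLBACK")

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # explicit BEGIN/ROLLBACK
conn.execute("CREATE TABLE entry (id INTEGER, title TEXT)")

seen = run_doctest_in_transaction(
    conn,
    {"entry": [(1, "first"), (2, "second")]},
    lambda c: c.execute("SELECT COUNT(*) FROM entry").fetchone()[0],
)
left = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
print(seen, left)  # the doctest saw its fixtures; nothing bleeds over
```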
by , 16 years ago
Attachment: | 8138reorder-suite.diff added |
---|
comment:28 by , 16 years ago
I've updated the patch in response to comments on the dev list: http://groups.google.com/group/django-developers/browse_thread/thread/1e4f4c840b180895/b1ed88f5ace403b4#
The test suite is now reordered so that all the tests that are going to use rollback run first. That way they can count on a clean DB, just as they could when they did the flush/syncdb to begin with.
Consensus seems to be to leave the doctest running outside of any transaction, so changes made by doctests may bleed over into other tests (as they can today).
I removed the commit of the timezone setting for postgres since it is not necessary to avoid the fixtures doctest error when doctests are not enclosed in a transaction. If it can be observed in other tests, though, I'd be interested in hearing about it. Having the timezone setting vulnerable to rollback seems odd to me.
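A simplified, Django-free sketch of that reordering, using stand-in classes for django.test.TestCase and django.test.TransactionTestCase (the real runner's logic may differ in its details):

```python
import unittest

class TransactionTestCase(unittest.TestCase):
    """Stand-in for django.test.TransactionTestCase (flushes the DB)."""

class TestCase(TransactionTestCase):
    """Stand-in for django.test.TestCase (wraps tests in a rollback)."""

def reorder_suite(suite, classes=(TestCase,)):
    """Return a suite in which tests that are instances of `classes` run
    first, so rollback-based tests see the clean database they expect."""
    buckets = [[] for _ in range(len(classes) + 1)]
    for test in suite:
        for i, cls in enumerate(classes):
            if isinstance(test, cls):
                buckets[i].append(test)
                break
        else:
            buckets[-1].append(test)  # everything else keeps its order, after
    reordered = unittest.TestSuite()
    for bucket in buckets:
        reordered.addTests(bucket)
    return reordered

class RollbackTest(TestCase):
    def test_fast(self): pass

class FlushTest(TransactionTestCase):
    def test_slow(self): pass

mixed = unittest.TestSuite([FlushTest("test_slow"), RollbackTest("test_fast")])
ordered = list(reorder_suite(mixed))
print([type(t).__name__ for t in ordered])  # rollback tests come first
```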
comment:29 by , 16 years ago
One more patch -- with docs this time; that's the only difference. I have no formal knowledge of the doc markup we're using, nor any idea how to build/verify that the text I've added turns into proper HTML for web viewing -- as far as docs go, I simply code by example and hope it turns out looking correct. So if I've messed that up, let me know. Also, the actual content could probably stand some improvement, but it's a start.
comment:30 by , 16 years ago
The docs syntax looks OK to me. For the record, we're using http://sphinx.pocoo.org/ docs, so it's just that syntax plus a few custom extensions (things like :djadmin:, which should be obvious :) ).
by , 16 years ago
Attachment: | 8138withdocs.diff added |
---|
comment:31 by , 16 years ago
One last change I just noticed was necessary when running the full test suite on PostgreSQL with the new aggregation tests. TestCase fixture teardown needs to close the database connection. If it doesn't, the connection may be re-used by a subsequent test that gets confused as a result of the time zone setting having been rolled back along with the rest of the TestCase actions. There may be a better way to handle this, but for now closing the connection after we've done a rollback on it avoids the pesky time zone resetting problem.
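A minimal sketch of that idea, using sqlite3 stand-ins rather than Django's actual connection handling: after the teardown rollback, closing the connection forces the next test to reopen it, which re-runs any per-connection session setup. All class and function names here are illustrative.

```python
import sqlite3

class ConnectionHandle:
    """Stand-in for Django's DB connection wrapper: reopening the
    connection re-runs per-connection session setup (the analogue of
    PostgreSQL's SET TIME ZONE)."""
    def __init__(self):
        self._conn = None

    @property
    def connection(self):
        if self._conn is None:
            self._conn = sqlite3.connect(":memory:")
            self._conn.isolation_level = None  # explicit BEGIN/ROLLBACK
            # per-connection session setup would run here
        return self._conn

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None

def fixture_teardown(handle):
    # Roll back the test's work, then close the connection so the next
    # test cannot inherit session state the rollback just reverted.
    handle.connection.rollback()
    handle.close()

handle = ConnectionHandle()
first = handle.connection
first.execute("BEGIN")      # the per-test transaction
fixture_teardown(handle)    # rollback + close
second = handle.connection  # fresh connection, fresh session setup
print(first is second)      # the stale connection is not reused
```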
comment:33 by , 16 years ago
Notice that sqlite3 + :memory: + no flushing at all (use fork() for each test; the fork copies the entire in-memory sqlite3 database) is the fastest way to run tests. My personal stats (quad core; my own app's tests, not Django's):
- postgresql+transactions: 600sec
- mysql-innodb+transactions: 450sec
- mysql-myisam: 700sec
- sqlite3-inmemory-forkhack: 180sec (!)
- sqlite3-inmemory-traditional: 400sec (a lot of time for flushing)
- postgresql+transactions: 15 minutes and counting.. :))
Although the naming is non-obvious (at least to me), this is a pretty incredible improvement. I'm marking this as targeted for 1.0: we should sort out the failing tests, fix the naming, and get this in. It'll really help make testing less painful, and that's fantastic.