Opened 12 years ago

Closed 12 years ago

#18687 closed Bug (fixed)

Randomly failing test: test_performance_scalability

Reported by: Aymeric Augustin
Owned by: nobody
Component: Core (Other)
Version: dev
Severity: Release blocker
Keywords:
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

TestUtilsCryptoPBKDF2.test_performance_scalability fails randomly in about one build out of ten on the CI server.

As a result, everyone is learning to ignore the failures, which makes this test fairly useless.

Attachments (1)

18687.diff (1.2 KB) - added by Aymeric Augustin 12 years ago.


Change History (7)

comment:1 by Preston Holmes, 12 years ago

Triage Stage: Unreviewed → Accepted

This single test accounts for roughly 2.5% of the time it takes to run the full test suite, and it only indirectly tests that the loop inside the target function scales linearly. The output and logic of the function are tested elsewhere in the testcase.


comment:2 by Anssi Kääriäinen, 12 years ago

We should probably test something other than the proportion of total runtime.

I think the underlying problem is that we are testing on a machine that is running a lot of other tests at the same time. This can cause spikes in the timings, and those spikes lead to false positives. The problem is not imprecise measurement; it is that the runtime of the same code is not guaranteed to stay the same between consecutive runs.

So maybe doing nine significantly shorter runs for each iteration count, alternating between the two, and then comparing the medians would work better? In code, something like:

import math

# `elapsed` is assumed to be the timing helper already defined in the testcase.
n1 = 1000
n2 = 10000
runs_1000 = []
runs_10000 = []
# Alternate the short and long runs so that background load on the CI
# machine affects both measurements roughly equally.
for i in range(9):
    runs_1000.append(elapsed('pbkdf2("password", "salt", iterations=%d)' % n1))
    runs_10000.append(elapsed('pbkdf2("password", "salt", iterations=%d)' % n2))
# Compare the medians, which are robust against occasional load spikes.
runs_1000.sort()
runs_10000.sort()
t1 = runs_1000[4]
t2 = runs_10000[4]
# If runtime is linear in the iteration count, the exponent is close to 1.
measured_scale_exponent = math.log(t2 / t1, n2 / n1)
self.assertLess(measured_scale_exponent, 1.1)

I haven't actually tested the above.

comment:3 by Preston Holmes, 12 years ago

If we had mocking in the Django test suite, it would just be a matter of testing that _fast_hmac is called <iterations> times, as sketched below.

A different unit test would then cover the behavior of a single run of _fast_hmac.
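
A minimal sketch of that idea, assuming the mock library (not a dependency of Django's test suite at the time) and that _fast_hmac lives in django.utils.crypto next to pbkdf2:

import unittest

import mock

from django.utils import crypto

class PBKDF2CallCountTests(unittest.TestCase):
    def test_hmac_called_once_per_iteration(self):
        # Wrap _fast_hmac so pbkdf2 still works while the calls are counted.
        with mock.patch.object(crypto, '_fast_hmac',
                               wraps=crypto._fast_hmac) as fake:
            crypto.pbkdf2('password', 'salt', iterations=50)
        # With the default dklen there is a single output block, so the inner
        # HMAC should run once per iteration -- an assumption about the
        # pure-Python implementation, not a documented guarantee.
        self.assertEqual(fake.call_count, 50)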

comment:4 by Alex Gaynor, 12 years ago

Mock doesn't seem like the right answer here; it's trivial to validate the number of hmac calls. The idea of this test is to verify that if you increase the workload factor, you really get more work. However, I'm not sure it's adding any real value.

by Aymeric Augustin, 12 years ago

Attachment: 18687.diff added

comment:5 by Aymeric Augustin, 12 years ago

In the short term, can we make the test faster by relaxing the check a bit? For instance, I'm attaching a patch that checks that 20 times more rounds take at least 10 times longer; that version runs about 50 times faster than the current test.
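
As a hedged sketch of that relaxed check (the actual 18687.diff may differ; elapsed is again assumed to be the testcase's timing helper):

t_fast = elapsed('pbkdf2("password", "salt", iterations=1000)')
t_slow = elapsed('pbkdf2("password", "salt", iterations=20000)')
# 20 times the rounds should take at least 10 times as long; the bound is
# deliberately loose so load spikes on the CI machine don't trip it.
self.assertGreater(t_slow, 10 * t_fast)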

In the long term, I'd like to hear the original author of this code before making a decision.

comment:6 by Aymeric Augustin <aymeric.augustin@…>, 12 years ago

Resolution: fixed
Status: new → closed

In [5262a288df07daa050a0e17669c3f103f47a8640]:

Fixed #18687: Removed test_performance_scalability

Even after repeated adjustment of the constants, this test still fails
randomly. It has educated us to ignore messages from Jenkins, to the
point where we missed some actual failures. In addition, it accounts
for a non-negligible percentage of the run time of the test suite
all by itself. Since no one has proposed a convincing patch in months,
I'm going to remove the test. We can't keep a randomly failing test
forever.
