Opened 3 years ago

Closed 3 years ago

#18687 closed Bug (fixed)

Randomly failing test: test_performance_scalability

Reported by: aaugustin Owned by: nobody
Component: Core (Other) Version: master
Severity: Release blocker Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


TestUtilsCryptoPBKDF2.test_performance_scalability fails randomly in about one build out of ten on the CI server.

As a result everyone is just learning to ignore the failures, making this test fairly useless.

Attachments (1)

18687.diff (1.2 KB) - added by aaugustin 3 years ago.


Change History (7)

comment:1 Changed 3 years ago by ptone

  • Triage Stage changed from Unreviewed to Accepted

This single test accounts for roughly 2.5% of the time it takes to run the full test suite, and it only indirectly tests for an implementation of a linearly scaling loop inside the target function. The output and logic of the function are tested elsewhere in the test case.


comment:2 Changed 3 years ago by akaariai

We should probably test something other than the proportion of total runtime.

I think the underlying problem is that we are testing on a machine which is doing a lot of other tests at the same time. This can result in spikes in the testing times, and these lead to false positives. The problem is not imprecise measurement, it is that the runtime of the same code is not guaranteed to stay the same between consecutive runs.

So, maybe doing 9 significantly shorter runs for each iteration count, in alternating fashion, and then comparing the medians would work better? In code, something like:

import math

# elapsed() is assumed to time one execution of the given snippet.
n1 = 1000
n2 = 10000
runs_1000 = []
runs_10000 = []
for i in range(9):
    runs_1000.append(elapsed('pbkdf2("password", "salt", iterations=%d)' % n1))
    runs_10000.append(elapsed('pbkdf2("password", "salt", iterations=%d)' % n2))
runs_1000.sort(); runs_10000.sort()
t1 = runs_1000[4]  # median of 9 runs
t2 = runs_10000[4]
measured_scale_exponent = math.log(t2 / t1, n2 / n1)
self.assertLess(measured_scale_exponent, 1.1)

I haven't actually tested the above.

comment:3 Changed 3 years ago by ptone

If we had mocking in the Django test suite, it would just be a matter of testing that _fast_hmac is called <iterations> times

then a different unit test would cover the behavior of a single run of _fast_hmac
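That call-count approach could be sketched with the standard library's unittest.mock. The pbkdf2 and _fast_hmac bodies below are simplified stand-ins of my own, not the real django.utils.crypto code:

```python
from unittest import mock

def _fast_hmac(key, msg):
    # Simplified stand-in; the real helper computes an HMAC digest.
    return msg[:20].ljust(20, b"\x00")

def pbkdf2(password, salt, iterations):
    # Stand-in pbkdf2-style loop: calls _fast_hmac once per iteration.
    out = salt
    for _ in range(iterations):
        out = _fast_hmac(password, out)
    return out

def count_hmac_calls(iterations):
    # Wrap the helper so calls are counted but still executed. Against
    # the real code, the patch target would be the fully qualified
    # name of _fast_hmac in django.utils.crypto.
    with mock.patch(__name__ + "._fast_hmac", wraps=_fast_hmac) as m:
        pbkdf2(b"password", b"salt", iterations)
    return m.call_count
```

This checks the loop's iteration count directly and runs in microseconds, sidestepping wall-clock noise entirely, though it no longer proves that more iterations actually cost more work.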

comment:4 Changed 3 years ago by Alex

Mock doesn't seem like the right answer here: it's trivial to validate the number of hmac calls, but the idea of this test is to verify that if you increase the workload factor you really get more work. However, I'm not sure it's adding any real value.

Changed 3 years ago by aaugustin

comment:5 Changed 3 years ago by aaugustin

In the short term, can we make the test faster by relaxing the check a bit? For instance, I'm attaching a patch that checks that 20 times more rounds are at least 10 times slower; that version runs about 50 times faster than the current test.
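The attached 18687.diff isn't inlined in the ticket, but a check of the shape described (20 times more rounds must be at least 10 times slower) might look like this sketch. It uses hashlib.pbkdf2_hmac as a stand-in for Django's pure-Python implementation, and the helper names are hypothetical:

```python
import hashlib
import time

def elapsed(iterations, repeats=3):
    # Best-of-three wall-clock timing to dampen scheduler spikes.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"password", b"salt", iterations)
        best = min(best, time.perf_counter() - start)
    return best

def passes_relaxed_check(base=1000, factor=20, min_ratio=10):
    # Relaxed assertion: `factor` times more rounds should take at
    # least `min_ratio` times longer, leaving 2x slack for noise.
    return elapsed(base * factor) >= min_ratio * elapsed(base)
```

Demanding only half the ratio that linear scaling predicts buys headroom against timing noise, at the cost of only catching gross scalability regressions.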

In the long term, I'd like to hear from the original author of this code before making a decision.

comment:6 Changed 3 years ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from new to closed

In [5262a288df07daa050a0e17669c3f103f47a8640]:

Fixed #18687: Removed test_performance_scalability

Even after repeated adjustment of the constants, this test still fails
randomly. It has educated us to ignore messages from Jenkins, to the
point where we missed some actual failures. In addition, it accounts
for a non-negligible percentage of the run time of the test suite
just by itself. Since no one has proposed a convincing patch in months,
I'm going to remove the test. We can't keep a randomly failing test.
