testing: add ratcheting variants #7465

josharian · 2014-03-04T17:45:10Z

For some testing and benchmark purposes, a ratchet is better suited than an average.

https://golang.org/cl/67870053/ bumps up the number of AllocsPerRun runs of an
http test to avoid flakiness. This test would be more reliable using a lower number of
runs if it could measure the best run rather than the average. In addition, it could set
an explicit (rather than comparative) goal for the number of allocs, which would allow
it to catch other regressions. With care, MinAllocsPerRun could even use heuristics to
avoid requiring the user to pass an explicit number of runs.

For benchmarking tightly CPU-bound code with minimal scheduler/OS interactions, a
ratcheting benchmark will often yield more stable, useful results than an averaging
benchmark.

ianlancetaylor · 2014-05-09T22:08:01Z

Comment 1:

Labels changed: added repo-main, release-none.

minux · 2014-05-09T22:41:12Z

Comment 2:

I'd expect that using the best result of a few runs might introduce yet another kind of
flakiness, i.e. false positives. Compared to the false-negative flaky results we are
getting, I'd rather have the latter.

rsc · 2017-03-07T03:46:49Z

For allocs, I agree that it would be nice to fix AllocsPerRun in some ideal world, although we're a bit stuck with it now. I'm also not sure we can build an API with no runs parameter: it seems like at the least you need a max count. If f is expensive then you might not want to run it very many times, and if f is unstable then you need to cut it off at some point. It might be nice to sketch out a func CountAllocs(f func()) int, but I'd be worried about these kinds of complications. In contrast, AllocsPerRun is very easy to specify and understand. There's no magic that can break.

For CPU, I think the number of times when you actually want just a ratchet is pretty low. Modern systems are weird enough that even the lowest possible observed time can be misleading. Maybe 99% of the time the loop takes 5ns but occasionally the stars align just right and it takes 3ns. I've seen craziness like this. Then the min of all the runs is noisier than the average. I do think we should expose the underlying distribution, as in #19128, which is much better than any one number.

Given #19128, can we trim this issue down to being just about allocation counting?

josharian · 2017-03-07T14:31:50Z

> For CPU, I think the number of times when you actually want just a ratchet is pretty low.

Fair enough. And my benchmarking interests are probably atypical.

> Given #19128, can we trim this issue down to being just about allocation counting?

Yes.

josharian added new labels May 9, 2014

bradfitz removed the new label Dec 18, 2014

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed release-none labels Apr 10, 2015

josharian mentioned this issue Mar 6, 2017

testing: add -benchsplit to get more data points #19128

Open
