tlsfuzzer / tlsfuzzer

SSL and TLS protocol test suite and fuzzer

Reduce false positive rate of timing tests and add tools for handling them

tomato42 opened this issue

While we now have tests to verify Lucky13 and Bleichenbacher:

they have a quite significant false positive rate (>20%). We should improve the statistical classifiers we use, the handling of outliers (one option is sketched at the end of this comment), the way the data is collected, etc., so that the false positive rate becomes more manageable (<5%)

see also #106
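
For the outlier handling, a minimal sketch of one option (the `trim_outliers` helper and the 95th-percentile cutoff are hypothetical, for illustration only, not a decided design):

```python
import numpy as np

def trim_outliers(timings, upper_percentile=95):
    """Drop measurements above the given percentile.

    Timing data is typically right-skewed (interrupts, cache misses,
    scheduler noise), so discarding the slowest observations reduces
    variance; when each class is trimmed at its own percentile, a
    constant timing offset between classes survives the trimming.
    """
    timings = np.asarray(timings, dtype=float)
    return timings[timings <= np.percentile(timings, upper_percentile)]
```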

Actually, we should be careful with sample sizes, as too small a sample will not let us detect effect sizes that matter in practice. See https://stats.stackexchange.com/a/2522/289885:

In a situation where a "simple" null is tested against a "compound" alternative, as in classic t-tests or z-tests, it typically takes a sample size proportional to 1/ϵ² to detect an effect size of ϵ. There's a practical upper bound to this in any study, implying there's a practical lower bound on a detectable effect size. So, as a theoretical matter van der Laan and Rose are correct, but we should take care in applying their conclusion.

i.e. to detect a 1% effect size we need a sample size of 10k, and a sample size of 1M to detect a 0.1% effect size
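
To put numbers on that, a sketch comparing the 1/ϵ² rule of thumb from the quote with a standard power calculation (using statsmodels is my assumption here; its `effect_size` is Cohen's d, and the absolute numbers come out larger than the rule of thumb because of the constant factor implied by the chosen power and alpha):

```python
from statsmodels.stats.power import TTestIndPower

# rule of thumb from the quote above: n grows like 1/eps^2
for eps in (0.01, 0.001):
    print(f"effect size {eps:.1%}: rule-of-thumb n ~ {1 / eps**2:,.0f}")

# a two-sample t-test power analysis shows the same 1/d^2 scaling
power = TTestIndPower()
for d in (0.01, 0.001):
    n = power.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"Cohen's d = {d}: n ~ {n:,.0f} per group")
```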

and we need to remember that the false positive rate is independent of sample size: for an alpha of 0.05 it stays at 5% no matter how large the sample is
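
That constancy is easy to check by simulation; a sketch (scipy's `ttest_ind` stands in for whatever classifier we end up using) where both samples come from the same distribution, so every rejection is by definition a false positive:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, trials = 0.05, 1000

for n in (100, 1_000, 10_000):
    # identical distributions: the null hypothesis is true,
    # so the rejection rate is exactly the false positive rate
    rejections = sum(
        stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue < alpha
        for _ in range(trials)
    )
    print(f"n = {n:>6}: false positive rate ~ {rejections / trials:.3f}")
```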

for very large sample sizes and quick response times we may need to look into checking the statistical importance, not just the statistical significance, of the result (if a result tells us that one class differs from another by less than one CPU cycle, it's not a meaningful result), see https://stats.stackexchange.com/a/7849/289885
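
One possible shape for such a check, as a sketch (the function name and the one-cycle threshold are illustrative; the interval uses Welch's approximation, like an unequal-variance t-test): instead of asking whether the difference is nonzero, ask whether the confidence interval of the difference fits inside a window of practical importance.

```python
import numpy as np
from scipy import stats

def practically_equivalent(a, b, threshold_s, confidence=0.95):
    """Return True when the confidence interval for the difference of
    mean timings lies entirely within (-threshold_s, +threshold_s),
    i.e. any remaining difference is too small to matter in practice.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom
    df = se**4 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    half_width = stats.t.ppf((1 + confidence) / 2, df) * se
    return abs(a.mean() - b.mean()) + half_width < threshold_s

# e.g. one cycle on a 3 GHz CPU, in seconds
one_cycle = 1 / 3e9
```

This is essentially a poor-man's equivalence test; the textbook version would be TOST (two one-sided tests).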