osrg / namazu

:fish: 鯰: Programmable fuzzy scheduler for testing distributed systems

Home Page:http://osrg.github.io/namazu

Evaluation of process inspector

AkihiroSuda opened this issue

We need to quantitatively evaluate the process inspector as well as the Ethernet inspector
(FOSDEM presentation slide)

I tried to reproduce ZOOKEEPER-2212 with several configurations.

All the experiments were run on my local Lenovo PC (Xeon E3-1220 v3, 4 cores; 8 GB RAM).

  • Earthquake: a7defa0
  • Kernel: 4.2.0-30-generic #36-Ubuntu

| EQ Config | #CPU assigned | #Exp | Reproducibility | #Pattern@1000 exp | Notes |
|---|---|---|---|---|---|
| None | 4 | 5,000 | 0% | 156 | Data is from FOSDEM slide. |
| Ether | 4 | 1,000 | 21.8% | 573 | Ditto. With latest EQ + 1 CPU, reproducibility grew to about 50%. |
| None | 1 | 1,000 | 0% | N/A | |
| None + SCHED_BATCH | 1 | 1,000 | 0% | N/A | |
| Proc(mild{UseBatch:true}) (SCHED_BATCH + random nice values) | 1 | 5,000 | 0.7% | 634 | 0.08% experiments failed due to timeout |
| Proc(mild{UseBatch:true}) | 4 | 5,000 | 0.32% | 548 | No experiment failed due to timeout |
| Proc(mild{UseBatch:false}) | 1 | 5,000 | 0.26% | 914 | 90% experiments failed due to timeout |

  • mild{UseBatch:true} provides better reproducibility than mild{UseBatch:false}, but it is not as good as the Ethernet inspector.
  • mild{UseBatch:false} provides better pattern growth, but it is not useful for ZOOKEEPER-2212 because too many experiments time out.
  • Proc(extreme) is likely to cause starvation on a single CPU, so I did not test it.
  • Proc(dirichlet) hits the bug mentioned in the README.
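
To make "SCHED_BATCH + random nice values" concrete, here is a minimal Python sketch of the idea. It is not Namazu's implementation: the function names are hypothetical, it acts on whole processes through Python's `os` module (which wraps `sched_setscheduler(2)` and `setpriority(2)`, not `sched_setattr(2)`), and it assumes Linux.

```python
import os
import random

# Linux nice values run from -20 (highest priority) to 19 (lowest).
# An unprivileged process generally cannot lower a nice value, so this
# sketch stays within 0..19 (an assumption, not Namazu's actual range).
NICE_MIN, NICE_MAX = 0, 19


def pick_nice_values(pids, rng=random):
    """Assign each PID an independent, uniformly random nice value."""
    return {pid: rng.randint(NICE_MIN, NICE_MAX) for pid in pids}


def apply_batch_and_nice(pid, nice):
    """Switch one process to SCHED_BATCH and set its nice value (Linux only)."""
    # SCHED_BATCH is a non-real-time policy, so its static priority must be 0.
    os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    os.setpriority(os.PRIO_PROCESS, pid, nice)


def perturb(pids, rng=random):
    """Apply a fresh random plan to every PID under test and return it."""
    plan = pick_nice_values(pids, rng)
    for pid, nice in plan.items():
        apply_batch_and_nice(pid, nice)
    return plan
```

Calling `perturb()` repeatedly on the PIDs of the system under test would re-randomize their relative priorities over time, which is roughly what the mild policy with UseBatch:true is described as doing above.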

Also tested ZOOKEEPER-2137 with the latest ZooKeeper (just 50 times on 4 CPUs):

| EQ Config | #CPU assigned | #Exp | Reproducibility | #Pattern@1000 exp | Notes |
|---|---|---|---|---|---|
| None | 4 | 50 | 2% | N/A | - |
| Proc(mild{UseBatch:true}) (SCHED_BATCH + random nice values) | 4 | 50 | 16% | N/A | - |
| Proc(mild{UseBatch:true}) | 1 | 50 | 2% | N/A | - |

A reproducibility of 16% on 4 CPUs is high enough to be useful.
The process inspector works well with ZOOKEEPER-2137, although not with ZOOKEEPER-2212.
I guess this is because ZOOKEEPER-2137 runs longer (> 1 min) than ZOOKEEPER-2212,
i.e., sched_setattr() gets many more chances to take effect.

I will keep this issue open for discussion.

PTAL @mitake

I evaluated some YARN tests (apache/hadoop@4e4b3a8) using Earthquake 13aa33b (mild{UseBatch:true}) on an AWS t2.large instance (2 CPUs assigned).

Each test was executed 100 times with and without Earthquake.

Note that this version of Earthquake does not contain an optimization (#146).

| Test | Reproducibility (without EQ) | Reproducibility (with EQ) |
|---|---|---|
| YARN-4548 (RM/TestCapacityScheduler) | 11% | 82% |
| YARN-4556 (RM/TestFifoScheduler) | 2% | 44% |
| YARN-4168 (NM/TestLogAggregationService) | 1% | 8% |
| YARN-1978 (NM/TestLogAggregationService) | 0% | 4% |
| YARN-4543 (NM/TestNodeStatusUpdater) | 0% | 1% |
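
For readers who want to repeat this kind of measurement, a minimal harness could look like the sketch below. It assumes that "reproduced" means the test command exits with a non-zero status, and it counts timeouts separately, as the ZooKeeper notes above do. The function name and the flaky stand-in command are hypothetical; the real runs used the Hadoop mvn tests.

```python
import subprocess
import sys


def reproducibility(cmd, runs, timeout=None):
    """Run `cmd` `runs` times; return (failures, timeouts, failure_rate)."""
    failures = 0
    timeouts = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Timed-out runs are tracked separately and not counted as reproductions.
            timeouts += 1
            continue
        if result.returncode != 0:
            failures += 1
    return failures, timeouts, failures / runs


if __name__ == "__main__":
    # Hypothetical stand-in for a flaky test: exits with status 1 about 10% of the time.
    flaky = [
        sys.executable,
        "-c",
        "import random, sys; sys.exit(1 if random.random() < 0.1 else 0)",
    ]
    print(reproducibility(flaky, runs=10, timeout=30))
```

Running the same command once with and once without the inspector attached gives the two columns of the table.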

I found that it is sometimes better to apply Namazu (formerly named Earthquake) to a stress process rather than to the Hadoop mvn process.

Test case: YARN-5043 (RM/TestAMRestart, apache/hadoop@06413da) using Namazu 8e4f268 (mild{UseBatch:true}) on AWS t2.large (2 CPUs assigned), run 100 times.

Stress: stress --cpu 2

| Running stress? | Namazu applied to | Reproducibility |
|---|---|---|
| N | None | 16% |
| Y | None | 12% |
| N | mvn | 7% |
| Y | stress | 30% |

TODO:

  • reevaluate other YARN tests with stress
  • scientific and reliable analysis
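
As one possible starting point for the second TODO item, the sketch below attaches a 95% Wilson score interval to each reproducibility figure from the YARN-5043 table above (100 runs each). The choice of the Wilson interval is my assumption, not something decided in this issue; other methods (e.g. an exact test) may be more appropriate.

```python
import math


def wilson_interval(successes, runs, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = successes / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to guard against floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Counts taken from the YARN-5043 table above (100 runs each).
results = [
    ("no stress, no Namazu", 16),
    ("stress, no Namazu", 12),
    ("no stress, Namazu on mvn", 7),
    ("stress, Namazu on stress", 30),
]
for label, hits in results:
    lo, hi = wilson_interval(hits, 100)
    print(f"{label}: {hits}/100 (95% CI {lo:.1%} to {hi:.1%})")
```

If the intervals of two configurations overlap, 100 runs are probably not enough to call one better than the other, and more runs would be needed.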

I'd like to report my experiment with etcd issue 5022: etcd-io/etcd#5022

| w/ or w/o Namazu process inspector | Reproducibility |
|---|---|
| w/o | 0% |
| w/ | 2.7% |

Each of the two experiments above consisted of 1,000 test runs.

Parameters of explorer policy:

explorePolicy = "random"
[explorePolicyParam]
 procPolicy = "dirichlet"