Evaluation of process inspector

Question

AkihiroSuda opened this issue 8 years ago · comments

Akihiro Suda commented 8 years ago

We need to quantitatively evaluate the process inspector, as we did for the Ethernet inspector
(FOSDEM presentation slide).

Akihiro Suda · Answer 1 · Wed Mar 16 2016 12:52:09 GMT+0800 (China Standard Time)

Tried to reproduce ZOOKEEPER-2212 with several configs.

All experiments were run on my local Lenovo PC (Xeon E3-1220 v3 * 4, 8 GB RAM).

Earthquake: a7defa0
Kernel: 4.2.0-30-generic #36-Ubuntu

EQ Config	#CPU assigned	#Exp	Reproducibility	#Pattern@1000 exp	Notes
None	4	5,000	0%	156	Data is from FOSDEM slide.
Ether	4	1,000	21.8%	573	Ditto. With latest EQ + 1 CPU, reproducibility grew to about 50%.
None	1	1,000	0%	N/A	-
None + SCHED_BATCH	1	1,000	0%	N/A	-
Proc(`mild{UseBatch:true}`) (SCHED_BATCH + random nice values)	1	5,000	0.7%	634	0.08% of experiments failed due to timeout
Proc(`mild{UseBatch:true}`)	4	5,000	0.32%	548	No experiments failed due to timeout
Proc(`mild{UseBatch:false}`)	1	5,000	0.26%	914	90% of experiments failed due to timeout

mild{UseBatch:true} provides better reproducibility than mild{UseBatch:false}, but is not as good as the Ethernet inspector.
mild{UseBatch:false} provides better pattern growth, but is not useful for ZOOKEEPER-2212 because too many experiments time out.
Proc(extreme) is likely to cause starvation on a single CPU, so I did not try it.
Proc(dirichlet) hits the bug mentioned in the README.

Akihiro Suda · Answer 2 · Wed Mar 16 2016 16:17:31 GMT+0800 (China Standard Time)

Also tested ZOOKEEPER-2137 with the latest ZooKeeper (just 50 times on 4 CPUs):

EQ Config	#CPU assigned	#Exp	Reproducibility	#Pattern@1000 exp	Notes
None	4	50	2%	N/A	-
Proc(`mild{UseBatch:true}`) (SCHED_BATCH + random nice values)	4	50	16%	N/A	-
Proc(`mild{UseBatch:true}`)	1	50	2%	N/A	-

This reproducibility is useful enough (on 4 CPUs).
The process inspector works well with ZOOKEEPER-2137, although not with 2212.
I guess this is because ZOOKEEPER-2137 runs longer (> 1 min) than 2212,
so sched_setattr() gets many more chances to take effect.
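
To make the "SCHED_BATCH + random nice values" setting concrete, here is a minimal Python sketch of the idea. It is not Namazu's implementation: Python has no wrapper for sched_setattr(), so the sketch substitutes os.sched_setscheduler() plus os.setpriority(), and the function names and the 0..19 nice range are assumptions made for illustration only.

```python
import os
import random


def pick_nice(rng, lo=0, hi=19):
    """Pick a random nice value in [lo, hi].

    The 0..19 default is an assumption: an unprivileged process can
    only raise its niceness, never lower it.
    """
    return rng.randint(lo, hi)


def apply_batch_with_random_nice(pid, rng):
    """Put `pid` under SCHED_BATCH with a random nice value (Linux only)."""
    nice = pick_nice(rng)
    # SCHED_BATCH is a non-realtime policy, so its static priority must be 0.
    os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    os.setpriority(os.PRIO_PROCESS, pid, nice)
    return nice
```

If the attributes are re-randomized periodically while the test runs, a test that runs for more than a minute goes through many more scheduling perturbations than a short one, which is consistent with the guess above.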

I'll keep this issue open for discussion.

PTAL @mitake

Akihiro Suda · Answer 3 · Sat May 07 2016 22:25:38 GMT+0800 (China Standard Time)

Evaluated some YARN (apache/hadoop@4e4b3a8) tests using 13aa33b (mild{UseBatch:true}), on AWS t2.large (2 CPUs assigned).

Each test was executed 100 times with and without Earthquake.

Note that this version of Earthquake does not contain an optimization (#146).

Test	Reproducibility(without EQ)	Reproducibility(with EQ)
YARN-4548(RM/TestCapacityScheduler)	11%	82%
YARN-4556(RM/TestFifoScheduler)	2%	44%
YARN-4168(NM/TestLogAggregationService)	1%	8%
YARN-1978(NM/TestLogAggregationService)	0%	4%
YARN-4543(NM/TestNodeStatusUpdater)	0%	1%
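
For reference, a percentage like the ones in this table can be obtained by running the same test command repeatedly and counting the runs in which the bug shows up. The sketch below is a hypothetical harness, not the script used for these measurements. It assumes that a non-zero exit status means "bug reproduced" and counts timeouts separately; the `reproducibility` function and the stand-in flaky command are illustrative only.

```python
import subprocess
import sys


def reproducibility(cmd, runs, timeout=None):
    """Run `cmd` `runs` times and return (reproduced, timed_out) counts.

    Assumption: a non-zero exit status means the bug was reproduced.
    Runs that exceed `timeout` seconds are counted separately.
    """
    reproduced = 0
    timed_out = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            timed_out += 1
            continue
        if result.returncode != 0:
            reproduced += 1
    return reproduced, timed_out


if __name__ == "__main__":
    # Stand-in for a flaky test: exits non-zero in roughly 30% of runs.
    flaky = [
        sys.executable,
        "-c",
        "import random, sys; sys.exit(1 if random.random() < 0.3 else 0)",
    ]
    runs = 20
    hits, timeouts = reproducibility(flaky, runs)
    print(f"reproducibility: {100 * hits / runs:.0f}% ({timeouts} timeouts)")
```

In a real measurement, `cmd` would be the test invocation itself, run once with and once without the inspector attached.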

Akihiro Suda · Answer 4 · Sat May 07 2016 22:57:21 GMT+0800 (China Standard Time)

I found that it is sometimes better to apply Namazu (formerly named Earthquake) to the stress process rather than to the Hadoop mvn process.

Test case: YARN-5043 (RM/TestAMRestart) (apache/hadoop@06413da) using 8e4f268 (mild{UseBatch:true}), on AWS t2.large (2 CPUs assigned). Run 100 times.

Stress: stress --cpu 2

Running stress?	Namazu applied to	Reproducibility
N	None	16%
Y	None	12%
N	mvn	7%
Y	stress	30%

TODO:

Re-evaluate the other YARN tests with stress
Scientific and reliable analysis
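
As one possible starting point for the analysis item above, confidence intervals can show how much of the difference between configurations might be noise at 100 runs each. The sketch below computes a 95% Wilson score interval for each row of the stress table. Choosing the Wilson interval is an assumption made here, not something decided in this thread, and the counts are the table's percentages read as hits out of 100 runs.

```python
import math


def wilson_interval(hits, runs, z=1.96):
    """Return the (low, high) Wilson score interval for hits/runs.

    z=1.96 corresponds to a two-sided 95% confidence level.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = hits / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# (running stress?, Namazu applied to, hits out of 100 runs)
rows = [
    ("N", "None", 16),
    ("Y", "None", 12),
    ("N", "mvn", 7),
    ("Y", "stress", 30),
]

for stress, target, hits in rows:
    low, high = wilson_interval(hits, 100)
    print(f"stress={stress} namazu={target}: "
          f"{hits}% (95% CI {100 * low:.1f}% to {100 * high:.1f}%)")
```

Overlapping intervals suggest that more runs are needed before ranking two configurations against each other.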

Hitoshi Mitake · Answer 5 · Mon May 23 2016 10:21:39 GMT+0800 (China Standard Time)

I'd like to report my experiment of etcd 5022: etcd-io/etcd#5022

w/ or w/o Namazu process inspector	Reproducibility
w/o	0%
w/	2.7%

Each of the two experiments above consists of 1000 test runs.

Parameters of explorer policy:

explorePolicy = "random"
[explorePolicyParam]
 procPolicy = "dirichlet"