Evaluation of process inspector
AkihiroSuda opened this issue
We need to quantitatively evaluate the process inspector as well as the Ethernet inspector
(FOSDEM presentation slide)
Tried to reproduce ZOOKEEPER-2212 with several configs.
All experiments were done on my local Lenovo PC (Xeon E3-1220 v3 × 4, 8 GB RAM).
EQ Config | #CPU assigned | #Exp | Reproducibility | #Pattern@1000 exp | Notes |
---|---|---|---|---|---|
None | 4 | 5,000 | 0% | 156 | Data is from the FOSDEM slides. |
Ether | 4 | 1,000 | 21.8% | 573 | Ditto. With the latest EQ + 1 CPU, reproducibility grew to about 50%. |
None | 1 | 1,000 | 0% | N/A | |
None + SCHED_BATCH | 1 | 1,000 | 0% | N/A | |
Proc(`mild{UseBatch:true}`) (SCHED_BATCH + random nice values) | 1 | 5,000 | 0.7% | 634 | 0.08% of experiments failed due to timeout |
Proc(`mild{UseBatch:true}`) | 4 | 5,000 | 0.32% | 548 | No experiments failed due to timeout |
Proc(`mild{UseBatch:false}`) | 1 | 5,000 | 0.26% | 914 | 90% of experiments failed due to timeout |
- `mild{UseBatch:true}` provides better reproducibility than `mild{UseBatch:false}`, but not as good as the Ethernet inspector.
- `mild{UseBatch:false}` provides better pattern growth, but it is not useful for ZOOKEEPER-2212 due to too many timeouts.
- Proc(`extreme`) is likely to cause starvation on a single CPU, so I did not experiment with it.
- Proc(`dirichlet`) hits the bug mentioned in the README.
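For readers unfamiliar with what a `mild{UseBatch:true}`-style policy does at the OS level, here is a minimal Python sketch: it switches a target process to SCHED_BATCH and assigns it a random nice value. This only illustrates the underlying Linux primitives; it is not Namazu's actual implementation (which uses sched_setattr()).

```python
import os
import random

def perturb_scheduling(pid, use_batch=True):
    """Apply SCHED_BATCH and a random nice value to one process.

    Conceptual sketch of a mild{UseBatch:true}-style perturbation;
    NOT Namazu's actual implementation.
    """
    if use_batch:
        # SCHED_BATCH tells the kernel to treat the task as CPU-bound,
        # giving it slightly less favorable wakeup scheduling.
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    # Pick a random nice value; raising nice (i.e., lowering priority)
    # never requires CAP_SYS_NICE, so 0..19 is safe for unprivileged use.
    nice = random.randint(0, 19)
    os.setpriority(os.PRIO_PROCESS, pid, nice)
    return nice
```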
Also tested ZOOKEEPER-2137 with the latest ZooKeeper (just 50 times on 4 CPUs):
EQ Config | #CPU assigned | #Exp | Reproducibility | #Pattern@1000 exp | Notes |
---|---|---|---|---|---|
None | 4 | 50 | 2% | N/A | - |
Proc(`mild{UseBatch:true}`) (SCHED_BATCH + random nice values) | 4 | 50 | 16% | N/A | - |
Proc(`mild{UseBatch:true}`) | 1 | 50 | 2% | N/A | - |
This reproducibility is useful enough (on 4 CPUs).
The process inspector works well with ZOOKEEPER-2137, although not with 2212.
I guess this is because ZOOKEEPER-2137 runs longer (> 1 min) than 2212, i.e., sched_setattr() gets many more chances to take effect.
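That reasoning can be made concrete with a small sketch: if perturbations are injected at a fixed interval, the number of chances grows linearly with the test's running time. The interval and helper below are hypothetical, not Namazu parameters.

```python
import os
import random
import time

def perturb_loop(pids, duration_sec, interval_sec=0.25):
    """Re-randomize the nice values of target pids at a fixed interval.

    Illustrates why a longer test run receives many more scheduling
    perturbations: injections grow linearly with the run's duration.
    Hypothetical helper, not Namazu's code.
    """
    injections = 0
    deadline = time.monotonic() + duration_sec
    while time.monotonic() < deadline:
        for pid in pids:
            try:
                os.setpriority(os.PRIO_PROCESS, pid, random.randint(0, 19))
            except PermissionError:
                # Lowering a nice value back down needs CAP_SYS_NICE;
                # Namazu runs as root, but this sketch may not.
                pass
            injections += 1
        time.sleep(interval_sec)
    return injections
```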
I keep this issue ticket open for discussion.
PTAL @mitake
Evaluated some YARN (apache/hadoop@4e4b3a8) tests using 13aa33b (`mild{UseBatch:true}`), on AWS t2.large (2 CPUs assigned).
Tests were executed 100 times with and without Earthquake.
Note that this version of Earthquake does not contain an optimization (#146).
Test | Reproducibility (without EQ) | Reproducibility (with EQ) |
---|---|---|
YARN-4548 (RM/TestCapacityScheduler) | 11% | 82% |
YARN-4556 (RM/TestFifoScheduler) | 2% | 44% |
YARN-4168 (NM/TestLogAggregationService) | 1% | 8% |
YARN-1978 (NM/TestLogAggregationService) | 0% | 4% |
YARN-4543 (NM/TestNodeStatusUpdater) | 0% | 1% |
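With only 100 runs per test, these percentages carry noticeable sampling error. A Wilson score interval (standard statistics, not part of Namazu) makes the uncertainty explicit:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for an observed reproducibility rate.

    For example, 82 reproductions out of 100 runs gives an interval of
    roughly (0.73, 0.88) rather than a bare point estimate of 0.82.
    """
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```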
I found that it is sometimes better to apply Namazu (formerly named Earthquake) to the `stress` process rather than to the Hadoop `mvn` process.
Testcase: YARN-5043 (RM/TestAMRestart) (apache/hadoop@06413da) using 8e4f268 (`mild{UseBatch:true}`), on AWS t2.large (2 CPUs assigned). Each configuration was run 100 times.
Stress: `stress --cpu 2`
Running stress? | Namazu applied for | Reproducibility |
---|---|---|
N | None | 16% |
Y | None | 12% |
N | mvn | 7% |
Y | stress | 30% |
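To make the "stress" rows concrete: the idea is to spawn CPU-bound workers and perturb their priorities instead of perturbing the `mvn` process. The helper below is a hypothetical Python illustration; Namazu attaches through its own inspector, and the real load generator was the `stress` command.

```python
import multiprocessing
import os
import random
import time

def cpu_burn(seconds):
    # Busy-loop for a bounded time, standing in for one `stress --cpu` worker.
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass

def spawn_stress_workers(n=2, seconds=2.0):
    """Spawn CPU-load workers (analogous to `stress --cpu 2`) and
    deprioritize them with random nice values, mimicking the idea of
    applying the perturbation to the stress process rather than to the
    test harness. Hypothetical sketch, not Namazu's mechanism.
    """
    procs = [multiprocessing.Process(target=cpu_burn, args=(seconds,))
             for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        # Raising nice (1..19) on our own children never needs root.
        os.setpriority(os.PRIO_PROCESS, p.pid, random.randint(1, 19))
    return procs
```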
TODO:
- re-evaluate the other YARN tests with `stress`
- scientific and reliable analysis
I'd like to report my experiment with etcd issue 5022: etcd-io/etcd#5022
w/ or w/o Namazu process inspector | Reproducibility |
---|---|
w/o | 0% |
w/ | 2.7% |
Each configuration in the above experiment was run 1,000 times.
Parameters of the explore policy:

```toml
explorePolicy = "random"
[explorePolicyParam]
procPolicy = "dirichlet"
```
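For reference, a `dirichlet` process policy conceptually samples a random probability vector and treats it as relative scheduling weights for the target processes. Below is a minimal sketch of the sampling step, using the standard gamma-normalization construction; how the weights are applied is an assumption of this sketch, not Namazu's documented behavior.

```python
import random

def dirichlet_weights(k, alpha=1.0):
    """Sample a length-k probability vector from a symmetric Dirichlet(alpha).

    Each draw sums to 1 and could serve as per-process scheduling weights;
    smaller alpha yields more skewed (bursty) allocations. Sketch only.
    """
    gammas = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(gammas)
    return [g / total for g in gammas]
```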