Performance Bottleneck

Question

zywh opened this issue 3 years ago

I started using fping to ping/measure ~400K targets (IPs), but I can't push more than 8 Mbps (transit) per VM (ESXi).

Environment

ESXi with different Linux guest OSes. The server has 40 sockets and 80 logical CPUs, ~100 GB of memory, and a 10G NIC connected directly to the ISP core router. It's a pretty decent machine and setup. I wrote a Python wrapper to multi-thread fping and collect the results. Here are the initial results:

Using 1 VM, I'm able to ping ~200K targets using "-c5 -i4 -b12 -t500 -r1" in around 60 seconds.

Threads (python) = 72
ping bucket (per fping) = 500

CPU (8x vCPU) is kind of busy but OK. I tried more vCPUs, up to 32, and it doesn't make a difference.

With the same setup, if I double the threads or increase the number of targets per fping process, it runs "faster", but I notice that RTT increases and packet loss becomes unreliable.
Ubuntu 21 and CentOS 8 have similar performance.
Alpine Linux is much slower with the same setup.
If the fping interval (-i) is changed to 2 ms, it's faster, but the results (RTT/packet loss) become unreliable.
If the fping interval (-i) is changed to 10 ms, each round (200K targets) slows down to ~90 seconds.
I looked into the Linux kernel and adjusted a few ICMP/socket parameters. No luck.
I tried multiple processes as well. The result is similar to multi-threading.
I tried a few native Python ICMP packages, and none of them is as reliable or as fast as fping.
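
For readers who want to reproduce the setup described above (72 threads, 500 targets per fping process), here is a minimal sketch of such a wrapper. It is not the code from the linked repository. The function names are hypothetical, the `-q` flag is added here so that only the per-target summary is printed, and the parser assumes the summary format that `fping -c` writes to stderr (`host : xmt/rcv/%loss = ..., min/avg/max = ...`).

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Flags from the post; -q (summary only) is an addition to simplify parsing.
FPING_ARGS = ["-q", "-c5", "-i4", "-b12", "-t500", "-r1"]

# Assumed summary line, e.g.:
# "10.0.0.1 : xmt/rcv/%loss = 5/5/0%, min/avg/max = 0.41/0.52/0.80"
SUMMARY_RE = re.compile(
    r"^(?P<host>\S+)\s*:\s*xmt/rcv/%loss = (?P<xmt>\d+)/(?P<rcv>\d+)/(?P<loss>\d+)%"
    r"(?:, min/avg/max = (?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+))?"
)


def chunk(targets, size=500):
    """Split the target list into buckets of at most `size` entries."""
    return [targets[i:i + size] for i in range(0, len(targets), size)]


def parse_summary(text):
    """Turn fping summary lines into {host: {"loss_pct", "avg_ms"}}."""
    results = {}
    for line in text.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m:
            avg = m.group("avg")  # absent when every ping was lost
            results[m.group("host")] = {
                "loss_pct": int(m.group("loss")),
                "avg_ms": float(avg) if avg else None,
            }
    return results


def ping_bucket(bucket):
    """Run one fping process for one bucket and parse its stderr summary."""
    proc = subprocess.run(
        ["fping", *FPING_ARGS, *bucket],
        capture_output=True, text=True, check=False,
    )
    return parse_summary(proc.stderr)


def ping_all(targets, threads=72, bucket_size=500):
    """Fan buckets out over a thread pool and merge the per-bucket results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for partial in pool.map(ping_bucket, chunk(targets, bucket_size)):
            merged.update(partial)
    return merged
```

Threads are enough here because each worker only waits on a child process, which would be consistent with the observation that multiple processes gave a similar result.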

Basically I can do 200K targets in 60 s per VM: PPS = 200,000 * 5 / 60 ≈ 16.7K PPS. Network traffic maxes out at ~8 Mbps with stable results. With 3 VMs, I can do ~600K targets. It's quite amazing, but I'd like to understand whether I can push more per VM.
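
As a sanity check on these numbers, the following back-of-the-envelope sketch takes the 200K targets per 60 s round stated in the post and converts it to packets per second and to an approximate wire rate. The framing figures are assumptions, not measurements: a 12-byte ICMP payload (`-b12`), 8-byte ICMP and 20-byte IPv4 headers, and padding to the 64-byte minimum Ethernet frame. The pacing ceiling is also only a rough model: it assumes each fping process sends at most one packet every `-i` milliseconds and ignores timeouts and the per-target period.

```python
def offered_pps(targets, pings_per_target, round_seconds):
    """Echo requests sent per second over one full round."""
    return targets * pings_per_target / round_seconds


def wire_mbps(pps, payload_bytes=12):
    """Approximate one-direction wire rate, assuming minimum-size Ethernet frames."""
    ip_packet = 20 + 8 + payload_bytes      # IPv4 header + ICMP header + payload
    frame = max(ip_packet + 14 + 4, 64)     # Ethernet header + FCS, padded to 64 bytes
    return pps * frame * 8 / 1e6


def pacing_ceiling_pps(processes, interval_ms):
    """Rough upper bound if every process sends one packet per interval."""
    return processes * 1000 / interval_ms


pps = offered_pps(200_000, 5, 60)
print(f"{pps:.0f} pps, {wire_mbps(pps):.2f} Mbps, "
      f"ceiling {pacing_ceiling_pps(72, 4):.0f} pps")
```

Under these assumptions the observed rate is roughly 16.7K pps and about 8.5 Mbps in one direction, which sits close to the 18K pps ceiling that 72 concurrent processes at `-i4` would allow. If that model holds, the limit may come from the pacing settings rather than from the kernel, but this is a hypothesis to test, not a diagnosis.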

Does anyone have a clue where the bottleneck is? The Linux kernel's ICMP socket handling?

David Ying Zheng · Answer 1 · Thu Jul 08 2021 23:26:10 GMT+0800 (China Standard Time)

The Python wrapper is here for reference:

https://github.com/zywh/pyfping