pinterest / DoctorK

DoctorK is a service for Kafka cluster auto healing and workload balancing

Slow stats collection outside of AWS

BrianGallew opened this issue

All my Kafka brokers in AWS have no problem meeting the 30-second polling interval for kafkastats. However, all of the brokers on physical hardware show wildly irregular gaps between metric publishing events, roughly 13 to 21 minutes apart:

2019-01-16 18:22:20.856 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547662170130, "id": 1441
2019-01-16 18:43:50.866 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547663459364, "id": 1441
2019-01-16 18:56:41.630 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547664230872, "id": 1441
2019-01-16 19:09:32.229 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547665001633, "id": 1441
2019-01-16 19:22:22.797 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547665772231, "id": 1441
2019-01-16 19:44:07.842 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547667076506, "id": 1441
2019-01-16 19:56:58.609 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547667847848, "id": 1441
2019-01-16 20:09:49.174 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547668618610, "id": 1441
2019-01-16 20:22:39.883 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547669389176, "id": 1441
2019-01-16 20:44:25.059 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547670693733, "id": 1441
2019-01-16 20:57:15.843 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547671465064, "id": 1441
2019-01-16 21:10:06.384 [StatsReporter] INFO  com.pinterest.doctorkafka.stats.BrokerStatsReporter - published to kafka : {"timestamp": 1547672235845, "id": 1441
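For reference, the gaps between the published timestamps above work out to roughly 770-1300 seconds instead of 30. A quick throwaway check (my own snippet, not part of kafkastats):

// Throwaway check: gaps between the "published to kafka" timestamps above
// (epoch milliseconds), which should be ~30 s apart.
public class PublishGaps {
    public static void main(String[] args) {
        long[] ts = {
            1547662170130L, 1547663459364L, 1547664230872L, 1547665001633L,
            1547665772231L, 1547667076506L, 1547667847848L, 1547668618610L,
            1547669389176L, 1547670693733L, 1547671465064L, 1547672235845L
        };
        for (int i = 1; i < ts.length; i++) {
            // Prints gaps of roughly 770-1300 seconds.
            System.out.printf("gap %2d: %6.0f s%n", i, (ts[i] - ts[i - 1]) / 1000.0);
        }
    }
}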

The brokers are averaging 90% idle, with reasonable amounts of free memory. Any ideas where I should be looking to see what it's trying to do?

I've tried a dozen times to attach a screenshot from glances, but it's not working. So I'll paste this instead:


CPU       7.0%  nice:     0.0%                                     LOAD    32-core                                     MEM     24.3%  active:    73.2G                                     SWAP      0.0%
user:     4.6%  irq:      0.0%                                     1 min:    1.69                                      total:   126G  inactive:  47.2G                                     total:       0
system:   1.7%  iowait:   0.0%                                     5 min:    2.24                                      used:   30.6G  buffers:   2.12M                                     used:        0
idle:    93.0%  steal:    0.0%                                     15 min:   2.38                                      free:   95.3G  cached:    94.8G                                     free:        0

NETWORK     Rx/s   Tx/s   Processes filter: .*kafka.* (press ENTER to edit)
bond0      135Mb 58.7Mb   TASKS 4 (324 thr), 0 run, 4 slp, 0 oth sorted automatically by cpu_percent, flat view
eno1          0b     0b
eno2          0b     0b     CPU%  MEM%  VIRT   RES   PID USER        NI S    TIME+ IOR/s IOW/s Command
ens1f4      82Kb     0b    227.2  20.0 64.5G 25.2G  2072 kafka        0 S 59:18.40     0     0 /usr/lib/jvm/java-8-oracle/bin/java -Xmx24G -Xms24G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:Initi
ens1f4d1   135Mb 58.7Mb      0.3   0.2 5.22G  243M 15576 nobody       0 S  0:08.24     0     0 /usr/bin/java -server -Xmx800M -Xms800M -verbosegc -Xloggc:/var/log/doctorkafka/gc.log -XX:+UseGCLogFileRo
lo         102Kb  102Kb      0.0   0.2 4.82G  293M 18656 dd-agent     0 S 29:29.53     0     0 java -Xms50m -Xmx200m -classpath /opt/datadog-agent/agent/checks/libs/jmxfetch-0.20.1-jar-with-dependencie
                             0.0   0.0 5.90M  688K  9160 bgallew      0 S  0:00.00     0     0 tail -F /var/log/doctorkafka/kafkastats.log
DISK I/O     R/s    W/s
md0         382K  14.9M
sda1           0    55K

I am not sure about the root cause of the slow stats polling. Can you relax the kafkastats polling interval to 60 seconds (-pollingintervalinseconds 60)? Will that solve the problem?

I've tried that (and longer times!). No bueno.

It looks like the actual issue is that /usr/bin/ec2metadata keeps being run over and over, even though the metadata it provides can't change after the system starts up. I'm going to see if I can figure out why it's being re-run and stop that.
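If that's confirmed, one possible mitigation is to cache the command's output after the first call, since it can't change while the process is running. A minimal sketch (Ec2MetadataCache is my own illustration, not existing DoctorK code; I'm assuming kafkastats shells out to /usr/bin/ec2metadata on each polling cycle, and that on non-EC2 hosts each invocation blocks until the metadata lookup times out, which would explain the multi-minute gaps):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: run /usr/bin/ec2metadata once and reuse the output,
// instead of shelling out on every stats-polling cycle.
public class Ec2MetadataCache {
    private static final AtomicReference<String> CACHED = new AtomicReference<>();

    public static String get() throws Exception {
        String cached = CACHED.get();
        if (cached != null) {
            return cached;
        }
        // First (and only) external invocation; on a non-EC2 host this is the
        // call that can hang until the metadata lookup times out.
        Process p = new ProcessBuilder("/usr/bin/ec2metadata").start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        CACHED.compareAndSet(null, out.toString());
        return CACHED.get();
    }
}

A physical host would still pay the timeout once on the first invocation, but subsequent polling cycles would reuse the cached result instead of blocking every time.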