Nordstrom / vmstats

Graphite reporting interface for vCenter

Question on how to set up frequency and metric filters

toni-moreno opened this issue

Hi @timconradinc @pdalinis,

I am very grateful for your great work.

I'm testing vmstats and it looks like a promising solution to me, but there is a lack of documentation on how the tool gathers information.

Right now we are collecting data from a TEST vCenter with 5 ESX hosts and 5 VMs, using the default vmstats.properties (but without filters, so that we can see all available metrics):

VCS_HOST=<myhost>
VCS_USER=<myuser>
VCS_PASS=<mypass>
VCS_TAG=vcs_tag
ESX_STATS=true
GRAPHITE_HOST=localhost
GRAPHITE_PORT=2003
GRAPHITE_TAG=vmstats
USE_FQDN=false
SLEEP_TIME=300
CACHED_LOOP_CYCLES=3600
MAX_VMSTAT_THREADS=8
MAX_ESXSTAT_THREADS=4
MAX_GRAPHITE_THREADS=7
SEND_ALL_PERIODS=true
SEND_ALL_ABSOLUTE=true
SEND_ALL_DELTA=true
STAT_EXCLUDES=
DISCONNECT_GRAPHITE_AFTER=0

In this config I cannot see what the gathering frequency will be (I would like 1 data point per minute, or 1 per 5 minutes, for each metric). How can I configure vmstats to send only 1 data point per minute for each metric?

Another question is about metric filtering. In my test I'm getting over 15000 metrics for only 5 ESX hosts and 5 VMs.

Is there any way to limit collection to the most basic metrics instead of getting everything?

I'm running with the -P flag (run only once) and it is getting metrics from 1 hour ago. Is that the expected behavior?

SLEEP_TIME is how often vmstats will connect to vCenter to gather metrics. SEND_ALL_PERIODS=true means that when vmstats connects, it will gather the highest resolution data that is available in vCenter and send all the data points since the last run. For us, that is 20s resolution, so every five minutes we gather and send 15 data points for each metric.
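
The arithmetic in that last sentence can be sketched as follows. This is only an illustration, not vmstats code: the function name `points_per_run` is hypothetical, and it assumes that every sample recorded since the previous run is forwarded.

```python
def points_per_run(sleep_time_s: int, resolution_s: int = 20) -> int:
    """Data points sent per metric on each run.

    Assumes every sample recorded since the previous run is forwarded,
    as described for SEND_ALL_PERIODS=true. The 20 s default is the
    resolution quoted in the reply above, not a vmstats constant.
    """
    if sleep_time_s <= 0 or resolution_s <= 0:
        raise ValueError("both intervals must be positive")
    return sleep_time_s // resolution_s


if __name__ == "__main__":
    print(points_per_run(300))  # SLEEP_TIME=300 at 20 s resolution
    print(points_per_run(60))   # SLEEP_TIME=60 at 20 s resolution
```

Under the same assumption, lowering SLEEP_TIME alone changes how many points arrive per run, not the spacing between them.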

Filtering is done with STAT_EXCLUDES, e.g.:
STAT_EXCLUDES=^datastore.*$, ^hbr.*$

Hi @tmonk42, thank you for your fast response.

If I understood correctly, to get only 1 data point per minute I should configure:

SLEEP_TIME=60
SEND_ALL_PERIODS=false

And the excludes support regular expressions, but are they matched against the complete metric name?

Many thanks, I will test right now.

After reconfiguring with

SLEEP_TIME=60
SEND_ALL_PERIODS=false

and running with -D, the debug output file debug-gwriter-pool-5-thread-1.log is still showing metrics every 20 seconds...

What am I doing wrong?

vmstats.vcs_tag.cluster01.vm.hostname01.datastore.totalWriteLatency.53f727d1-6a1b3ef7-0fb1-8c89a588cb62.average 0 1425908020
vmstats.vcs_tag.cluster01.vm.hostname01.datastore.totalWriteLatency.53f727d1-6a1b3ef7-0fb1-8c89a588cb62.average 0 1425908040
vmstats.vcs_tag.cluster01.vm.hostname01.datastore.totalWriteLatency.53f727d1-6a1b3ef7-0fb1-8c89a588cb62.average 0 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.virtualDisk.write.scsi0:0.average 0 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.mem.zipSaved.latest 0 1425908020
vmstats.vcs_tag.cluster01.vm.hostname01.mem.zipSaved.latest 0 1425908040
vmstats.vcs_tag.cluster01.vm.hostname01.mem.zipSaved.latest 0 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.mem.decompressionRate.average 0 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.mem.swapped.average 0 1425908020
vmstats.vcs_tag.cluster01.vm.hostname01.mem.swapped.average 0 1425908040
vmstats.vcs_tag.cluster01.vm.hostname01.mem.swapped.average 0 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.usage.average 12 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.net.bytesRx.4000.average 0 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.ready.0.summation 2 1425908020
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.ready.0.summation 3 1425908040
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.ready.0.summation 2 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.disk.numberReadAveraged.naa_60a980003246686b5624437237486a61.average 0 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.system.summation 1 1425908020
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.system.summation 0 1425908040
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.system.summation 1 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.costop.summation 0 1425908020
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.costop.summation 0 1425908040
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.costop.summation 0 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.ready.summation 2 1425908020
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.ready.summation 3 1425908040
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.ready.summation 2 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.virtualDisk.smallSeeks.scsi0:0.latest 0 1425908020
vmstats.vcs_tag.cluster01.vm.hostname01.virtualDisk.smallSeeks.scsi0:0.latest 0 1425908040
vmstats.vcs_tag.cluster01.vm.hostname01.virtualDisk.smallSeeks.scsi0:0.latest 0 1425908060
vmstats.vcs_tag.cluster01.vm.hostname01.cpu.maxlimited.summation 0 1425908020
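
One way to check the spacing in a debug log like this one is to group the lines by metric path and look at the gaps between timestamps. The sketch below is not part of vmstats: the function name `timestamp_gaps` is hypothetical, and it assumes each line follows the Graphite plaintext layout `path value timestamp`.

```python
from collections import defaultdict


def timestamp_gaps(lines):
    """Map each metric path to the gaps (in seconds) between its samples."""
    stamps = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not have exactly three fields
        path, _value, ts = parts
        stamps[path].add(int(ts))
    gaps = {}
    for path, seen in stamps.items():
        ordered = sorted(seen)
        # Difference between each timestamp and the one before it
        gaps[path] = [b - a for a, b in zip(ordered, ordered[1:])]
    return gaps


if __name__ == "__main__":
    prefix = "vmstats.vcs_tag.cluster01.vm.hostname01."
    sample = [
        prefix + "mem.swapped.average 0 1425908020",
        prefix + "mem.swapped.average 0 1425908040",
        prefix + "mem.swapped.average 0 1425908060",
        prefix + "cpu.usage.average 12 1425908060",
    ]
    for path, gap_list in timestamp_gaps(sample).items():
        print(path, gap_list)
```

A metric with a single sample in the window gets an empty gap list, which makes it easy to tell metrics sent once per run from those still sent at the finer resolution.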

Hi @tmonk42, after reviewing the code:

https://github.com/Nordstrom/vmstats/blob/master/src/main/java/org/timconrad/vmstats/statsGrabber.java#L179-L202

I've seen that the correct way to get 1 data point per minute is with:

SLEEP_TIME=60
SEND_ALL_PERIODS=false
SEND_ALL_ABSOLUTE=false
SEND_ALL_DELTA=false

Hi @tmonk42, can you help me understand the meaning of the CACHED_LOOP_CYCLES parameter?
Why exactly is vmstats caching metrics?
To prevent metric loss when the remote Graphite is down?
To prevent metric loss on slow connections?

Hi @tmonk42, after reviewing the code I can see that STAT_EXCLUDES excludes all metrics from a group, and each pattern must match the complete metric group name.

https://github.com/Nordstrom/vmstats/blob/master/src/main/java/org/timconrad/vmstats/Main.java#L240-L242

That is, STAT_EXCLUDES cannot filter individual metric names.
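
The behavior described here can be sketched roughly as below. This is not the vmstats implementation (which is Java). The function name `is_excluded`, the `<group>.<name>.<rollup>` counter layout, and the use of Python's `re.fullmatch` are all assumptions made for illustration.

```python
import re


def is_excluded(counter: str, stat_excludes: str) -> bool:
    """Return True if the counter's group fully matches any exclude pattern.

    `counter` is assumed to look like "<group>.<name>.<rollup>",
    e.g. "cpu.usage.average". `stat_excludes` is a comma-separated
    list of regular expressions, as in vmstats.properties.
    """
    group = counter.split(".", 1)[0]  # only the group is compared
    patterns = [p.strip() for p in stat_excludes.split(",") if p.strip()]
    return any(re.fullmatch(p, group) for p in patterns)


if __name__ == "__main__":
    excludes = "^datastore$, ^hbr$"
    print(is_excluded("datastore.totalWriteLatency.average", excludes))
    print(is_excluded("cpu.usage.average", excludes))
    # A pattern aimed at one metric name never matches, because only
    # the group ("cpu") is compared against it.
    print(is_excluded("cpu.usage.average", r"^cpu\.usage.*$"))
```

Under these assumptions, a pattern can drop a whole group such as datastore, but it cannot single out one counter within a group.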

Hello Toni,
Please take a look at StatsFeeder with the GraphiteReceiver plug-in.

Thanks

Lava

Hi @lbasavap, I've already implemented the metric filtering in the following branch (https://github.com/toni-moreno/vmstats/tree/improved_metric_selection), but I'm still waiting for this PR (#19) to be merged first.

Why is gathering data with https://labs.vmware.com/flings/statsfeeder better than with the vijava SDK?

How many VMs and ESX hosts are you collecting data from with your https://github.com/lbasavap/GraphiteReceiver ?