prometheus-community / windows_exporter

Prometheus exporter for Windows machines

Windows service collector takes over 500s to scrape data

paologallinaharbur opened this issue · comments

We have noticed that the exporter takes far longer than expected to scrape data, both when run with use-api and when run with the old WMI path while filtering services:

$ windows_exporter.exe --collectors.enabled service --log.level debug --collector.service.services-where "Name='***'" 
or
$ windows_exporter.exe --collectors.enabled service --log.level debug --collector.service.use-api

We have a "normal" number of services being scraped, but the exporter takes a long time to respond. In particular, I find the following metrics odd:

windows_exporter_collector_duration_seconds{collector="service"} 0.434567
windows_exporter_perflib_snapshot_duration_seconds 528.34678

I have the feeling that scraping the services is "quick" but creating a perflib snapshot wastes a lot of time, even though the service collector has an empty list of perfCounters.

  • Do you have any idea why this could be happening?
  • Is the perflib snapshot creation something that can be avoided somehow? As far as I know, the metric reports the time taken by this function to execute, but I was unable to understand more 😕
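
To put a number on the gap between the two metrics quoted above, a small helper can read the exposition text and report how much of the measured time goes to the perflib snapshot. This is only a sketch: `parseSamples` and `perflibShare` are hypothetical names, not part of windows_exporter, and the parser assumes plain `name{labels} value` lines without trailing timestamps.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples reads Prometheus text exposition lines and returns each sample
// value keyed by the full series name (labels included).
// Assumption: no trailing timestamps, so the value is the last field.
func parseSamples(text string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and HELP/TYPE comments
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			continue
		}
		out[strings.TrimSpace(line[:i])] = v
	}
	return out
}

// perflibShare returns the fraction of measured time spent in the perflib
// snapshot, relative to the snapshot plus all collector durations.
func perflibShare(samples map[string]float64) float64 {
	snapshot := samples["windows_exporter_perflib_snapshot_duration_seconds"]
	var collectors float64
	for name, v := range samples {
		if strings.HasPrefix(name, "windows_exporter_collector_duration_seconds{") {
			collectors += v
		}
	}
	total := snapshot + collectors
	if total == 0 {
		return 0
	}
	return snapshot / total
}

func main() {
	// The two samples reported in this issue.
	scrape := `windows_exporter_collector_duration_seconds{collector="service"} 0.434567
windows_exporter_perflib_snapshot_duration_seconds 528.34678`

	share := perflibShare(parseSamples(scrape))
	fmt.Printf("perflib snapshot share of measured time: %.1f%%\n", share*100)
}
```

With the values from this issue, nearly all of the measured time falls on the snapshot rather than on the service collector itself.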

That feels strange, because the service collector doesn't interact with perflib.

@paologallinaharbur I would like to know whether, by any chance, #1497 helps you.

Do you have any idea why this could be happening?

WMI is slow, and the API-based approach queries every single service individually for its status and configuration.

#1497 uses a different approach, asking the Windows API only once. On my local system it's quite fast, but it won't provide all the information, such as run_as, pid or start mode.

WMI is slow, and the API-based approach queries every single service individually for its status and configuration.

Still, I see the exporter spending more time on perflib than on collecting metrics, while on VMs with a similar number of services the perflib_snapshot duration is close to zero 🤔

windows_exporter_collector_duration_seconds{collector="service"} 0.543678
windows_exporter_perflib_snapshot_duration_seconds 286.9878764

I'll test the proposed solution, but I wonder whether the perflib "time" would be reduced as well.

It could be that querying nothing takes up to 300 seconds on your system.

I am not following.. is that an actual possibility? 😅 If so:

  • what is it actually doing during those 300s?
  • is there a way to run the exporter with "no collectors" and test that? I've tried with --collectors.enabled, but I think at least one should be enabled 😕

Maybe you are hitting a bug in windows_exporter.

With

windows_exporter.exe --collectors.enabled service

you have only enabled a collector which does not register or request any perflib counters.

This leads to a situation where the where clause is empty (this is what I mean by "query nothing"). I guess the behavior in that case is unknown. An empty where clause could also lead to all data being returned, but I'm unable to verify that.

Background: In windows_exporter, perfdata is always collected before the collectors are called. On startup, collectors can register perfdata counters. Then, when Prometheus calls /metrics, windows_exporter scrapes all registered perfdata once and provides the data to each collector.
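
The register-then-scrape-once flow described above can be sketched roughly as follows. This is not the actual windows_exporter code: the `registry` type, its methods, and the object names `objectA` and `objectB` are hypothetical placeholders, and the space-separated query format is an assumption made only for illustration. The point is that a collector which registers nothing leaves the query empty.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// registry is a hypothetical stand-in for the bookkeeping done at startup:
// it remembers which perfdata objects the enabled collectors asked for.
type registry struct {
	objects map[string]struct{}
}

func newRegistry() *registry {
	return &registry{objects: make(map[string]struct{})}
}

// register records the perfdata objects a collector needs. The collector
// name is only here for readability; a collector may register nothing.
func (r *registry) register(collector string, objects ...string) {
	for _, o := range objects {
		r.objects[o] = struct{}{}
	}
}

// query builds the single perfdata query issued once per scrape.
// Assumption: the query is a space-separated list of registered objects.
func (r *registry) query() string {
	names := make([]string, 0, len(r.objects))
	for o := range r.objects {
		names = append(names, o)
	}
	sort.Strings(names)
	return strings.Join(names, " ")
}

func main() {
	// --collectors.enabled service: nothing registered, so the query is empty.
	onlyService := newRegistry()
	onlyService.register("service")
	fmt.Printf("service only: %q\n", onlyService.query())

	// --collectors.enabled service,os: the second collector registers
	// (placeholder) objects, so the query is no longer empty.
	withOS := newRegistry()
	withOS.register("service")
	withOS.register("os", "objectA", "objectB")
	fmt.Printf("service,os:   %q\n", withOS.query())
}
```

In this sketch the first registry yields an empty query string. What the underlying Windows API does with such a query is exactly the open question in this thread.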

If you configure

windows_exporter.exe --collectors.enabled service,os

then there is a where clause. It would be great to know whether this helps, as it would validate the assumption.

From the perfdata point of view, there is no difference between having zero collectors enabled and having only the service collector enabled.


You can also try to use

windows_exporter.exe --collectors.enabled textfile --collector.textfile.directories=path_to_dir_with_empty_file

which should do nothing. But with zero collectors, windows_exporter would still ask for perfdata.