[Feature] monitor
eecsmap opened this issue · comments
Is there a plan to provide a monitoring/dashboard feature that shows cluster status, such as:
- VM counts
- VM workload vs. capacity
- total CPU, memory, etc.
We could start with some basic information first. :)
Statistics on a weekly/monthly basis, such as:
- total VM runs
- average pending time trend (to help us decide whether we need to deploy more workers)
- etc.
would be very useful too.
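To make the weekly statistics concrete, here is a rough sketch of how total VM runs and average pending time could be aggregated per ISO week. The record shape (a `created_at` / `started_at` timestamp pair per VM run) and the function name `weekly_pending_trend` are assumptions for illustration only, not something the project exposes today.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def weekly_pending_trend(runs):
    """Aggregate VM runs per ISO week.

    `runs` is an iterable of (created_at, started_at) datetime pairs.
    This record shape is hypothetical; adapt it to whatever the
    controller or workers actually report.
    """
    buckets = defaultdict(list)
    for created_at, started_at in runs:
        iso = created_at.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        # Pending time: how long the VM waited before it started.
        buckets[week].append((started_at - created_at).total_seconds())
    return {
        week: {"vm_runs": len(pending), "avg_pending_s": mean(pending)}
        for week, pending in sorted(buckets.items())
    }


# Made-up sample data, for illustration only.
sample_runs = [
    (datetime(2024, 1, 1, 10, 0, 0), datetime(2024, 1, 1, 10, 0, 30)),
    (datetime(2024, 1, 2, 9, 0, 0), datetime(2024, 1, 2, 9, 1, 30)),
    (datetime(2024, 1, 8, 9, 0, 0), datetime(2024, 1, 8, 9, 2, 0)),
]

for week, stats in weekly_pending_trend(sample_runs).items():
    print(week, stats)
```

An average pending time that keeps rising week over week would be the signal that more workers are needed.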
We've had very good experience lately with OpenTelemetry for Cirrus Runners. Workers can also emit stats like disk space, CPU utilization, VM startup time, pull times, etc.
Good to know we have a source of data. What about the dashboard/monitoring itself, do we have any ideas for it? I was thinking of something like Anka's, starting with some basic features first.
You'll be able to export the OpenTelemetry data to your dashboard of choice.