cdsap / Talaiot

Simple and extensible plugin to track task times in your Gradle Project.

Prometheus PushGateway publisher logic

jdomag opened this issue · comments

I think pushing task metrics to Prometheus and visualizing them in Grafana is a great idea. However, the pushgateway publisher currently publishes each metric under a separate name, i.e. the metric name is set to :project:task. I have 100 projects that execute the same task, so the metrics in Prometheus look like this:
[image]

How am I supposed to create a Grafana dashboard showing how long the task took for each project? I think the metric name should be the same for all of them, e.g. set by the taskJobName parameter of the pushgateway publisher, and the metric itself should carry a "task" or "project" label.
Then I could grab that metric and do something like
gradleTaskDuration{project="xyz", task="abc"}.

Unless there is some other way to use those metrics that I'm not aware of.

Hi @jdomag, thanks for the report.
You're completely right. I recently received a proposal addressing this issue; I'm going to ping the author to request permission to share it.

@cdsap
I would really appreciate it, thanks! 👍

@cdsap any update on this?

A while ago I also had plans to use Talaiot with Prometheus, and noticed that the publisher could be improved. Below is a draft of how I would try to improve the situation:

Change description

Modify what data is exported from Talaiot by default when used with Prometheus Push Gateway publisher.

Reasons for change:

  • High cardinality in label values
  • Redundant data
  • Attempt to align with Prometheus metrics guidelines and best practices

Assuming the common use case is collecting metrics for a single project (application), with builds executed on developer machines or a build server.

Build Metrics

Common labels

Labels added to all gradle_build_* metrics:

  • branch - git branch if present
  • hostname
  • requestedtask - e.g. :app:assembleDevDebug

Metrics

Build metrics and extra labels if different than common

  • gradle_build_total_time_seconds (gauge)
  • gradle_build_cache_ratio (gauge)
  • gradle_build_configuration_time_seconds (gauge)
  • gradle_build_local_cache_hit_counter (gauge)
  • gradle_build_local_cache_miss_counter (gauge)
  • gradle_build_remote_cache_hit_counter (gauge)
  • gradle_build_remote_cache_miss_counter (gauge)
  • gradle_build_info (untyped)
    • version - gradle version
    • jvm - Java version name, e.g. "11.0.11+9"
    • jvmargs - extra JVM parameters, e.g. "-Xmx4g -XX:+UseParallelGC -XX:MaxPermSize=512m"; parameters are always sorted to avoid multiple values for the same effective settings
    • flags - e.g. cache=false,configurationOnDemand=false,daemon=false,dryRun=false,parallel=true,refreshDependencies=false,rerunTasks=false,scan=false; flags are always sorted, as with jvmargs
    • os - e.g. Apple MacOSX-10.16 (manufacturer + OS version)
    • cpucores - how many CPU cores are available
    • maxworkers - how many Gradle workers to use
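
The sorting rule for jvmargs and flags could be sketched as below. This is only an illustration: the function names are hypothetical, and the case-insensitive ordering is an assumption chosen because it reproduces the order of the jvmargs example above; the draft does not say which ordering is meant.

```python
def normalize_jvm_args(raw: str) -> str:
    """Sort whitespace-separated JVM arguments so that the same effective
    settings always produce the same label value.
    Assumption: case-insensitive ordering (not specified in the draft)."""
    return " ".join(sorted(raw.split(), key=str.lower))


def normalize_flags(flags: dict) -> str:
    """Render Gradle switches as a comma-separated key=value string sorted by key."""
    return ",".join(
        f"{name}={str(value).lower()}" for name, value in sorted(flags.items())
    )


# Two orderings of the same arguments collapse into one label value.
a = normalize_jvm_args("-Xmx4g -XX:+UseParallelGC -XX:MaxPermSize=512m")
b = normalize_jvm_args("-XX:MaxPermSize=512m -Xmx4g -XX:+UseParallelGC")
print(a == b)
print(normalize_flags({"parallel": True, "cache": False, "daemon": False}))
```

Without this normalization, every permutation of the same arguments would create a separate gradle_build_info series.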

New Task Metrics

Metrics

  • gradle_task_$TASK_NAME_execution_time_seconds (gauge, milliseconds after the decimal point)
    • project - Gradle project that this task was executed on, e.g. :featureflags:core
    • critical - true | false
    • localcache - HIT | MISS (no label if local caching disabled)
    • remotecache - HIT | MISS (no label if remote caching disabled)
    • state - EXECUTED | UP_TO_DATE | NO_SOURCE | FROM_CACHE

Reference

@jdomag With the above changes, I think your metric could be queried with gradle_task_abc_execution_time_seconds{project=":xyz"}

@apolatynski Thank you very much.
I created a draft with the rework of the publisher: #335
I've included the custom metrics (task/build) as labels; in InfluxDB we use them as tags to index the data better.
The only missing part is the untyped metric.

I would appreciate it if you can take a look.

@apolatynski @cdsap
First of all thanks for taking care of this, much appreciated.

One question: wouldn't it be better to exclude the task name from the metric name, as follows:

gradle_task_execution_time_seconds (gauge, milliseconds after the decimal point)

- project - Gradle project that this task was executed on, e.g. :featureflags:core
- taskname - Gradle task that was executed
- critical - true | false
- localcache - HIT | MISS (no label if local caching disabled)
- remotecache - HIT | MISS (no label if remote caching disabled)
- state - EXECUTED | UP_TO_DATE | NO_SOURCE | FROM_CACHE

That would allow people to gather metrics per project, e.g. the duration of all the tasks within a project: sum(gradle_task_execution_time_seconds{project="xyz"})

@jdomag
The reason I didn't include taskname as a label is the number of distinct values. In the project I'm working on, I have around 2000 unique Gradle tasks and 77 subprojects. Considering all the other labels, the cardinality would be 2000 * 77 * 2 * 2 * 2 * 4 = 4928000. In a real-life scenario you probably won't see all those combinations, but it's still way more than the recommended maximum of 100 😄
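
For reference, the arithmetic above is just the product of the number of distinct values per label. A small sketch, using the counts quoted in this comment and the label names from the proposal:

```python
from math import prod

# Distinct values per label, as quoted in the comment above.
label_values = {
    "taskname": 2000,   # unique Gradle tasks
    "project": 77,      # subprojects
    "critical": 2,      # true | false
    "localcache": 2,    # HIT | MISS
    "remotecache": 2,   # HIT | MISS
    "state": 4,         # EXECUTED | UP_TO_DATE | NO_SOURCE | FROM_CACHE
}

# Theoretical upper bound on the number of time series for the single metric.
upper_bound = prod(label_values.values())
print(upper_bound)  # 4928000
```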

With the initial approach, I believe you can get the metric you want with a slightly ugly query

sum({__name__=~"gradle_task_.*_execution_time_seconds", project=":xyz"})
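
To make it concrete what that selector does, here is a rough Python emulation over made-up samples. The sample data is hypothetical; the one relevant PromQL detail is that regex matchers are fully anchored, which is why fullmatch is used.

```python
import re

# Hypothetical samples: (metric name, labels, value in seconds).
samples = [
    ("gradle_task_clean_execution_time_seconds", {"project": ":xyz"}, 0.5),
    ("gradle_task_assemble_execution_time_seconds", {"project": ":xyz"}, 3.0),
    ("gradle_task_assemble_execution_time_seconds", {"project": ":other"}, 2.0),
    ("gradle_build_total_time_seconds", {"branch": "main"}, 10.0),
]

# Same pattern as in the query; PromQL anchors it at both ends.
pattern = re.compile(r"gradle_task_.*_execution_time_seconds")


def total_task_time(project: str) -> float:
    """Sum all per-task series whose name matches the pattern for one project."""
    return sum(
        value
        for name, labels, value in samples
        if pattern.fullmatch(name) and labels.get("project") == project
    )


print(total_task_time(":xyz"))
```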

The mentioned scenario would probably still be handled by Prometheus, but I'm worried that larger projects might already generate tens of millions of time series. What do you think, @cdsap?

Very interesting. After reading the Prometheus recommendation again, it's clear that using tasks as labels reaches the recommended cardinality value (100). However, analyzing our project, we have 600 modules and 13k unique tasks for the main development tasks. The number of modules/projects tends to grow, and it will increase the cardinality of the metric.
I'm OK with both approaches. I think the implementation depends on the specific requirements, as users may have different use cases:

  • Group and aggregate duration by task: Include task as label
  • Monitor duration of a specific task(s): Include task as metric name

I could add one additional property to the PushgatewayExtension to control this behavior, like:

   publishers {
     pushgateway {
        ....
        taskNameAsLabel = true | false
     }
   }
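
The effect of such a switch could look roughly like this. This is a sketch of the two naming modes discussed in this thread, not the plugin's implementation; the function name is hypothetical and the label name taskname is taken from the suggestion above.

```python
def task_series(task: str, project: str, task_name_as_label: bool):
    """Return (metric name, labels) for one task under either naming mode."""
    if task_name_as_label:
        # One shared metric; the task becomes a label (easy to aggregate, higher cardinality).
        return "gradle_task_execution_time_seconds", {"project": project, "taskname": task}
    # One metric per task name; only the project remains as a label.
    return f"gradle_task_{task}_execution_time_seconds", {"project": project}


print(task_series("assemble", ":app", task_name_as_label=True))
print(task_series("assemble", ":app", task_name_as_label=False))
```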

Let me know what you think. By the way, I merged the first approach, and it is available as a snapshot in case you want to test it:

maven(url = uri("https://s01.oss.sonatype.org/content/repositories/snapshots/"))
classpath("io.github.cdsap:talaiot:1.5.2-SNAPSHOT")
(https://github.com/cdsap/Talaiot#individual-plugin)

@apolatynski @jdomag, I finished the update of the publisher configuration: https://github.com/cdsap/Talaiot/pull/337/files
However, I found an issue in the E2E test against a PushGateway instance.
My approach was to include the task name as a label:

gradle_task { task=clean } 

But when I'm collecting more than one task I get:

gradle_task is already in use by another Collector of type Gauge

I wrongly assumed I could register separate metrics with the same name and different task names as labels. Now I understand @apolatynski's concern, because you have to report all the tasks on the same metric. I'm not sure if Prometheus offers a type for our requirement:

gradle_task { task=clean } 1
gradle_task { task=assemble } 3
gradle_task { task= test } 2
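
In Prometheus terms, the output above is a single metric family with several labeled series. Client libraries generally model this as one gauge registered once with a task label name, with one child per label value, rather than one collector per task. A library-free sketch that renders such a family in the text exposition format (the function name and help text are hypothetical, and this does not use the Talaiot or Prometheus client code):

```python
def render_gauge_family(name: str, help_text: str, label: str, values: dict) -> str:
    """Render one gauge family with one sample per label value
    in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, sample in values.items():
        lines.append(f'{name}{{{label}="{label_value}"}} {sample}')
    return "\n".join(lines) + "\n"


# The three tasks from the example above, reported on the same metric.
output = render_gauge_family(
    "gradle_task",
    "Task duration (hypothetical help text)",
    "task",
    {"clean": 1, "assemble": 3, "test": 2},
)
print(output)
```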

Hi, version 1.5.2 has been published. Could you please take a look at the pushgateway plugin before we make the announcement?

Thanks

@cdsap
How do I use it? Was that merged into 1.5.2?
I've tried

plugins {
        id "io.github.cdsap.talaiot.plugin.pushgateway" version "1.5.3"
}

and then the configuration looks like this

talaiot {
    logger = io.github.cdsap.talaiot.logger.LogTracker.Mode.INFO
    metrics {
        customBuildMetrics([
            "build_tag": "${System.env.BUILD_TAG}".toString(),
            "build_id": "${System.env.BUILD_ID}".toString(),
            "job_name": "${System.env.JOB_NAME}".toString()
        ])
    }
    publishers {
        pushGatewayPublisher {
            url = "${System.env.PUSHGATEWAY_URL}"
            taskJobName = "gCICD_Pipeline"
            buildJobName = "gCICD_Pipeline"
        }
    }
}

But I can't see any relevant metrics in the Pushgateway.