aws-observability / aws-otel-collector

AWS Distro for OpenTelemetry Collector (see ADOT Roadmap at https://github.com/orgs/aws-observability/projects/4)

Home Page: https://aws-otel.github.io/

Looking for service level metrics (RunningTaskCount, PendingTaskCount, DesiredTaskCount)

silk-bahamut opened this issue

Question

I'm looking to replace the standard Container Insights configuration in an ECS cluster with a custom one based on aws-otel-collector.

I've found many explanations of how to configure all the metrics, but my problem is that I want service-level metrics.
Here are all the metrics that I found, and they all seem to be at the task level.

In this list, my main interests are:

  • RunningTaskCount
  • PendingTaskCount
  • DesiredTaskCount

Is there any way to access these metrics with the aws-otel-collector?

Tests

I tried starting my cluster with the sidecar container defined as below:

{
    "name": "aws-otel-collector",
    "image": "amazon/aws-otel-collector",
    "cpu": 0,
    "portMappings": [],
    "essential": true,
    "command": [
        "{{command}}"
    ],
    "environment": [],
    "mountPoints": [],
    "volumesFrom": [],
    "secrets": [
        {
            "name": "AOT_CONFIG_CONTENT",
            "valueFrom": "arn:aws:ssm:eu-central-1:123456789:parameter/bru/adot/config"
        }
    ],
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "loggroup-adot-test",
            "awslogs-region": "eu-central-1",
            "awslogs-stream-prefix": "metrics"
        }
    },
    "healthCheck": {
        "command": [
            "/healthcheck"
        ],
        "interval": 5,
        "timeout": 6,
        "retries": 5,
        "startPeriod": 1
    },
    "systemControls": []
}

I then saw many new metrics in CloudWatch:

{
    "view": "timeSeries",
    "stacked": false,
    "metrics": [
        [ "ECS/ContainerInsights", "container.cpu.cores" ],
        [ ".", "container.cpu.onlines" ],
        [ ".", "container.cpu.reserved" ],
        [ ".", "container.cpu.usage.vcpu" ],
        [ ".", "container.cpu.utilized" ],
        [ ".", "container.memory.reserved" ],
        [ ".", "container.memory.usage" ],
        [ ".", "container.memory.usage.limit" ],
        [ ".", "container.memory.usage.max" ],
        [ ".", "container.memory.utilized" ],
        [ ".", "container.network.rate.rx" ],
        [ ".", "container.network.rate.tx" ],
        [ ".", "ecs.task.cpu.cores" ],
        [ ".", "ecs.task.cpu.onlines" ],
        [ ".", "ecs.task.cpu.reserved" ],
        [ ".", "ecs.task.cpu.usage.vcpu" ],
        [ ".", "ecs.task.cpu.utilized" ],
        [ ".", "ecs.task.memory.reserved" ],
        [ ".", "ecs.task.memory.usage" ],
        [ ".", "ecs.task.memory.usage.limit" ],
        [ ".", "ecs.task.memory.usage.max" ],
        [ ".", "ecs.task.memory.utilized" ],
        [ ".", "ecs.task.network.rate.rx" ],
        [ ".", "ecs.task.network.rate.tx" ]
    ],
    "region": "eu-central-1"
}

This seems to match the documentation, but it doesn't help me find out how to get the three metrics I'm missing.
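For reference, the three counts are also exposed by the ECS DescribeServices API as the `runningCount`, `pendingCount`, and `desiredCount` fields of each service. The following is a minimal Python sketch of one possible workaround that runs outside the collector: it reads those fields with boto3 and republishes them as custom CloudWatch metrics. It is not a feature of aws-otel-collector and makes no claim about what the collector itself supports. The namespace `Custom/ECSServiceCounts`, the function names, and the `ClusterName`/`ServiceName` dimension layout are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical custom namespace, chosen for this sketch only.
NAMESPACE = "Custom/ECSServiceCounts"

# DescribeServices response fields mapped to the metric names asked about above.
COUNT_FIELDS = {
    "runningCount": "RunningTaskCount",
    "pendingCount": "PendingTaskCount",
    "desiredCount": "DesiredTaskCount",
}


def service_counts_to_metric_data(
    cluster_name: str, services: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Convert the 'services' list of a DescribeServices response into
    entries shaped for the MetricData parameter of PutMetricData."""
    metric_data = []
    for svc in services:
        dimensions = [
            {"Name": "ClusterName", "Value": cluster_name},
            {"Name": "ServiceName", "Value": svc["serviceName"]},
        ]
        for field, metric_name in COUNT_FIELDS.items():
            metric_data.append(
                {
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                    "Value": float(svc.get(field, 0)),
                    "Unit": "Count",
                }
            )
    return metric_data


def publish_service_counts(cluster_name: str, service_names: List[str]) -> None:
    """Fetch the counts from ECS and publish them to CloudWatch.
    Requires boto3 and AWS credentials; imported lazily so the
    conversion function above can be used without them."""
    import boto3

    ecs = boto3.client("ecs")
    cloudwatch = boto3.client("cloudwatch")
    # DescribeServices accepts at most 10 services per call.
    for i in range(0, len(service_names), 10):
        batch = service_names[i : i + 10]
        response = ecs.describe_services(cluster=cluster_name, services=batch)
        data = service_counts_to_metric_data(cluster_name, response["services"])
        if data:
            cloudwatch.put_metric_data(Namespace=NAMESPACE, MetricData=data)
```

A call such as `publish_service_counts("my-cluster", ["my-service"])` (both names hypothetical) could be run on a schedule, for example from a small Lambda function, with IAM permissions for `ecs:DescribeServices` and `cloudwatch:PutMetricData`.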

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.