Looking for service level metrics (RunningTaskCount, PendingTaskCount, DesiredTaskCount)

silk-bahamut opened this issue 2 months ago · comments

Question

I'm looking to replace the standard Container Insights configuration in an ECS cluster with a custom one based on aws-otel-collector.

I've found many explanations of how to configure all the metrics, but my problem is that I want service-level metrics.
Here are all the metrics that I found, and all of them seem to be at the task level.

In this list, my main interests are:

RunningTaskCount
PendingTaskCount
DesiredTaskCount

Is there any way to access these metrics with aws-otel-collector?
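For reference, the same three counts can be read outside the collector from the ECS `DescribeServices` API. The sketch below is a workaround under assumptions, not a confirmed aws-otel-collector feature: it assumes the Container Insights names map onto the `desiredCount`, `runningCount`, and `pendingCount` fields of the response, and the helper name `service_task_counts` is hypothetical.

```python
from typing import Any, Dict


def service_task_counts(ecs_client: Any, cluster: str, service: str) -> Dict[str, int]:
    """Return service-level task counts for one ECS service.

    `ecs_client` is expected to behave like boto3.client("ecs").
    The mapping of metric names to response fields is an assumption.
    """
    response = ecs_client.describe_services(cluster=cluster, services=[service])
    services = response.get("services", [])
    if not services:
        # DescribeServices reports unknown services under "failures" instead.
        raise LookupError(f"service {service!r} not found in cluster {cluster!r}")
    svc = services[0]
    return {
        "DesiredTaskCount": svc["desiredCount"],
        "RunningTaskCount": svc["runningCount"],
        "PendingTaskCount": svc["pendingCount"],
    }


# Usage (needs boto3 and AWS credentials; cluster and service names are placeholders):
#   import boto3
#   counts = service_task_counts(boto3.client("ecs"), "my-cluster", "my-service")
```

This only polls the ECS control plane; publishing the values as metrics would still need a separate step.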

Tests

I tried starting my cluster with the sidecar defined as below:

{
    "name": "aws-otel-collector",
    "image": "amazon/aws-otel-collector",
    "cpu": 0,
    "portMappings": [],
    "essential": true,
    "command": [
        "{{command}}"
    ],
    "environment": [],
    "mountPoints": [],
    "volumesFrom": [],
    "secrets": [
        {
            "name": "AOT_CONFIG_CONTENT",
            "valueFrom": "arn:aws:ssm:eu-central-1:123456789:parameter/bru/adot/config"
        }
    ],
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "loggroup-adot-test",
            "awslogs-region": "eu-central-1",
            "awslogs-stream-prefix": "metrics"
        }
    },
    "healthCheck": {
        "command": [
            "/healthcheck"
        ],
        "interval": 5,
        "timeout": 6,
        "retries": 5,
        "startPeriod": 1
    },
    "systemControls": []
}

And I saw many new metrics in CloudWatch:

{
    "view": "timeSeries",
    "stacked": false,
    "metrics": [
        [ "ECS/ContainerInsights", "container.cpu.cores" ],
        [ ".", "container.cpu.onlines" ],
        [ ".", "container.cpu.reserved" ],
        [ ".", "container.cpu.usage.vcpu" ],
        [ ".", "container.cpu.utilized" ],
        [ ".", "container.memory.reserved" ],
        [ ".", "container.memory.usage" ],
        [ ".", "container.memory.usage.limit" ],
        [ ".", "container.memory.usage.max" ],
        [ ".", "container.memory.utilized" ],
        [ ".", "container.network.rate.rx" ],
        [ ".", "container.network.rate.tx" ],
        [ ".", "ecs.task.cpu.cores" ],
        [ ".", "ecs.task.cpu.onlines" ],
        [ ".", "ecs.task.cpu.reserved" ],
        [ ".", "ecs.task.cpu.usage.vcpu" ],
        [ ".", "ecs.task.cpu.utilized" ],
        [ ".", "ecs.task.memory.reserved" ],
        [ ".", "ecs.task.memory.usage" ],
        [ ".", "ecs.task.memory.usage.limit" ],
        [ ".", "ecs.task.memory.usage.max" ],
        [ ".", "ecs.task.memory.utilized" ],
        [ ".", "ecs.task.network.rate.rx" ],
        [ ".", "ecs.task.network.rate.tx" ]
    ],
    "region": "eu-central-1"
}

which seems to match the documentation, but doesn't help me find how to get the three I'm missing.
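The gap can be checked mechanically. This is a minimal sketch that takes a CloudWatch widget definition like the one above (shortened here to four entries), groups the metric names by prefix, and lists which of the three wanted names are absent. The helper names and the prefix-to-level rule are illustrative assumptions.

```python
import json
from collections import defaultdict

# Shortened copy of the widget definition; the full list has the same two prefixes.
WIDGET_JSON = """
{
    "metrics": [
        [ "ECS/ContainerInsights", "container.cpu.cores" ],
        [ ".", "container.memory.utilized" ],
        [ ".", "ecs.task.cpu.utilized" ],
        [ ".", "ecs.task.network.rate.tx" ]
    ]
}
"""

WANTED = {"RunningTaskCount", "PendingTaskCount", "DesiredTaskCount"}


def metric_names_by_level(widget: dict) -> dict:
    """Group metric names by level, inferred from the name prefix."""
    levels = defaultdict(list)
    for entry in widget["metrics"]:
        name = entry[1]  # entry[0] is the namespace, or "." meaning "same as above"
        if name.startswith("ecs.task."):
            level = "task"
        elif name.startswith("container."):
            level = "container"
        else:
            level = "other"
        levels[level].append(name)
    return dict(levels)


def missing_metrics(widget: dict, wanted=WANTED) -> list:
    """Return the wanted metric names that do not appear in the widget."""
    present = {entry[1] for entry in widget["metrics"]}
    return sorted(wanted - present)


widget = json.loads(WIDGET_JSON)
print(metric_names_by_level(widget))
print(missing_metrics(widget))
```

Every name falls under either `container.` or `ecs.task.`, so nothing in the list is reported per service.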

github-actions · Answer 1 · Mon Sep 30 2024 04:02:13 GMT+0800 (China Standard Time)

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.