[EKS] Enable HPA with CloudWatch metrics and alarms

Question

joshuabaird opened this issue 6 years ago · comments

Tell us about your request
Many ECS customers make use of ECS service autoscaling based on CloudWatch metrics and alarms. This functionality is desired in EKS.

Community projects that add this support include https://github.com/chankh/k8s-cloudwatch-adapter

Which service(s) is this request for?
This could be EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We need to be able to use HPA to autoscale pods based on CloudWatch metrics and alarms.

Are you currently working around this issue?
We use ECS.

Nate Taber · Answer 1 · Thu Mar 14 2019 13:20:08 GMT+0800 (China Standard Time)

You can do this today in EKS by using custom metrics with the Kubernetes metrics server (part of the HPA implementation). However, these are typically metrics from within the Kubernetes cluster, and they perform a similar function to how CW metrics and alarms work for ECS. I think what you are describing would be to consume CW metrics as external metrics into the metrics server and use these to trigger scaling.

Can you describe the use case that having this feature would enable?

Aaron Roydhouse · Answer 2 · Thu Mar 14 2019 22:05:28 GMT+0800 (China Standard Time)

k8s/EKS has native support for autoscaling both application container replicas and cluster worker nodes based on the cluster's built-in metrics collection. You don't need to integrate external systems like CloudWatch.

However, if you did still want to scale off CloudWatch, e.g. maybe you are triggering scaling inside EKS in response to SQS queue length, then you could trigger a Lambda that tells the EKS k8s API to scale your application up or down (like kubectl scale from the CLI).
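A minimal sketch of the logic such a Lambda might contain, assuming the trigger supplies an SQS queue length. The function names, the per-pod throughput figure, and the replica bounds are hypothetical; only the scale subresource path and the patch body shape follow the Kubernetes API. Authenticating to the EKS endpoint and sending the PATCH are left out.

```python
import json
import math


def desired_replicas(queue_length, messages_per_pod, min_replicas=1, max_replicas=20):
    """Map a queue length onto a replica count, clamped to [min_replicas, max_replicas]."""
    if messages_per_pod <= 0:
        raise ValueError("messages_per_pod must be positive")
    wanted = math.ceil(queue_length / messages_per_pod)
    return max(min_replicas, min(max_replicas, wanted))


def scale_request(namespace, deployment, replicas):
    """Build the API path and JSON merge-patch body for a Deployment's scale subresource.

    The caller would send this as an HTTP PATCH with
    Content-Type: application/merge-patch+json, which is roughly
    what `kubectl scale` does on your behalf.
    """
    path = f"/apis/apps/v1/namespaces/{namespace}/deployments/{deployment}/scale"
    body = json.dumps({"spec": {"replicas": replicas}})
    return path, body


if __name__ == "__main__":
    # Hypothetical values: 95 queued messages, each pod handles about 10.
    replicas = desired_replicas(queue_length=95, messages_per_pod=10)
    print(scale_request("default", "worker", replicas))
```

Clamping the result keeps a metric spike or a bad data point from scaling the deployment without bound.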

Aaron Roydhouse · Answer 3 · Thu Mar 14 2019 22:15:41 GMT+0800 (China Standard Time)

That project @joshuabaird mentioned has disappeared. If you want to go in the other direction and export internal cluster metrics to CloudWatch, then Istio has an adaptor for exporting traffic metrics, and this issue #38 might be relevant.

Josh Baird · Answer 4 · Thu Mar 14 2019 22:21:05 GMT+0800 (China Standard Time)

Sorry - my point was that users migrating from ECS are most likely using CloudWatch metrics to autoscale their ECS services, so this same functionality is desired in EKS. It's an important feature to consider seeing that CloudWatch is a fundamental service in AWS, in my opinion.

Use-case:

We currently push custom metrics to CloudWatch for various things. These metrics are used to autoscale our ECS services. So, ideally, we need similar functionality in EKS; otherwise we would need to rewrite our entire metrics pipeline for this use case.

Aaron Roydhouse · Answer 5 · Thu Mar 14 2019 23:21:08 GMT+0800 (China Standard Time)

Cool @joshuabaird, if you already have CloudWatch metrics then right now you can trigger:

Cluster scaling, just by scaling the worker node ASG as normal: new nodes automatically join the cluster, and workloads on deleted nodes are automatically rescheduled
Service scaling, using the CloudWatch -> Lambda -> EKS k8s API approach

The k8s built-in horizontal autoscaler also supports custom metrics, so someone or AWS could implement a CloudWatch metrics adaptor. Then the native k8s service scaling would be driven off CloudWatch metrics. There are existing projects for scaling based on Azure, Google, Datadog and Prometheus metrics. Adding a CloudWatch metrics adaptor seems like a good way to add what you want?
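To illustrate what a CloudWatch adaptor would feed into, here is a sketch of the replica calculation the HPA documents, plus an HPA object that targets an external metric. The metric name, target value, and deployment name are hypothetical. The `autoscaling/v2` API version is an assumption about the cluster, since older clusters expose `v2beta1` or `v2beta2`, and 0.1 is the documented default tolerance.

```python
import math


def hpa_desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Documented HPA rule: ceil(currentReplicas * currentValue / targetValue).

    If the ratio is within the tolerance of 1.0, the replica count is left alone.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def external_metric_hpa(deployment, metric_name, target_average, min_replicas, max_replicas):
    """Build an HPA manifest (as a dict) that scales a Deployment on an external metric."""
    return {
        "apiVersion": "autoscaling/v2",  # assumption: cluster serves autoscaling/v2
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "External",
                    "external": {
                        # Hypothetical name; the adaptor decides how it maps to CloudWatch.
                        "metric": {"name": metric_name},
                        "target": {"type": "AverageValue", "averageValue": str(target_average)},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(hpa_desired_replicas(4, current_value=200, target_value=100))
    print(external_metric_hpa("worker", "sqs-queue-length", 30, 1, 10))
```

The HPA itself only reads the external metrics API, so any adaptor that serves the named metric there can drive this object.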

Josh Baird · Answer 6 · Thu Mar 14 2019 23:23:20 GMT+0800 (China Standard Time)

Yep, adding a CloudWatch metrics adapter sounds like a solution!

KH Chan · Answer 7 · Mon Mar 25 2019 17:34:51 GMT+0800 (China Standard Time)

@whereisaaron @joshuabaird the project has been moved to https://github.com/awslabs/k8s-cloudwatch-adapter

Kohei Ota (inductor) · Answer 8 · Mon Aug 05 2019 13:00:37 GMT+0800 (China Standard Time)

While I understand this is possible using the described workaround, I believe HPA is a standard Kubernetes feature, and the metrics-server could be a default component of a managed Kubernetes cluster.

I'd really like this feature to be available without much setup!

Charles Parasa · Answer 9 · Mon Aug 10 2020 12:11:07 GMT+0800 (China Standard Time)

How can I push some of my application's custom metrics to CloudWatch and use them in my HPA?
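For the first half of that question, a hedged sketch of publishing a custom metric. It assumes the caller passes in a boto3 CloudWatch client (`boto3.client("cloudwatch")`), whose `put_metric_data` call takes a `Namespace` and a `MetricData` list. The namespace, metric name, and dimension used here are hypothetical. Using the metric in an HPA still requires an external metrics adapter such as the ones discussed in this thread.

```python
def publish_queue_depth(cloudwatch, depth, service_name, namespace="MyApp/Autoscaling"):
    """Publish one data point of a custom metric to CloudWatch.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    The namespace, metric name and dimension are illustrative only.
    """
    datum = {
        "MetricName": "QueueDepth",
        "Dimensions": [{"Name": "Service", "Value": service_name}],
        "Value": float(depth),
        "Unit": "Count",
    }
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=[datum])
    return datum


class RecordingClient:
    """Stand-in for the real client so the sketch runs without AWS credentials."""

    def __init__(self):
        self.calls = []

    def put_metric_data(self, **kwargs):
        self.calls.append(kwargs)


if __name__ == "__main__":
    client = RecordingClient()
    publish_queue_depth(client, depth=42, service_name="worker")
    print(client.calls)
```

Passing the client in as an argument keeps the function testable and leaves credentials and region configuration to the caller.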

Srinivas Devaki · Answer 10 · Sun Apr 18 2021 17:37:11 GMT+0800 (China Standard Time)

While the k8s CloudWatch adapter and EKS Container Insights are great, the adapter is still a major cluster-level component: if it stops working for any reason, then autoscaling of all deployments stops, which is a significant risk.

In ECS, by contrast, autoscaling is completely managed without any in-house management. This is one of the many reliability issues of AWS's managed k8s platform: it's not truly AWS-managed, and there is a significant chunk of cluster-level components that need to be managed in-house. AWS managing such basic components would greatly increase the adoption of EKS by our team due to increased reliability.

Matthew Pettifer · Answer 11 · Wed Aug 25 2021 23:16:35 GMT+0800 (China Standard Time)

Are there any updates on this? I have just spoken to AWS Support and they indicate that the k8s-cloudwatch-adapter is now unsupported. When can we expect a way to natively scale services based on CloudWatch metrics without having to work around it?

Filippo Balicchia · Answer 12 · Mon Aug 30 2021 15:28:45 GMT+0800 (China Standard Time)

Hi, k8s-cloudwatch-adapter seems to be archived, but I can't find what it was archived in favor of.
Could you please point me to an alternative?

sudip-moengage · Answer 13 · Tue Aug 31 2021 04:56:54 GMT+0800 (China Standard Time)

@fbalicchia use keda.sh

Srinivas Devaki · Answer 14 · Thu Mar 09 2023 17:56:07 GMT+0800 (China Standard Time)

The CloudWatch metrics adapter doesn't seem like a good solution to use. The major issue is that it uses the GetMetricData API, which has rate limits, so if someone runs a script to fetch metrics for some use case, you might suddenly see your autoscaling stop working or, worse, start scaling down.

The adapter either needs to maintain some state by itself, or AWS should extend the HPA to use an event-driven approach like in ECS, where an alarm triggers an autoscaling action, like KEDA.
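One way to read "maintain some state" is a small cache in front of the metric fetch: serve a recent value without calling the API again, and fall back to the last known value when a call fails (for example, on throttling). This is only a sketch of that idea, not how the adapter is implemented. `CachedMetric` and its parameters are hypothetical, and `fetch` stands in for whatever wraps GetMetricData.

```python
import time


class CachedMetric:
    """Wrap a metric fetch with a TTL cache and a last-known-value fallback."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch          # callable returning the current metric value
        self._ttl = ttl_seconds      # how long a fetched value counts as fresh
        self._clock = clock          # injectable clock, handy for testing
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return self._value       # fresh enough: no API call
        try:
            value = self._fetch()
        except Exception:
            if self._fetched_at is None:
                raise                # nothing cached yet, so surface the error
            # Serve the stale value rather than reporting a missing metric.
            # The timestamp is not refreshed, so the next get() retries the fetch.
            return self._value
        self._value = value
        self._fetched_at = now
        return value


if __name__ == "__main__":
    metric = CachedMetric(lambda: 42.0, ttl_seconds=30)
    print(metric.get(), metric.get())  # second call is served from the cache
```

Serving a stale value trades accuracy for stability; a real component would probably also cap how old a value may be before it stops reporting it.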