Step-by-Step Full DevOps Project: A Kubernetes Cluster with High Availability, Reliability, Auto Scaling, Auto-Healing, and Monitoring.
Creating a Kubernetes cluster with high availability, reliability, auto scaling, auto-healing, and monitoring on Amazon EKS via Terraform or CloudFormation. To do this, we'll use a GitOps workflow (ArgoCD), Jenkins, Rancher, Amazon Elastic Kubernetes Service (EKS), a VPC (with both public and private subnets) for Amazon EKS, an Amazon RDS MySQL database, an S3 bucket, Amazon ECR, AWS Secrets Manager, Amazon Route 53, AWS Certificate Manager, Let's Encrypt with Cert Manager, CloudWatch, Prometheus, and Grafana. We will work through all of this step by step in this README.
If you liked the article, I would be happy if you clicked the Medium Follow button to encourage me to write, and so you don't miss future articles. Your claps, follows, and subscriptions help my articles reach a broader audience. Thank you in advance.
1.1.1. Creating VPC for Amazon EKS by Using Cloudformation
1.1.2. Creating VPC for Amazon EKS by Using Terraform
1.2.1. Firstly, creating a role for Amazon EKS
1.2.2. Creating a Kubernetes cluster in Amazon EKS via eksctl
1.2.3. Creating a Kubernetes cluster in Amazon EKS via Rancher
1.2.4. Creating a Kubernetes cluster in Amazon EKS via Terraform
1.3. Deploying an application that consists of 12 Microservices into the Kubernetes cluster.
1.3.1. Creating RDS MySQL Database
1.3.2. Implementing AWS Secrets Manager to securely manage sensitive information.
1.3.3. Using AWS Secrets Manager's secret in RDS MySQL Database.
1.3.4. Setting RDS MySQL Database for High Availability and Reliability.
- Deployment options: Multi-AZ DB instance
- Instance and storage configuration
- Storage Auto Scaling
- Automated backups
- Viewing the Amazon CloudWatch Logs
- Creating an ElastiCache cluster from RDS for read performance.
1.3.5. Connecting RDS MySQL Database to microservice application by modifying mysql-server-service.yaml file
- Testing DB connectivity
1.4.1. Deploying Microservices app to EKS cluster via Jenkins Pipelines
1.4.2. DNS name
1.4.3. Running Deployment, Services, and Ingress
1.4.4. Controlling Microservices application via The Internet Browser
1.4.5. SSL/TLS Certificate via Let's Encrypt and Cert Manager
1.5. Controlling and Modifying the Amazon EKS cluster via Rancher
Implementing a GitOps workflow using ArgoCD for managing the deployment of applications in the Kubernetes cluster.
2.2. TLS Certificate for ArgoCD via AWS Certificate Manager (ACM)
2.3. DNS Name for ArgoCD via Amazon Route53
2.4. Launching ArgoCD
2.5. Create a Git repository to store Kubernetes manifests for your sample application.
2.6.1. Connecting The Microservice Repositories
2.6.2. Creating a new app in ArgoCD for Microservice Applications
2.6.3. Observing the operation and synchronization of the Microservice application on ArgoCD
4.1.1. Creating the HorizontalPodAutoscaler
4.1.2. Installing the Metrics Server to the Cluster.
4.2. Configuring the Kubernetes cluster for automatic scaling based on resource utilization.
4.2.1. Deploying Cluster Autoscaler
4.2.2. Configuring Dynamic Scaling Policies via AWS Console
4.3. Multi-AZ Kubernetes Cluster for Reliability
4.4. High Availability, Auto Scaling, Auto-Healing, and Failover for the RDS Database
Integrating AWS CloudWatch for monitoring and logging of the Kubernetes cluster and the deployed application. Setting up alerts for critical events or performance thresholds.
5.1.1. Enabling Control plane logging
5.1.2. Viewing cluster control plane logs
5.1.3. Setting up Container Insights on Amazon EKS and Kubernetes
5.1.3.1. To attach the necessary policy to the IAM role for your worker nodes.
5.1.3.2. To deploy Container Insights using the quick start.
5.1.4.1. Create a namespace for CloudWatch
5.1.4.2. Create a service account in the cluster
5.1.4.3. Create and Edit a ConfigMap for the CloudWatch agent
5.1.4.4. Deploy the CloudWatch agent as a DaemonSet
5.1.5. Creating an alert via Cloudwatch
5.2. Creating an alert via Prometheus and Grafana
5.2.1. Deploying Prometheus
5.2.2. Deploying Grafana
5.2.3. Setting Up An Alarm By Using the Grafana and Prometheus
5.3. Alarm for The Nodes of Auto Scaling Groups.
For installation with Terraform
- The YAML installation file for CloudFormation is available in this GitHub repo; you can download the CloudFormation template from there.
- Note: AWS recommends installing with AWS CloudFormation.
- For a more detailed explanation, you can review this documentation link.
Alternatively, you can create a VPC for EKS with AWS CloudFormation by using CloudFormation's Amazon S3 URL (the AWS CloudFormation template) instead of the YAML installation file. We can create a VPC that supports only IPv4, or one that supports both IPv4 and IPv6. To do this, paste one of the following URLs into the text area under "Amazon S3 URL" and choose "Next", as shown in the pictures below.
for IPv4:
https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/amazon-eks-vpc-private-subnets.yaml
for IPv4 and IPv6:
https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/amazon-eks-ipv6-vpc-public-private-subnets.yaml
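The same stack can also be created from the AWS CLI instead of the console. A sketch with a hypothetical stack name; the command is prefixed with `echo` so it only prints for review, and dropping the `echo` runs it once AWS credentials are configured:

```shell
# IPv4-only VPC template for Amazon EKS (from the URLs above)
TEMPLATE_URL="https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/amazon-eks-vpc-private-subnets.yaml"
# "eks-vpc-stack" is a hypothetical stack name; drop 'echo' to actually create the stack
echo aws cloudformation create-stack \
  --stack-name eks-vpc-stack \
  --template-url "$TEMPLATE_URL" \
  --region us-west-2
```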
- The terraform-vpc installation's .tf files are available in this GitHub repo. The files that create the Amazon EKS cluster also create a VPC for Amazon EKS.
- For a more detailed explanation, you can review this documentation link.
terraform init
terraform apply -auto-approve
1.2. Deploying a Kubernetes cluster in AWS using EKS (Elastic Kubernetes Service) in the private subnet.
- Kubernetes clusters managed by Amazon EKS use this role to manage nodes and the legacy Cloud Provider uses this role to create load balancers with Elastic Load Balancing for services. Before you create Amazon EKS clusters, you must create an IAM role with either of the following IAM policies.
- For a more detailed explanation, you can review this documentation link.
Use the cumhur-cluster.yaml file in this repository. Don't forget to replace the private subnet IDs with yours.
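For reference, a minimal sketch of what such an eksctl ClusterConfig can look like; the real cumhur-cluster.yaml in the repo is authoritative, and the subnet IDs and instance type here are placeholders:

```shell
# Hypothetical sketch of an eksctl ClusterConfig for a private-subnet cluster;
# replace the subnet IDs with your own private subnet IDs.
cat > cluster-sketch.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cumhur-eks-cluster
  region: us-east-1
vpc:
  subnets:
    private:
      us-east-1a: { id: subnet-0example1 }
      us-east-1b: { id: subnet-0example2 }
managedNodeGroups:
  - name: managed-ng
    instanceType: t3.medium
    desiredCapacity: 2
    privateNetworking: true   # place the nodes in the private subnets
EOF
# eksctl create cluster -f cluster-sketch.yaml   # run with AWS credentials configured
```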
For a more detailed explanation, you can review my article in the link
For a more detailed explanation, you can review my articles in the links: Working with Microservices-6: Creating the Rancher server, Running Rancher in it, and Preparing Rancher to use in Jenkins Pipeline, and Working with Microservices-8: Preparing the staging pipeline in Jenkins, and deploying the microservices app to the Kubernetes cluster using Rancher, Helm, Maven, Amazon ECR, and Amazon S3. Part-1
- The terraform-eks installation .tf files are available in the GitHub repo.
- Also, the "terraform-eks" files will create a VPC for the AWS EKS cluster.
terraform init
terraform apply -var="cluster_name=eks-cumhur-cluster" -auto-approve
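Once Terraform has created the cluster, kubectl needs to be pointed at it. A sketch assuming the us-east-1 region used elsewhere in this project; prefixed with `echo` so it only prints the command:

```shell
# Update ~/.kube/config for the new cluster; drop 'echo' to actually run it
# against your AWS account (region assumed to be us-east-1).
echo aws eks update-kubeconfig --region us-east-1 --name eks-cumhur-cluster
```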
For a more detailed explanation, you can review my article in the link, Working with Microservices-14: Creating Amazon RDS MySQL(8.0.31) database for the Kubernetes cluster in the Production stage.
- Deployment options: Multi-AZ DB instance for High Availability and failover.
- Creating an ElastiCache cluster from RDS for read performance, to save up to 55% in cost and gain up to 80x faster read performance using ElastiCache with RDS for MySQL.
1.3.5. Connecting RDS MySQL Database to microservice application by modifying mysql-server-service.yaml file
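The repo's mysql-server-service.yaml is the authoritative file; as a sketch of one common pattern, an ExternalName Service can resolve the in-cluster mysql-server name to the RDS endpoint (the endpoint below is a placeholder, taken from your RDS console):

```shell
# Hypothetical sketch: route the in-cluster "mysql-server" DNS name to RDS.
cat > mysql-server-service-sketch.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: mysql-server
spec:
  type: ExternalName
  externalName: mydb.abcdefgh.us-east-1.rds.amazonaws.com  # placeholder RDS endpoint
EOF
# kubectl apply -f mysql-server-service-sketch.yaml   # run against your cluster
```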
1.4. Deploying an application consisting of the 12 Microservices to the Amazon EKS Kubernetes cluster.
For a more detailed explanation, you can review my article in the link, Working with Microservices-9: Preparing the staging pipeline in Jenkins, and deploying the microservices app to the Kubernetes cluster using Rancher, Helm, Maven, Amazon ECR, and Amazon S3. Part-2
For a more detailed explanation, you can review my article in the link, Working with Microservices-12: Setting Domain Name and TLS certificate for Production Pipeline using Route 53, Let’s Encrypt and Cert Manager
Implement a GitOps workflow using ArgoCD for managing the deployment of applications in the Kubernetes cluster.
- For a more detailed explanation, you can review my article in the link, Argo CD-1: Understanding, Installing, and Using Argo CD as a GitOps Continuous Delivery Tool and Argo CD and GitHub Action-1: Running Together Them To Create The CI/CD Pipeline
2.6. Configure the GitOps tool to continuously synchronize the state of the cluster with the desired state specified in the Git repository.
Integrate an Amazon RDS instance for database storage and Implement AWS Secrets Manager to securely manage sensitive information (e.g., database credentials) used by the application.
NOTE: For the application to work properly, it must work with the Database, so this section was done in items 1.2. and 1.3.
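As a quick sanity check, the stored credentials can be read back with the AWS CLI. The secret name prod/eks/mysql is a placeholder; the command is prefixed with `echo` so it only prints, and dropping the `echo` runs it with AWS credentials configured:

```shell
# Fetch the secret value (JSON string with the DB credentials); secret name is hypothetical
echo aws secretsmanager get-secret-value \
  --secret-id prod/eks/mysql \
  --query SecretString --output text
```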
4.1. Implementing Horizontal Pod Autoscaling (HPA) for one or more components of your sample application.
For a more detailed explanation, you can review my article in the link, Diving into Kubernetes-1: Creating and Testing a Horizontal Pod Autoscaling (HPA) in Kubernetes Cluster
- For a more detailed explanation, you can review this documentation link.
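As a concrete example, a minimal HPA manifest might look like the sketch below; the Deployment name web-server and the thresholds are placeholders for one of the microservices:

```shell
# Hypothetical HPA: scale a "web-server" Deployment between 2 and 6 replicas
# at 60% average CPU utilization (requires the Metrics Server, installed above).
cat > hpa-sketch.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-server
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
EOF
# kubectl apply -f hpa-sketch.yaml   # run against your cluster
```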
Distributing the EKS cluster across multiple AWS Availability Zones (AZs) provides high availability and thus increases reliability.
- Multi-AZ cluster for High Availability and failover: it enhances availability during planned system maintenance and helps protect databases against DB instance failure and Availability Zone disruption.
- Point-in-time recovery via automated backups.
- ElastiCache cluster (to save up to 55% in cost and gain up to 80x faster read performance using ElastiCache with RDS for MySQL).
- Enabled Storage Auto Scaling
- Read replicas to increase scalability for high performance. I did not implement this; however, I could use Read Replicas with Multi-AZ as part of a disaster recovery (DR) strategy for my production RDS database. Read Replicas also help decrease the load on the primary DB by serving read-only traffic.
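If I later added the read replica, the AWS CLI call might look like this sketch; the DB identifiers are placeholders, and the command is prefixed with `echo` so it only prints:

```shell
# Create a read replica of a hypothetical "mysql-prod" instance; drop 'echo'
# to run it with AWS credentials configured.
echo aws rds create-db-instance-read-replica \
  --db-instance-identifier mysql-prod-replica \
  --source-db-instance-identifier mysql-prod
```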
Thus, CloudFront helps reduce the load on the origin server (the web server from which CloudFront retrieves the content) and improves content delivery performance and the availability of our website. It also helps protect against Distributed Denial of Service (DDoS) attacks that affect a website's availability.
5.1. Integrate AWS CloudWatch for monitoring and logging of the Kubernetes cluster and the deployed application.
You can enable CloudWatch Observability in your clusters through the CloudWatch Observability add-on. After the cluster is created, navigate to the add-ons tab and install the CloudWatch Observability add-on to enable CloudWatch Application Signals and Container Insights and start ingesting telemetry into CloudWatch.
eksctl utils update-cluster-logging --enable-types=all --region=us-east-1 --cluster=cumhur-eks-cluster --approve
- Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
- Select one of the worker node instances and choose the IAM role in the description.
- On the IAM role page, choose Attach policies.
- In the list of policies, select the check box next to CloudWatchAgentServerPolicy. If necessary, use the search box to find this policy.
- Choose Attach policies.
ClusterName=cumhur-eks-cluster
RegionName=us-east-1
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off'|| FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart-enhanced.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/;s/{{region_name}}/'${RegionName}'/;s/{{http_server_toggle}}/"'${FluentBitHttpServer}'"/;s/{{http_server_port}}/"'${FluentBitHttpPort}'"/;s/{{read_from_head}}/"'${FluentBitReadFromHead}'"/;s/{{read_from_tail}}/"'${FluentBitReadFromTail}'"/' | kubectl apply -f -
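The long one-liner above downloads the quick-start manifest and uses sed to fill in the template placeholders before piping the result to kubectl. The same templating step can be demonstrated locally on a tiny stand-in file:

```shell
ClusterName=cumhur-eks-cluster
RegionName=us-east-1
# A small stand-in template using the same placeholders as the quick-start manifest
cat > template-demo.yaml <<'EOF'
cluster.name: {{cluster_name}}
logs.region: {{region_name}}
EOF
# Substitute the placeholders, exactly as the pipeline above does
sed 's/{{cluster_name}}/'${ClusterName}'/;s/{{region_name}}/'${RegionName}'/' template-demo.yaml
# prints:
#   cluster.name: cumhur-eks-cluster
#   logs.region: us-east-1
```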
5.1.4. Set up the CloudWatch agent to collect cluster metrics (set up alerts for critical events or performance thresholds).
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-serviceaccount.yaml
curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-configmap.yaml
kubectl apply -f cwagent-configmap.yaml
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-daemonset.yaml
kubectl create namespace prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus \
--namespace prometheus \
--set alertmanager.persistentVolume.storageClass="gp2" \
--set server.persistentVolume.storageClass="gp2"
kubectl create namespace grafana
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana \
--namespace grafana \
--set persistence.storageClassName="gp2" \
--set persistence.enabled=true \
--set adminPassword='Cumhur1234.?' \
--values ${HOME}/environment/grafana/grafana.yaml \
--set service.type=LoadBalancer
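Because service.type=LoadBalancer, AWS provisions an ELB for Grafana. A sketch of building the Grafana URL from its hostname; the hostname below is a placeholder, and on a live cluster you would capture it with the commented kubectl command:

```shell
# On a live cluster, capture the ELB hostname with:
#   ELB=$(kubectl get svc grafana -n grafana -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
ELB="a1b2c3d4e5-123456789.us-east-1.elb.amazonaws.com"   # placeholder hostname
echo "Grafana URL: http://${ELB}"
```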
For a more detailed explanation, you can review my article in the link, Working with Microservices-18: Setting Up An Alarm By Using the Grafana Dashboard and Prometheus ConfigMap.yml
To tear down the resources created with Terraform:
terraform destroy --auto-approve