sajitsasi / 1ClickECK

1Click Automation of Elastic Cloud on Kubernetes

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

1ClickECK

1ClickECK was built to rapidly deploy with ease an K8s (AWS EKS/Azure AKS/GCP GKE) cluster, install ECK, and the ES Stack.

Total time from configuration to a fully launched ECK cluster generally should take less than 10 minutes. The automation; 1ClickECK, is idempotent.

Note - Automation deploys OpenEBS which exposes and uses locally attached storage. More info, go to the OpenEBS section

1ClickECK currently deploys

  • EKS, AKS, GKE
  • ECK (Optional)
    • ElasticSearch
    • Kibana
    • License loading (Bring your own ES license)

Does not deploy

  • APM Server
  • Fleet
  • Elastic Maps
  • Enterprise Search

2023-01-27-19-39-42.jpg

Prerequisites

Quick Start Details regarding clients required to run the automtaion are outlined in the following sections. However if a ubuntu 20.+ instance is available, a quick start client install script is available here https://github.com/sunileman/1ClickECK/blob/main/scripts/1ClickECK-client-install.sh This will install all the required libaries and CLIs for the automation.

General

AWS

GIT

  • Not required. If interested in pulling the repo via ssh to keep up with updates applied to this automation https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent

Azure

Deployment

2023-02-07-15-28-54.jpg

2023-02-07-15-29-55.jpg

Required Arguments

-c [aws|azure|gcp]

-b [all|k8s]

-d

-r disable openebs

de destroy ECK

i get cluster info

Examples

2023-02-07-15-36-07.jpg

To run the automation in the background. Output will be writen to nohup.out.

  nohup ./1ClickECK.sh -b all -c [aws|gcp|azure] &

Once the automation completes, Kibana endpoints along with username and password should be displayed. To retrieve again, simply run
2023-02-07-15-32-42.jpg

The automation will set your local kubectl manifest. Verify by running

  kubectl get nodes

Tear Down all assets built by the automation

  ./1ClickEckOnEKS.sh -d -c [aws|gcp|azure]

Autoscaling

2023-01-30-11-26-51.jpg

Terraform Variables

The automation requires a few variables to be set in the .[aws|azure]/terraform.tfvars file

Name of your deployment/project
tags.Project
Name of owner
tags.Owner
Name which will be appended to the EKS deployment. Max 6 characters
tags.username

Any variable found in ./[aws|azure]/variables.tf can be set as an envionrment variable.

export TF_VAR_<variable name>="your value"
export TF_VAR_variable_<list variable type>='["value", "value"]'

Envionrment variables and setting variables in terraform.tfvars can be used together. Consider terraform variable order of Precedence: https://developer.hashicorp.com/terraform/language/values/variables#variable-definition-precedence

Note - Instance and pod count
Instance count = Number of K8s nodes per type (hot/warm/etc))
Pod count = Number of pods per type (hot/warm/etc) which will be deployed on the instance type.

For example you can deploy 4 hot pods on 1 hot K8s instance type.

Variables- Default Values

2023-02-07-15-31-29.jpg ./[aws|azure]/variables file host all possible variables supported by this deployment.
If a varialble is not present in ./[aws|azure]/terraform.tfvars, the default value will be taken from ./[aws|azure]/variables.tf If default value is not acceptable, set the varialble value in ./[aws|azure]/terraform.tfvars

For example, the default number of es hot instances is 3. That default value is set in ./[aws|azure]/variables.tf. To change this value to 10; in ./[aws|azure]/terraform.tfvars file, set hot_instance_count = 10.

Another example. The default instance type for warm is im4gn.4xlarge. To change this value to im4gn.8xlarge; in ./[aws|azure]/terraform.tfvars file, set warm_instance_type = im4gn.8xlarge.

Word of caution on instance types for node groups (hot, warm, etc). When selecting an instance type, verify the AMI type: AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64. Instance type must fall withn specific AMI Type.

**Instance types
This deployment only supports DAS, not EBS

Ingest to pods are set by the following variables [master|hot|warm|cold|frozen|ml]_accept_ingest=false

GCP - To skip node pool creation [master|hot|warm|cold|frozen|ml]_create_node_pool=false

Apply ES License

Deployment without a license will use a basic license configuration.
If a ES license is available to you, place it under ./[aws|azure|gcp]/create-eck/license and name the license file es-license.json. The deployment will pick up license file

If a license file needs to be applied or changed after deployment, simply run ./[aws|azure|gpc]/create-eck/eck-add-license.sh

Pod Configuration

Once EKS is deployed, ES pods will be deployed leveraging resources defined in the K8s manifest. Each node type (master, hot, warm, ml, etc) spec is defined within ./[aws|azure]/variables.tf. Change the pod spec to your desired configuration.

For example

ml_pod_count=value
ml_pod_cpu=value
ml_pod_memory=value
ml_pod_storage=value
ml_pod_ES_JAVA_OPTS=value

Latest release of ES, if jvm arguments aren't spplied for heap size, half the available memory within the pod will be used for heap. Take this into consideration if the defaults aren't acceptables

Storage Class

By default ECK will mount openebs storage class to ES pods. To use a different storage class is simple

Run kubeclt get sc to retrieve available storage classes.

GKE Example [master|hot|warm|cold|frozen|ml]_pod_storage_class = "premium-rwo"

AWS Example [master|hot|warm|cold|frozen|ml]_pod_storage_class = "gp3"

Azure Example [master|hot|warm|cold|frozen|ml]_pod_storage_class = "managed-csi-premium"

Run the automation with r option to disable openebs

ECK/ES Updates

The automation; 1ClickECK, is idempotent. Therefore if updates to ECK or ES have been applied, simple rerun 1ClickECK with the same -b -c arguments

kubectl manifest

Automation will set local kube config (kubectl) after automation run. If local kube config needs to be reset, simple rerun the automation (even if there is no change) to set local kube config.

To reset your local kubeclt, run

  /1ClickECK/[gpc|aws|azure]/create-[eks|aks|gke]/setkubectl.sh

OpenEBS / Storage Options

OpenEBS is a automatic disk provisioner for K8s. There may be scanerios the defaults from OpenEBS are not useful. To handle these scanerios, run the automation with -r option to disable openEBS

./1ClickECK -c gcp -b k8s -r

If openEBS exist on K8s cluster and to remove run this

./1ClickECK/<aws|gcp|azure>/create-gke/addons/openebs/1ClickRemoveOpenEBS.sh

If additional storage classes are available and need to be leverage, set the following variables with the storage class name

[master|hot|warm|cold|frozen|ml]_pod_storage_class = <Storage Class Name>

AWS GPE The automation installs EBS CSI along with required IAM permissions. Additionally the automation will create a stroage calls gp3. It can be used by setting

[master|hot|warm|cold|frozen|ml]_pod_storage_class = gp3

Troubleshooting

Azure AKS bash into pod https://learn.microsoft.com/en-us/azure/aks/node-access

GCP GKE bash into pod https://cloud.google.com/migrate/containers/docs/troubleshooting/executing-shell-commands

OOMKilled

Pod JVM is requesting more memory than pod limits

exec plugin: invalid apiversion "client.authentication.k8s.io/v1alpha1"

The version of K8s does not match kubectl client. Please refer to: https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html

All automation logs are stored in ./logs

Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

try: export KUBE_CONFIG_PATH=~/.kube/config

If one of the pods is stuck in init stage run

kubectl describe pod <podname>

That should give you an idea on what went wrong. If the mount fails due to srv/local not available, rerun openEBS.

How to enter pod within GKE nodes

gcloud compute ssh <NODE_NAME> --zone <ZONE>

-Logs showing markup Use lnav to view logs: https://lnav.org/

Azure Error Error: building AzureRM Client: Authenticating using the Azure CLI is only supported as a User (not a Service Principal). Set your azure creds prior to launching the automation

-Connection refused when trying to reach ES API port 9200 Verify at least 1 pod has role which does inclue master. The load balancer deployed selects nodes based on elasticsearch.k8s.elastic.co/node-master: "false"

Error

│ Error: Failed to query available provider packages
│ 
│ Could not retrieve the list of available versions for provider
│ anschoewe/curl: could not connect to registry.terraform.io: Failed to
│ request discovery document: Get
│ "https://registry.terraform.io/.well-known/terraform.json": read tcp
│ xxxx:54284->xxxx:443: read: connection reset by peer

Rerun automation. Terraform api (target side) was reset.

Error

Try: aws creds are invalid. Verify by running aws sts get-caller-identity

About

1Click Automation of Elastic Cloud on Kubernetes


Languages

Language:HCL 71.2%Language:Shell 28.8%