Whale, Hello There!
This document contains steps on how to bring up the environment. For a discussion on the architecture and design decisions, please see design.pdf
Provision the Infrastructure
Set up your AWS client
First, ensure that you've configured your AWS CLI accordingly. Setting that up is outside the scope of this guide so please go ahead and read up at https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
Install Terraform
Grab the latest Terraform CLI here
Install kubectl
Grab it via this guide
Install eksctl
Grab it via this guide
Install Helm
Grab it via this guide
Initialize the Terraform Working Directory
terraform -chdir=terraform init
Create Your Environment-Specific tfvars File
cp terraform/example.tfvars terraform/terraform.tfvars
Then modify the file as you see fit.
Create the DB Credentials Secret in AWS
whale_aws_cli_profile=$(grep -E ' *profile *=' terraform/terraform.tfvars | sed -E 's/ *profile *= *"(.*)"/\1/g')
whale_aws_region=$(grep -E ' *region *=' terraform/terraform.tfvars | sed -E 's/ *region *= *"(.*)"/\1/g')
whale_env_name=$(grep -E ' *env_name *=' terraform/terraform.tfvars | sed -E 's/ *env_name *= *"(.*)"/\1/g')
whale_db_creds_secret_name=$(grep ' *db_creds_secret_name *=' terraform/terraform.tfvars | sed -E 's/ *db_creds_secret_name *= *"(.*)"/\1/g')
whale_secret_file=~/.whale/secrets/db_creds-${whale_env_name}.json
mkdir -p ~/.whale/secrets
chmod 0700 ~/.whale/secrets
cat > $whale_secret_file <<EOF
{
"db_user": "SU_$(uuidgen | tr -d '-')",
"db_pass": "$(uuidgen)"
}
EOF
chmod 0600 $whale_secret_file
aws secretsmanager create-secret \
--profile "$whale_aws_cli_profile" \
--name "$whale_db_creds_secret_name" \
--description "Whale DB credentials for ${whale_env_name} environment" \
--secret-string file://$whale_secret_file
Create a Route 53 Zone for Your Environment
First, get a hold of an FQDN that you own and define it in an env var:
whale_zone_fqdn=<TYPE-IN-YOUR-FQDN-HERE>
Let's also create a unique caller reference:
whale_route53_caller_reference=$(uuidgen | tr -d '-')
Then, create the zone:
whale_aws_cli_profile=$(grep -E ' *profile *=' terraform/terraform.tfvars | sed -E 's/ *profile *= *"(.*)"/\1/g')
whale_aws_region=$(grep -E ' *region *=' terraform/terraform.tfvars | sed -E 's/ *region *= *"(.*)"/\1/g')
aws route53 create-hosted-zone \
--profile "$whale_aws_cli_profile" \
--name "$whale_zone_fqdn" \
--caller-reference "$whale_route53_caller_reference" > tmp/create-hosted-zone.out
List the nameservers for your zone:
cat tmp/create-hosted-zone.out | jq -r '.DelegationSet.NameServers[]'
Now modify your DNS servers to use the hosts listed above.
And We're Off!
terraform -chdir=terraform apply
Proceed once the above is done.
(Optional) Connect to the Bastion for the First Time
Use ssh4realz to ensure you connect to the bastion securely. For a guide on how to (and why) use the script, see this video.
ssh4realz $(terraform -chdir=terraform output -raw bastion1_instance_id)
Subsequent Bastion SSH Connections
With the bastion's host key already saved to your known_hosts file, just SSH directly to its public ip.
ssh -A ubuntu@$(terraform -chdir=terraform output -raw bastion1_public_ip)
Set-up Your kubectl Config File
Back in your local machine
aws eks --region=$(terraform -chdir=terraform output -raw region) \
update-kubeconfig \
--name $(terraform -chdir=terraform output -raw k8s_cluster_name)
kubectl config use-context $(terraform -chdir=terraform output -raw k8s_cluster_arn)
chmod 0600 ~/.kube/config
Check that you're able to connect to the kube-api-server:
kubectl get pods --all-namespaces
Sanity Check: Double-check that Pods Can Reach the DB
# Print out the DB endpoint for reference
terraform -chdir=terraform output db_endpoint
kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh
Once in the prompt, run:
/ # telnet <HOSTNAME-PORTION-OF-db_endpoint-OUTPUT> <PORT-PORTION-OF-db_endpoint-OUTPUT>
It should output:
Connected to <HOSTNAME>
To exit:
<Press Ctrl-] then Enter then e>
/ # exit
Log in to the UI and API Container Registries
aws ecr get-login-password --region $(terraform -chdir=terraform output -raw region) | \
docker login --username AWS --password-stdin $(terraform -chdir=terraform output -raw registry_ui)
aws ecr get-login-password --region $(terraform -chdir=terraform output -raw region) | \
docker login --username AWS --password-stdin $(terraform -chdir=terraform output -raw registry_api)
Deploy Prometheus
For this section, we will follow this AWS guide:
kubectl create namespace prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade -i prometheus prometheus-community/prometheus \
--namespace prometheus \
--set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2"
Watch for the status of each prometheus pod via:
watch -d kubectl get pods -n prometheus
Once all of them are up, temporarily set up port forwarding to access the Prometheus UI:
kubectl --namespace=prometheus port-forward deploy/prometheus-server 9090
Browse to http://localhost:9090
When you're done, hit Ctrl-C to stop the port forwarding.
Ensure Your Cluster Has an OpenID Connect Provider
OIDC will be used by some pods in the cluster to connect to the AWS API. This section will be based off of this guide
First check if the cluster already has an OIDC provider:
aws eks describe-cluster \
--region $(terraform -chdir=terraform output -raw region) \
--name $(terraform -chdir=terraform output -raw k8s_cluster_name) \
--query "cluster.identity.oidc.issuer" \
--output text
It should return something like:
https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E
Now grep that sample ID from your list of OIDC providers:
aws iam list-open-id-connect-providers | grep <EXAMPLED539D4633E53DE1B716D3041E>
If the above command returned an ARN, you're done with this section. If it did not return one, then run:
eksctl utils associate-iam-oidc-provider \
--region $(terraform -chdir=terraform output -raw region) \
--cluster $(terraform -chdir=terraform output -raw k8s_cluster_name) \
--approve
Rerun the aws iam command above again (including the pipe to grep) to double check.
Install cert-manager
kubectl apply --validate=false -f cert-manager/cert-manager.yaml
Watch for the status of each pod via:
watch -d kubectl get pods -n cert-manager
Install the Load Balancer Controller
We will base the following steps on this guide
aws iam create-policy \
--policy-name AWSLoadBalancerControllerIAMPolicy \
--policy-document file://aws-lb-controller/iam-policy.json | \
tee tmp/iam-policy.out
whale_aws_account_id=$(terraform -chdir=terraform output -raw account_id)
whale_k8s_cluster_name=$(terraform -chdir=terraform output -raw k8s_cluster_name)
eksctl create iamserviceaccount \
--cluster="$whale_k8s_cluster_name" \
--namespace=kube-system \
--name=aws-load-balancer-controller \
--attach-policy-arn=arn:aws:iam::${whale_aws_account_id}:policy/AWSLoadBalancerControllerIAMPolicy \
--override-existing-serviceaccounts \
--approve
cat aws-lb-controller/load-balancer.yaml | \
sed 's@--cluster-name=WHALE_CLUSTER_NAME@'"--cluster-name=${whale_k8s_cluster_name}"'@' | \
kubectl apply -f -
Watch the controller pod go up via:
watch -d kubectl get pods -n kube-system
Prepare the App's Namespace
kubectl create ns whale
Add the DB Credentials as a Secret
whale_env_name=$(terraform -chdir=terraform output -raw env_name)
kubectl create secret generic postgres-credentials -n whale --from-env-file <(jq -r "to_entries|map(\"\(.key)=\(.value|tostring)\")|.[]" ~/.whale/secrets/db_creds-${whale_env_name}.json)
Create the Cluster Issuer for Whale
The following steps are based off of this guide, and this bit of a (working) hack:
Next, let's deploy the cluster issuer in another terminal:
whale_dns_zone=<TYPE-IN-YOUR-FQDN-HERE>
whale_env_name=$(terraform -chdir=terraform output -raw env_name)
whale_region=$(terraform -chdir=terraform output -raw region)
whale_hosted_zone_id=$(aws route53 list-hosted-zones | \
jq -r ".HostedZones[] | select(.name=\"${whale_dns_zone}\") | .Id" | \
rev | cut -d '/' -f 1 | rev)
whale_cert_manager_role_arn=$(terraform -chdir=terraform output -raw cert_manager_role_arn)
cat cert-manager/cluster-issuer.yaml | \
sed 's@WHALE_DNS_ZONE@'"${whale_dns_zone}"'@' | \
sed 's@WHALE_ENV_NAME@'"${whale_env_name}"'@' | \
sed 's@WHALE_REGION@'"${whale_region}"'@' | \
sed 's@WHALE_HOSTED_ZONE_ID@'"${whale_hosted_zone_id}"'@' | \
sed 's@WHALE_CERT_MANAGER_ROLE_ARN@'"${whale_cert_manager_role_arn}"'@' | \
kubectl apply -f -
Check that it created the secret for our app:
kubectl get secret ${whale_env_name}-tls -n cert-manager
Wait for App Events
First, lets follow events in the whale namespace to know what's happening when we apply our manifest later:
kubectl get events -n whale -w
Build and Deploy the UI
make ui
In the other terminal session where you're watching events, wait for this line:
0s Normal CertificateIssued certificaterequest/whale-prod-tls-<pod-suffix> Certificate fetched from issuer successfully
Be patient though as it can take a few minutes and you'll see errors like this:
Error presenting challenge: Time limit exceeded. Last error:
or:
Failed build model due to ingress: whale/ingress-whale-api: none certificate found for host: ui.whale.kubekit.io
Ignore those. Check the status as well via:
https://check-your-website.server-daten.de/?q=${component}.${whale_dns_zone}
Import the Key and Cert to ACM and Add the UI FQDN to Route53
scripts/configure-tls-resources ui <DNS_ZONE-FQDN-HERE>
Once this script completes, the AWS LB Controller will be able to create the ALB fronting the UI.
Build and Deploy the API
make api
In the other terminal session where you're watching events, wait for this line:
0s Normal CertificateIssued certificaterequest/whale-prod-api-tls-<pod-suffix> Certificate fetched from issuer successfully
Be patient though as it can take a few minutes and you'll see errors like this:
Error presenting challenge: Time limit exceeded. Last error:
or:
Failed build model due to ingress: whale/ingress-whale-api: none certificate found for host: api.whale.kubekit.io
Ignore those. Check the status as well via:
https://check-your-website.server-daten.de/?q=${component}.${whale_dns_zone}
Import the Key and Cert to ACM and Add the API FQDN to Route53
scripts/configure-tls-resources api <DNS_ZONE-FQDN-HERE>
Once this script completes, the AWS LB Controller will be able to create the ALB fronting the API.
Clean Up That Blubber!
whale_env_name=$(terraform -chdir=terraform output -raw env_name)
whale_k8s_cluster_name=$(terraform -chdir=terraform output -raw k8s_cluster_name)
whale_aws_account_id=$(terraform -chdir=terraform output -raw account_id)
scripts/delete-tls-resources ui <DOMAIN-FQDN-HERE>
kubectl delete ns whale
kubectl delete -f aws-lb-controller/load-balancer.yaml
kubectl delete ns cert-manager
kubectl delete ns prometheus
eksctl delete iamserviceaccount \
--cluster=$whale_k8s_cluster_name \
--namespace=kube-system \
--name=aws-load-balancer-controller
cat tmp/iam-policy.out | \
jq -r '.Policy.Arn' | \
xargs -I {} aws iam delete-policy --policy-arn {}
terraform -chdir=terraform destroy
Finally, if you no longer plan on bringing up this cluster at a later point in time, clean up the following as well:
aws secretsmanager delete-secret \
--force-delete-without-recovery \
--secret-id "whale-db-creds-${whale_env_name}"
whale_zone_fqdn=<TYPE-IN-YOUR-FQDN-HERE>
whale_aws_cli_profile=$(grep -E ' *profile *=' terraform/terraform.tfvars | sed -E 's/ *profile *= *"(.*)"/\1/g')
whale_aws_region=$(grep -E ' *region *=' terraform/terraform.tfvars | sed -E 's/ *region *= *"(.*)"/\1/g')
aws route53 delete-hosted-zone \
--profile "$whale_aws_cli_profile" \
--name "$whale_zone_fqdn" \
--caller-reference "$whale_route53_caller_reference" > tmp/create-hosted-zone.out