
Hands-on-workshop-for-Kubernetes-observability

Use cases

In this workshop we are going to focus on these main use cases:

  • Dynamic Service Graph for run-time visualization of flow and policies
  • Kibana dashboards for quick troubleshooting
  • Dynamic Packet Capture to analyze traffic on a pod or collection of pods live
  • Application (L7) observability

Module 0: Observability tools overview

Goal: Explore Calico observability tools.

Calico observability tools

If you are interested in enabling collection of application layer metrics for your workloads, refer to the Configure L7 logs documentation.

  1. Dashboard

    The Dashboard view in the Enterprise Manager UI presents a high-level overview of what's going on in your cluster. The view shows the following information:

    • Connections, Allowed Bytes and Packets
    • Denied Bytes and Packets
    • Total number of Policies, Endpoints and Nodes
    • Summary of CIS benchmarks
    • Count of triggered alerts
    • Packets by Policy histogram that shows allowed and denied traffic as it is being evaluated by network policies

    dashboard view

  2. Policies Board

    The Policies Board shows all policies deployed in the cluster, organized into policy tiers. You can control what a user can see and do in this view by configuring Kubernetes RBAC roles (a sample RBAC sketch follows below). You can also use controls to hide tiers you're not interested in at any given time.

    policies board

    Using the stats controls, you can toggle additional metrics to be listed for each policy shown.

    policies board stats
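
    Tier visibility in this view is enforced with standard Kubernetes RBAC against the Calico tiers resource. As a minimal sketch (not part of this repo's manifests; names and verbs are illustrative assumptions), a ClusterRole that lets its bearers view only the security tier might look roughly like this:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: tier-security-reader
    rules:
      # hypothetical role: grants visibility of the "security" tier only
      - apiGroups: ["projectcalico.org"]
        resources: ["tiers"]
        resourceNames: ["security"]
        verbs: ["get"]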

  3. Audit timeline

    The Timeline view shows an audit trail of created, deleted, or modified resources.

    timeline view

  4. Endpoints

    The Endpoints view lists all endpoints known to Calico. It includes all Kubernetes endpoints, such as Pods, as well as Host endpoints that can represent a Kubernetes host or an external VM or bare metal machine.

    endpoints view

  5. Service Graph

    The dynamic Service Graph presents network flows from a service-level perspective. The top-level view shows how traffic flows between namespaces, as well as between external and internal endpoints.

    service graph node view

    • When you select any node representing a namespace, you will get additional details about the namespace, such as incoming and outgoing traffic, policies evaluating each flow, and DNS metrics.
    • When you select any edge, you will get details about the flows representing that edge.
    • If you expand a namespace by double-clicking on it, you will get the view of all components of the namespace.
  6. Flow Visualizations

    The Flow Visualizations view shows all point-to-point flows in the cluster. It allows you to see the cluster traffic from the network point of view.

    flow viz view

  7. Kibana dashboards

    The Kibana component comes with Calico commercial offerings and provides access to raw flow, audit, and DNS logs, as well as the ability to visualize the collected data in various dashboards.

    kibana flows

Some of the default dashboards you get access to are DNS Logs, Flow Logs, Audit Logs, Kubernetes API calls, L7 HTTP metrics, and others.

Prerequisites

  1. Download this repo into your environment:
 git clone https://github.com/JosephYostos/Hands-on-workshop-for-Kubernetes-observability.git  
  2. Configure log aggregation and flush intervals.
 kubectl patch felixconfiguration.p default -p '{"spec":{"flowLogsFlushInterval":"10s"}}'
 kubectl patch felixconfiguration.p default -p '{"spec":{"dnsLogsFlushInterval":"10s"}}'
 kubectl patch felixconfiguration.p default -p '{"spec":{"flowLogsFileAggregationKindForAllowed":1}}'
  3. Configure Felix for log data collection.

    Felix is the Calico component responsible for configuring routes, ACLs, and anything else required on the host to provide the desired connectivity for the endpoints on that host.

 kubectl patch felixconfiguration default --type='merge' -p '{"spec":{"policySyncPathPrefix":"/var/run/nodeagent","l7LogsFileEnabled":true}}'
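
    Taken together, the patches above amount to a FelixConfiguration resource along these lines (a sketch for orientation only; apply the patches rather than this manifest):

    apiVersion: projectcalico.org/v3
    kind: FelixConfiguration
    metadata:
      name: default
    spec:
      # flush flow and DNS logs every 10s so lab traffic shows up quickly
      flowLogsFlushInterval: 10s
      dnsLogsFlushInterval: 10s
      # minimal aggregation for allowed flows
      flowLogsFileAggregationKindForAllowed: 1
      # enable the policy sync API and L7 log collection
      policySyncPathPrefix: /var/run/nodeagent
      l7LogsFileEnabled: true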

Module 1: Configuring the Environment

Tasks:

  1. Security and Kubernetes platform policies should be evaluated before any other policies.
  2. The Kubernetes platform team requires a policy that explicitly allows workloads to connect to the Kubernetes DNS component.
  3. The Online Boutique microservices application should be deployed in the default namespace, and the dev application in the dev namespace.

Steps

  1. Deploy policy tiers.

    We are going to deploy policy tiers to take advantage of hierarchical policy management.

    kubectl apply -f demo/tiers/tiers.yaml

    This adds the security and platform tiers to the Calico cluster.
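
    A tier definition is a small manifest. A minimal sketch for the security tier (the order value is an assumption; the repo's manifest may use a different one):

    apiVersion: projectcalico.org/v3
    kind: Tier
    metadata:
      name: security
    spec:
      # lower order values are evaluated earlier
      order: 300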

  2. Deploy base policy.

    In order to explicitly allow workloads to connect to the Kubernetes DNS component, we are going to implement a policy that controls such traffic.

    kubectl apply -f demo/10-security-controls/allow-kube-dns.yaml
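
    A DNS-allow policy of this kind is typically a GlobalNetworkPolicy in the security tier that allows egress to kube-dns and passes everything else on. A hedged sketch (the repo's actual manifest may differ in names, order, and selectors):

    apiVersion: projectcalico.org/v3
    kind: GlobalNetworkPolicy
    metadata:
      # Calico Enterprise prefixes policy names with their tier
      name: security.allow-kube-dns
    spec:
      tier: security
      order: 200
      selector: all()
      types:
        - Egress
      egress:
        # allow DNS lookups against kube-dns
        - action: Allow
          protocol: UDP
          destination:
            selector: k8s-app == "kube-dns"
            ports: [53]
        # hand all other traffic to the next tier for evaluation
        - action: Pass
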
  3. Deploy applications.

    # deploy dev app stack
    kubectl apply -f demo/dev/app.manifests.yaml
    
    # deploy boutiqueshop app stack
    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/master/release/kubernetes-manifests.yaml
  4. Deploy global alerts.

    The alerts will be explored in a later lab.

    kubectl apply -f demo/50-alerts/globalnetworkset.changed.yaml
    kubectl apply -f demo/50-alerts/unsanctioned.dns.access.yaml
    kubectl apply -f demo/50-alerts/unsanctioned.lateral.access.yaml
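
    Each of these manifests defines a GlobalAlert resource. As a rough sketch of the shape (field values here are illustrative assumptions, not copied from the repo):

    apiVersion: projectcalico.org/v3
    kind: GlobalAlert
    metadata:
      name: unsanctioned.dns.access
    spec:
      description: "Pod attempted to access a restricted domain"
      severity: 100
      # alert on DNS log data
      dataSet: dns
      # illustrative query; the repo's alert defines its own domain list
      query: qname = "www.restricted.com"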

Module 2: Enable L7 Logs

L7 logs capture application interactions from HTTP header data in requests. The data shows what is actually sent in communications between specific pods, providing more specificity than flow logs. (Flow logs capture data only at the connection level for workload interactions.)

Calico collects L7 logs by sending the selected traffic through an Envoy proxy.

Tasks:

  1. Enable L7 logging on the frontend deployment in the default namespace and the dev-nginx deployment in the dev namespace.

Steps

  1. Create a Kubernetes pull secret for accessing Calico images. The pull secret should be provided by the Tigera team.

    kubectl create secret generic tigera-pull-secret -n <application pod namespace> --from-file=.dockerconfigjson=<path/to/pull/secret> --type kubernetes.io/dockerconfigjson
  2. Download the Envoy patch file, patch-envoy.yaml.

    curl https://docs.tigera.io/v3.7/manifests/l7/patch-envoy.yaml -O
  3. Download the Envoy config.

    curl https://docs.tigera.io/v3.7/manifests/l7/envoy-config.yaml -O
  4. Create the Envoy config.

    kubectl create configmap envoy-config -n <application pod namespace> --from-file=envoy-config.yaml
  5. Configure Felix for log data collection.

    kubectl patch felixconfiguration default --type='merge' -p '{"spec":{"policySyncPathPrefix":"/var/run/nodeagent"}}'
  6. Install the Envoy log collector by patching the application deployment.

    kubectl patch deployment <name of application deployment> -n <namespace> --patch "$(cat patch-envoy.yaml)"

After completing Module 3, some traffic will have been generated. Check the "L7 HTTP" dashboard in Kibana, as well as the HTTP logs under the Service Graph section.

L7 HTTP Dashboard

Service Graph HTTP

Module 3: Using security controls

Goal: Leverage network policies to segment connections within Kubernetes cluster and prevent known bad actors from accessing the workloads.

Tasks:

  1. The security team requires a rule that denies all traffic by default.
  2. The DevOps team provided their application traffic requirements: only connections between the components within each namespace should be allowed, as configured by the policies.

Steps

  1. Test connectivity between application components and across application stacks.

    a. Test connectivity between workloads within each namespace.

    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://nginx-svc 2>/dev/null | grep -i http'
    
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI frontend 2>/dev/null | grep -i http'
    
    kubectl exec -it $(kubectl get po -l app=frontend -ojsonpath='{.items[0].metadata.name}') -c server -- sh -c 'nc -zv productcatalogservice 3550'

    b. Test connectivity across namespaces.

    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://frontend.default 2>/dev/null | grep -i http'
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI http://nginx-svc.dev 2>/dev/null | grep -i http'

    c. Test connectivity from each namespace to the Internet.

    # test connectivity from dev namespace to the Internet
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://www.google.com 2>/dev/null | grep -i http'
    
    # test connectivity from default namespace to the Internet
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI www.google.com 2>/dev/null | grep -i http'

    All of these tests should succeed if there are no policies in place to govern the traffic for the dev and default namespaces.

  2. Apply staged default-deny policy.

    A staged default-deny policy is a good way of catching any traffic that is not explicitly allowed by a policy, without actually blocking it.

    kubectl apply -f demo/10-security-controls/staged.default-deny.yaml

    You should be able to view the potential effect of the staged default-deny policy if you navigate to the Dashboard view in the Enterprise Manager UI and look at the Packets by Policy histogram.

    # make a request across namespaces and view Packets by Policy histogram
    for i in {1..10}; do kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://frontend.default 2>/dev/null | grep -i http'; sleep 2; done

    The staged policy does not affect the traffic directly but allows you to view the policy impact if it were to be enforced.
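
    For orientation, a staged default-deny is an ordinary default-deny manifest whose kind carries the Staged prefix. A sketch, assuming it targets the dev and default namespaces (the repo's manifest may scope it differently):

    apiVersion: projectcalico.org/v3
    kind: StagedGlobalNetworkPolicy
    metadata:
      name: default.default-deny
    spec:
      tier: default
      # high order value so it is evaluated after all other policies
      order: 2000
      selector: projectcalico.org/namespace in {"dev", "default"}
      types:
        - Ingress
        - Egress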

  3. Apply network policies to control East-West traffic.

    # deploy dev policies
    kubectl apply -f demo/dev/policies.yaml
    
    # deploy boutiqueshop policies
    kubectl apply -f demo/boutiqueshop/policies.yaml

    Now that we have proper policies in place, we can enforce the default-deny policy, moving closer to a zero-trust security approach. You can either enforce the already deployed staged default-deny policy using the Policies Board view in the Enterprise Manager UI, or apply an enforcing default-deny policy manifest.

    # apply enforcing default-deny policy manifest
    kubectl apply -f demo/10-security-controls/default-deny.yaml
    # you can delete staged default-deny policy
    kubectl delete -f demo/10-security-controls/staged.default-deny.yaml
  4. Test connectivity with policies in place.

    a. Only the connections between the components within each namespace should be allowed, as configured by the policies.

    # test connectivity within dev namespace
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://nginx-svc 2>/dev/null | grep -i http'
    
    # test connectivity within default namespace
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI frontend 2>/dev/null | grep -i http'

    b. The connections across dev and default namespaces should be blocked by the global default-deny policy.

    # test connectivity from dev namespace to default namespace
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://frontend.default 2>/dev/null | grep -i http'
    
    # test connectivity from default namespace to dev namespace
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI http://nginx-svc.dev 2>/dev/null | grep -i http'

    c. The connections to the Internet should be blocked by the configured policies.

    # test connectivity from dev namespace to the Internet
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://www.google.com 2>/dev/null | grep -i http'
    
    # test connectivity from default namespace to the Internet
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI www.google.com 2>/dev/null | grep -i http'

Module 4: Alerts

Goal: Use global alerts to notify security and operations teams about unsanctioned or suspicious activity.

Tasks:

  1. Some alerts have been triggered. Examine these alerts and identify the suspicious activity that triggered them.

Steps

  1. View triggered alerts.

    We implemented alerts in one of the first labs in order to see how our activity can trigger them.

    Open Alerts view to see all triggered alerts in the cluster. Review the generated alerts.

    alerts view

    You can also review the alerts configuration and templates by navigating to alerts configuration in the top right corner.

  2. Review alerts manifests.

    Navigate to demo/50-alerts and review the YAML manifests that represent the alert definitions. Each file contains an alert template and an alert definition. Alert templates can be used to quickly create an alert definition in the UI.

Module 5: Using egress access controls & Flow Visualizations

Tasks:

  1. The dev team raised a request to allow traffic between the dev and default namespaces, as they are developing a new service that integrates with the boutiqueshop app. Use flow visualization to identify which pods are trying to talk to each other.
  2. They also requested that the centos pod be allowed to access api.twilio.com externally.

Steps

  1. Test connectivity within the cluster and to the external endpoint.

    a. Test connectivity between dev/centos pod and default/frontend pod.

    # test connectivity from dev namespace to default namespace
    for i in {1..10}; do kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://frontend.default 2>/dev/null | grep -i http'; sleep 2; done

    b. Test connectivity from dev/centos to the external endpoint.

    # test connectivity from dev namespace to the Internet
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://www.google.com 2>/dev/null | grep -i http'

    The access should be denied, as the policies configured in the previous module do not allow it.

  2. Navigate the Flow Visualizations chart to find the source and destination pods. You can also use policy recommendations to create the new policies.

flow-visualization-m5

  3. Implement an egress policy to allow egress access from a workload in one namespace, e.g. dev/centos, to a service in another namespace, e.g. default/frontend.

    a. Deploy egress policy.

    kubectl apply -f demo/20-egress-access-controls/centos-to-frontend.yaml

    b. Test connectivity between dev/centos pod and default/frontend service.

    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://frontend.default 2>/dev/null | grep -i http'

    The access should be allowed once the egress policy is in place.
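
    The shape of such a cross-namespace egress policy, as a hedged sketch (the policy name and tier are assumptions; check the repo manifest for the authoritative version):

    apiVersion: projectcalico.org/v3
    kind: NetworkPolicy
    metadata:
      name: platform.centos-to-frontend
      namespace: dev
    spec:
      tier: platform
      # applies to the centos pod in the dev namespace
      selector: app == "centos"
      types:
        - Egress
      egress:
        - action: Allow
          protocol: TCP
          destination:
            # frontend pods in the default namespace
            selector: app == "frontend"
            namespaceSelector: projectcalico.org/name == "default"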

  4. Implement a DNS policy to allow access to the external endpoint from a specific workload, e.g. dev/centos.

    a. Apply a policy that allows access to the api.twilio.com endpoint using a DNS rule.

    # deploy dns policy
    kubectl apply -f demo/20-egress-access-controls/dns-policy.yaml
    
    # test egress access to api.twilio.com
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -skI https://api.twilio.com 2>/dev/null | grep -i http'
    # test egress access to www.google.com
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -skI https://www.google.com 2>/dev/null | grep -i http'

    Access to the api.twilio.com endpoint should be allowed by the DNS policy, but access to any other external endpoint, like www.google.com, should be denied unless we modify the policy to include that domain name.
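
    A DNS rule of this kind allows egress by domain name rather than by IP. A minimal sketch (the policy name, tier, and port are assumptions):

    apiVersion: projectcalico.org/v3
    kind: NetworkPolicy
    metadata:
      name: security.external-domain-access
      namespace: dev
    spec:
      tier: security
      selector: app == "centos"
      types:
        - Egress
      egress:
        - action: Allow
          protocol: TCP
          destination:
            # allow HTTPS to this domain only
            domains:
              - api.twilio.com
            ports: [443]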

    b. Edit the policy to use a NetworkSet instead of an inline DNS rule.

    # deploy network set
    kubectl apply -f demo/20-egress-access-controls/netset.external-apis.yaml
    # deploy DNS policy using the network set
    kubectl apply -f demo/20-egress-access-controls/dns-policy.netset.yaml

    As a bonus example, you can modify the external-apis network set to include the *.google.com domain name, which would allow access to Google subdomains such as www.google.com, docs.google.com, etc.
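
    A network set moves the domain list out of the policy so it can be updated independently; the DNS policy then selects the set by label. A sketch of such a set, including the bonus wildcard (the name and labels are assumptions based on the manifest filenames above):

    apiVersion: projectcalico.org/v3
    kind: GlobalNetworkSet
    metadata:
      name: external-apis
      labels:
        type: external-apis
    spec:
      # domains that matching DNS policies may allow egress to
      allowedEgressDomains:
        - api.twilio.com
        - "*.google.com"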

Module 6: Dynamic packet capture

Goal: Configure packet capture for specific pods and review captured payload.

Steps

  1. Configure packet capture.

    Navigate to demo/60-packet-capture and review the YAML manifests that represent the packet capture definition. Each packet capture is configured by deploying a PacketCapture resource that targets endpoints using a selector and labels.

    Deploy packet capture definition to capture packets for dev/nginx pods.

    kubectl apply -f demo/60-packet-capture/nginx-pcap.yaml

    Once the PacketCapture resource is deployed, Calico starts capturing packets for all endpoints configured in the selector field.
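
    The resource itself is small. A sketch matching the capture name used below (the label selector is an assumption; see the repo manifest):

    apiVersion: projectcalico.org/v3
    kind: PacketCapture
    metadata:
      name: dev-capture-nginx
      namespace: dev
    spec:
      # capture traffic for all endpoints with this label in the dev namespace
      selector: app == "nginx"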

  2. Install calicoctl CLI

    The easiest way to retrieve captured *.pcap files is to use the calicoctl CLI.

    # download and configure calicoctl
    curl -o calicoctl -O -L https://docs.tigera.io/download/binaries/v3.7.0/calicoctl
    chmod +x calicoctl
    sudo mv calicoctl /usr/local/bin/
    calicoctl version
  3. Fetch and review captured payload.

    The captured *.pcap files are stored on the hosts where the pods are running while the PacketCapture resource is active.

    Retrieve captured *.pcap files and review the content.

    # get pcap files
    calicoctl captured-packets copy dev-capture-nginx --namespace dev
    
    ls dev-nginx*
    # view *.pcap content
    tcpdump -Xr dev-nginx-XXXXXX.pcap
  4. Stop packet capture

    Stop packet capture by removing the PacketCapture resource.

    kubectl delete -f demo/60-packet-capture/nginx-pcap.yaml

Congratulations! You have finished all the labs in the workshop.
