
Hands-on-workshop-for-Kubernetes-observability

Use cases

In this workshop we are going to focus on these main use cases:

  • Dynamic Service Graph for run-time visualization of flow and policies
  • Kibana dashboards for quick troubleshooting
  • Dynamic Packet Capture to analyze traffic on a pod or collection of pods live
  • Application (L7) observability

Module 0: Observability tools overview

Goal: Explore Calico observability tools.

Calico observability tools

If you are interested in enabling collection of application layer metrics for your workloads, refer to the Configure L7 logs documentation.

  1. Dashboard

    The Dashboard view in the Enterprise Manager UI presents a high-level overview of what's going on in your cluster. The view shows the following information:

    • Connections, Allowed Bytes and Packets
    • Denied Bytes and Packets
    • Total number of Policies, Endpoints and Nodes
    • Summary of CIS benchmarks
    • Count of triggered alerts
    • Packets by Policy histogram that shows allowed and denied traffic as it is being evaluated by network policies

    dashboard view

  2. Policies Board

    The Policies Board shows all policies deployed in the cluster, organized into policy tiers. You can control what a user can see and do in this view by configuring Kubernetes RBAC roles (a sample RBAC sketch follows below). You can also use controls to hide tiers you're not interested in at any given time.

    policies board

    Using the stats controls, you can toggle additional metrics to be listed for each policy shown.

    policies board stats
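
    Tier visibility in this view is enforced with standard Kubernetes RBAC against the Calico tiers resource. As a minimal sketch (not part of this repo's manifests; names and verbs are illustrative assumptions), a ClusterRole that lets its bearers view only the security tier might look roughly like this:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: tier-security-reader
    rules:
      # hypothetical role: grants visibility of the "security" tier only
      - apiGroups: ["projectcalico.org"]
        resources: ["tiers"]
        resourceNames: ["security"]
        verbs: ["get"]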

  3. Audit timeline

    The Timeline view shows an audit trail of created, deleted, or modified resources.

    timeline view

  4. Endpoints

    The Endpoints view lists all endpoints known to Calico. It includes all Kubernetes endpoints, such as Pods, as well as Host endpoints that can represent a Kubernetes host or an external VM or bare metal machine.

    endpoints view

  5. Service Graph

    The dynamic Service Graph presents network flows from a service-level perspective. The top-level view shows how traffic flows between namespaces, as well as between external and internal endpoints.

    service graph node view

    • When you select any node representing a namespace, you will get additional details about the namespace, such as incoming and outgoing traffic, policies evaluating each flow, and DNS metrics.
    • When you select any edge, you will get details about the flows representing that edge.
    • If you expand a namespace by double-clicking on it, you will get the view of all components of the namespace.
  6. Flow Visualizations

    The Flow Visualizations view shows all point-to-point flows in the cluster. It allows you to see the cluster traffic from the network point of view.

    flow viz view

  7. Kibana dashboards

    The Kibana component comes with Calico commercial offerings and provides access to raw flow, audit, and DNS logs, as well as the ability to visualize the collected data in various dashboards.

    kibana flows

Some of the default dashboards you get access to are DNS Logs, Flow Logs, Audit Logs, Kubernetes API calls, L7 HTTP metrics, and others.

Prerequisites

  1. Download this repo into your environment:
 git clone https://github.com/JosephYostos/Hands-on-workshop-for-Kubernetes-observability.git  
  2. Configure log aggregation and flush intervals.
 kubectl patch felixconfiguration.p default -p '{"spec":{"flowLogsFlushInterval":"10s"}}'
 kubectl patch felixconfiguration.p default -p '{"spec":{"dnsLogsFlushInterval":"10s"}}'
 kubectl patch felixconfiguration.p default -p '{"spec":{"flowLogsFileAggregationKindForAllowed":1}}'
  3. Configure Felix for log data collection.

    Felix is the Calico component responsible for configuring routes, ACLs, and anything else required on the host to provide the desired connectivity for the endpoints on that host.

 kubectl patch felixconfiguration default --type='merge' -p '{"spec":{"policySyncPathPrefix":"/var/run/nodeagent","l7LogsFileEnabled":true}}'
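
    Taken together, the patches above amount to a FelixConfiguration resource along these lines (a sketch for orientation only; apply the patches rather than this manifest):

    apiVersion: projectcalico.org/v3
    kind: FelixConfiguration
    metadata:
      name: default
    spec:
      # flush flow and DNS logs every 10s so lab traffic shows up quickly
      flowLogsFlushInterval: 10s
      dnsLogsFlushInterval: 10s
      # minimal aggregation for allowed flows
      flowLogsFileAggregationKindForAllowed: 1
      # enable the policy sync API and L7 log collection
      policySyncPathPrefix: /var/run/nodeagent
      l7LogsFileEnabled: true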

Module 1: Configuring the Environment

Tasks:

  1. Security and Kubernetes platform policies should be evaluated before any other policies.
  2. The Kubernetes platform team requires a policy that explicitly allows workloads to connect to the Kubernetes DNS component.
  3. The Online Boutique microservices application should be deployed in the default namespace, and the dev application in the dev namespace.

Steps

  1. Deploy policy tiers.

    We are going to deploy policy tiers to take advantage of hierarchical policy management.

    kubectl apply -f demo/tiers/tiers.yaml

    This adds the security and platform tiers to the Calico cluster.
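
    A tier definition is a small manifest. A minimal sketch for the security tier (the order value is an assumption; the repo's manifest may use a different one):

    apiVersion: projectcalico.org/v3
    kind: Tier
    metadata:
      name: security
    spec:
      # lower order values are evaluated earlier
      order: 300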

  2. Deploy base policy.

    In order to explicitly allow workloads to connect to the Kubernetes DNS component, we are going to implement a policy that controls such traffic.

    kubectl apply -f demo/10-security-controls/allow-kube-dns.yaml
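
    A DNS-allow policy of this kind is typically a GlobalNetworkPolicy in the security tier that allows egress to kube-dns and passes everything else on. A hedged sketch (the repo's actual manifest may differ in names, order, and selectors):

    apiVersion: projectcalico.org/v3
    kind: GlobalNetworkPolicy
    metadata:
      # Calico Enterprise prefixes policy names with their tier
      name: security.allow-kube-dns
    spec:
      tier: security
      order: 200
      selector: all()
      types:
        - Egress
      egress:
        # allow DNS lookups against kube-dns
        - action: Allow
          protocol: UDP
          destination:
            selector: k8s-app == "kube-dns"
            ports: [53]
        # hand all other traffic to the next tier for evaluation
        - action: Pass
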
  3. Deploy applications.

    # deploy dev app stack
    kubectl apply -f demo/dev/app.manifests.yaml
    
    # deploy boutiqueshop app stack
    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/master/release/kubernetes-manifests.yaml
  4. Deploy global alerts.

    The alerts will be explored in a later lab.

    kubectl apply -f demo/50-alerts/globalnetworkset.changed.yaml
    kubectl apply -f demo/50-alerts/unsanctioned.dns.access.yaml
    kubectl apply -f demo/50-alerts/unsanctioned.lateral.access.yaml
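
    Each of these manifests defines a GlobalAlert resource. As a rough sketch of the shape (field values here are illustrative assumptions, not copied from the repo):

    apiVersion: projectcalico.org/v3
    kind: GlobalAlert
    metadata:
      name: unsanctioned.dns.access
    spec:
      description: "Pod attempted to access a restricted domain"
      severity: 100
      # alert on DNS log data
      dataSet: dns
      # illustrative query; the repo's alert defines its own domain list
      query: qname = "www.restricted.com"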

Module 2: Enable L7 Logs

L7 logs capture application interactions from HTTP header data in requests. The data shows what is actually sent in communications between specific pods, providing more specificity than flow logs. (Flow logs capture data only at the connection level for workload interactions.)

Calico collects L7 logs by sending the selected traffic through an Envoy proxy.

Tasks:

  1. Enable L7 logging on the frontend deployment in the default namespace and the dev-nginx deployment in the dev namespace.

Steps

  1. Create a Kubernetes pull secret for accessing Calico images. The pull secret should be provided by the Tigera team.

    kubectl create secret generic tigera-pull-secret -n <application pod namespace> --from-file=.dockerconfigjson=<path/to/pull/secret> --type kubernetes.io/dockerconfigjson
  2. Download the Envoy patch file, patch-envoy.yaml.

    curl https://docs.tigera.io/v3.7/manifests/l7/patch-envoy.yaml -O
  3. Download the Envoy config.

    curl https://docs.tigera.io/v3.7/manifests/l7/envoy-config.yaml -O
  4. Create the Envoy config.

    kubectl create configmap envoy-config -n <application pod namespace> --from-file=envoy-config.yaml
  5. Configure Felix for log data collection.

    kubectl patch felixconfiguration default --type='merge' -p '{"spec":{"policySyncPathPrefix":"/var/run/nodeagent"}}'
  6. Install the Envoy log collector by patching the application deployment.

    kubectl patch deployment <name of application deployment> -n <namespace> --patch "$(cat patch-envoy.yaml)"

After completing Module 3, some traffic will have been generated. Check the "L7 HTTP" dashboard in Kibana, as well as the HTTP logs under the Service Graph section.

L7 HTTP Dashboard

Service Graph HTTP

Module 3: Using security controls

Goal: Leverage network policies to segment connections within Kubernetes cluster and prevent known bad actors from accessing the workloads.

Tasks:

  1. The security team requires a rule that denies all traffic by default.
  2. The DevOps team provided their application traffic requirements: only connections between the components within each namespace should be allowed, as configured by the policies.

Steps

  1. Test connectivity between application components and across application stacks.

    a. Test connectivity between workloads within each namespace.

    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://nginx-svc 2>/dev/null | grep -i http'
    
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI frontend 2>/dev/null | grep -i http'
    
    kubectl exec -it $(kubectl get po -l app=frontend -ojsonpath='{.items[0].metadata.name}') -c server -- sh -c 'nc -zv productcatalogservice 3550'

    b. Test connectivity across namespaces.

    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://frontend.default 2>/dev/null | grep -i http'
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI http://nginx-svc.dev 2>/dev/null | grep -i http'

    c. Test connectivity from each namespace to the Internet.

    # test connectivity from dev namespace to the Internet
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://www.google.com 2>/dev/null | grep -i http'
    
    # test connectivity from default namespace to the Internet
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI www.google.com 2>/dev/null | grep -i http'

    All of these tests should succeed if there are no policies in place to govern the traffic for the dev and default namespaces.

  2. Apply staged default-deny policy.

    A staged default-deny policy is a good way of catching any traffic that is not explicitly allowed by a policy, without actually blocking it.

    kubectl apply -f demo/10-security-controls/staged.default-deny.yaml

    You should be able to view the potential effect of the staged default-deny policy if you navigate to the Dashboard view in the Enterprise Manager UI and look at the Packets by Policy histogram.

    # make a request across namespaces and view Packets by Policy histogram
    for i in {1..10}; do kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://frontend.default 2>/dev/null | grep -i http'; sleep 2; done

    The staged policy does not affect the traffic directly but allows you to view the policy impact if it were to be enforced.
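
    For orientation, a staged default-deny is an ordinary default-deny manifest whose kind carries the Staged prefix. A sketch, assuming it targets the dev and default namespaces (the repo's manifest may scope it differently):

    apiVersion: projectcalico.org/v3
    kind: StagedGlobalNetworkPolicy
    metadata:
      name: default.default-deny
    spec:
      tier: default
      # high order value so it is evaluated after all other policies
      order: 2000
      selector: projectcalico.org/namespace in {"dev", "default"}
      types:
        - Ingress
        - Egress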

  3. Apply network policies to control East-West traffic.

    # deploy dev policies
    kubectl apply -f demo/dev/policies.yaml
    
    # deploy boutiqueshop policies
    kubectl apply -f demo/boutiqueshop/policies.yaml

    Now that we have proper policies in place, we can enforce the default-deny policy, moving closer to a zero-trust security approach. You can either enforce the already deployed staged default-deny policy using the Policies Board view in the Enterprise Manager UI, or apply an enforcing default-deny policy manifest.

    # apply enforcing default-deny policy manifest
    kubectl apply -f demo/10-security-controls/default-deny.yaml
    # you can delete staged default-deny policy
    kubectl delete -f demo/10-security-controls/staged.default-deny.yaml
  4. Test connectivity with policies in place.

    a. Only the connections between the components within each namespace should be allowed, as configured by the policies.

    # test connectivity within dev namespace
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://nginx-svc 2>/dev/null | grep -i http'
    
    # test connectivity within default namespace
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI frontend 2>/dev/null | grep -i http'

    b. The connections across dev and default namespaces should be blocked by the global default-deny policy.

    # test connectivity from dev namespace to default namespace
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://frontend.default 2>/dev/null | grep -i http'
    
    # test connectivity from default namespace to dev namespace
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI http://nginx-svc.dev 2>/dev/null | grep -i http'

    c. The connections to the Internet should be blocked by the configured policies.

    # test connectivity from dev namespace to the Internet
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://www.google.com 2>/dev/null | grep -i http'
    
    # test connectivity from default namespace to the Internet
    kubectl exec -it $(kubectl get po -l app=loadgenerator -ojsonpath='{.items[0].metadata.name}') -- sh -c 'curl -m3 -sI www.google.com 2>/dev/null | grep -i http'

Module 4: Alerts

Goal: Use global alerts to notify security and operations teams about unsanctioned or suspicious activity.

Tasks:

  1. Some alerts have been triggered. Examine these alerts and identify the suspicious activity that triggered them.

Steps

  1. View triggered alerts.

    We implemented alerts in one of the first labs in order to see how our activity can trigger them.

    Open Alerts view to see all triggered alerts in the cluster. Review the generated alerts.

    alerts view

    You can also review the alerts configuration and templates by navigating to alerts configuration in the top right corner.

  2. Review alerts manifests.

    Navigate to demo/50-alerts and review the YAML manifests that represent the alert definitions. Each file contains an alert template and an alert definition. Alert templates can be used to quickly create an alert definition in the UI.

Module 5: Using egress access controls & Flow Visualizations

Tasks:

  1. The dev team raised a request to allow traffic between the dev and default namespaces, as they are developing a new service that integrates with the boutiqueshop app. Use flow visualization to identify which pods are trying to talk to each other.
  2. They also requested that the centos pod be allowed to access api.twilio.com externally.

Steps

  1. Test connectivity within the cluster and to the external endpoint.

    a. Test connectivity between dev/centos pod and default/frontend pod.

    # test connectivity from dev namespace to default namespace
    for i in {1..10}; do kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://frontend.default 2>/dev/null | grep -i http'; sleep 2; done

    b. Test connectivity from dev/centos to the external endpoint.

    # test connectivity from dev namespace to the Internet
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://www.google.com 2>/dev/null | grep -i http'

    The access should be denied, as the policies configured in the previous module do not allow it.

  2. Navigate the Flow Visualizations chart to find the source and destination pods. You can also use policy recommendations to create the new policies.

flow-visualization-m5

  3. Implement an egress policy to allow egress access from a workload in one namespace, e.g. dev/centos, to a service in another namespace, e.g. default/frontend.

    a. Deploy egress policy.

    kubectl apply -f demo/20-egress-access-controls/centos-to-frontend.yaml

    b. Test connectivity between dev/centos pod and default/frontend service.

    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -sI http://frontend.default 2>/dev/null | grep -i http'

    The access should be allowed once the egress policy is in place.
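
    The shape of such a cross-namespace egress policy, as a hedged sketch (the policy name and tier are assumptions; check the repo manifest for the authoritative version):

    apiVersion: projectcalico.org/v3
    kind: NetworkPolicy
    metadata:
      name: platform.centos-to-frontend
      namespace: dev
    spec:
      tier: platform
      # applies to the centos pod in the dev namespace
      selector: app == "centos"
      types:
        - Egress
      egress:
        - action: Allow
          protocol: TCP
          destination:
            # frontend pods in the default namespace
            selector: app == "frontend"
            namespaceSelector: projectcalico.org/name == "default"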

  4. Implement a DNS policy to allow access to the external endpoint from a specific workload, e.g. dev/centos.

    a. Apply a policy that allows access to the api.twilio.com endpoint using a DNS rule.

    # deploy dns policy
    kubectl apply -f demo/20-egress-access-controls/dns-policy.yaml
    
    # test egress access to api.twilio.com
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -skI https://api.twilio.com 2>/dev/null | grep -i http'
    # test egress access to www.google.com
    kubectl -n dev exec -t centos -- sh -c 'curl -m3 -skI https://www.google.com 2>/dev/null | grep -i http'

    Access to the api.twilio.com endpoint should be allowed by the DNS policy, but access to any other external endpoint, like www.google.com, should be denied unless we modify the policy to include that domain name.
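
    A DNS rule of this kind allows egress by domain name rather than by IP. A minimal sketch (the policy name, tier, and port are assumptions):

    apiVersion: projectcalico.org/v3
    kind: NetworkPolicy
    metadata:
      name: security.external-domain-access
      namespace: dev
    spec:
      tier: security
      selector: app == "centos"
      types:
        - Egress
      egress:
        - action: Allow
          protocol: TCP
          destination:
            # allow HTTPS to this domain only
            domains:
              - api.twilio.com
            ports: [443]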

    b. Edit the policy to use a NetworkSet instead of an inline DNS rule.

    # deploy network set
    kubectl apply -f demo/20-egress-access-controls/netset.external-apis.yaml
    # deploy DNS policy using the network set
    kubectl apply -f demo/20-egress-access-controls/dns-policy.netset.yaml

    As a bonus example, you can modify the external-apis network set to include the *.google.com domain name, which would allow access to Google subdomains such as www.google.com, docs.google.com, etc.
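
    A network set moves the domain list out of the policy so it can be updated independently; the DNS policy then selects the set by label. A sketch of such a set, including the bonus wildcard (the name and labels are assumptions based on the manifest filenames above):

    apiVersion: projectcalico.org/v3
    kind: GlobalNetworkSet
    metadata:
      name: external-apis
      labels:
        type: external-apis
    spec:
      # domains that matching DNS policies may allow egress to
      allowedEgressDomains:
        - api.twilio.com
        - "*.google.com"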

Module 6: Dynamic packet capture

Goal: Configure packet capture for specific pods and review captured payload.

Steps

  1. Configure packet capture.

    Navigate to demo/60-packet-capture and review the YAML manifests that represent the packet capture definition. Each packet capture is configured by deploying a PacketCapture resource that targets endpoints using a selector and labels.

    Deploy packet capture definition to capture packets for dev/nginx pods.

    kubectl apply -f demo/60-packet-capture/nginx-pcap.yaml

    Once the PacketCapture resource is deployed, Calico starts capturing packets for all endpoints configured in the selector field.
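
    The resource itself is small. A sketch matching the capture name used below (the label selector is an assumption; see the repo manifest):

    apiVersion: projectcalico.org/v3
    kind: PacketCapture
    metadata:
      name: dev-capture-nginx
      namespace: dev
    spec:
      # capture traffic for all endpoints with this label in the dev namespace
      selector: app == "nginx"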

  2. Install calicoctl CLI

    The easiest way to retrieve captured *.pcap files is to use the calicoctl CLI.

    # download and configure calicoctl
    curl -o calicoctl -O -L https://docs.tigera.io/download/binaries/v3.7.0/calicoctl
    chmod +x calicoctl
    sudo mv calicoctl /usr/local/bin/
    calicoctl version
  3. Fetch and review captured payload.

    The captured *.pcap files are stored on the hosts where the pods are running while the PacketCapture resource is active.

    Retrieve captured *.pcap files and review the content.

    # get pcap files
    calicoctl captured-packets copy dev-capture-nginx --namespace dev
    
    ls dev-nginx*
    # view *.pcap content
    tcpdump -Xr dev-nginx-XXXXXX.pcap
  4. Stop packet capture

    Stop packet capture by removing the PacketCapture resource.

    kubectl delete -f demo/60-packet-capture/nginx-pcap.yaml

Congratulations! You have finished all the labs in the workshop.
