Add K8s metrics fetching options, allowing configuration and exposing these metrics to the metrics collection stage
danielmapar opened this issue · comments
Is your feature request related to a problem? Please describe.
I am currently trying to access pods' CPU and memory consumption in order to write my own autoscaler. However, I can't seem to find that data inside `spec = json.loads(sys.stdin.read())`.
This is the JSON dump that I get from my `metric.py` script:
{"resource": {"metadata": {"name": "frontend-6b64dc9665-gx64z", "generateName": "frontend-6b64dc9665-", "namespace": "default", "uid": "8b63127d-ed2a-46c7-a2ac-b188b986d480", "resourceVersion": "82988", "creationTimestamp": "2021-02-14T04:28:38Z", "labels": {"app": "frontend", "pod-template-hash": "6b64dc9665"}, "annotations": {"sidecar.istio.io/rewriteAppHTTPProbers": "true"}, "ownerReferences": [{"apiVersion": "apps/v1", "kind": "ReplicaSet", "name": "frontend-6b64dc9665", "uid": "737d8ebd-1fd4-460b-b6c3-0a07d1981869", "controller": true, "blockOwnerDeletion": true}], "managedFields": [{"manager": "k3s", "operation": "Update", "apiVersion": "v1", "time": "2021-02-14T04:28:52Z", "fieldsType": "FieldsV1", "fieldsV1": {"f:metadata": {"f:annotations": {".": {}, "f:sidecar.istio.io/rewriteAppHTTPProbers": {}}, "f:generateName": {}, "f:labels": {".": {}, "f:app": {}, "f:pod-template-hash": {}}, "f:ownerReferences": {".": {}, "k:{\"uid\":\"737d8ebd-1fd4-460b-b6c3-0a07d1981869\"}": {".": {}, "f:apiVersion": {}, "f:blockOwnerDeletion": {}, "f:controller": {}, "f:kind": {}, "f:name": {}, "f:uid": {}}}}, "f:spec": {"f:containers": {"k:{\"name\":\"server\"}": {".": {}, "f:env": {".": {}, "k:{\"name\":\"AD_SERVICE_ADDR\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"CART_SERVICE_ADDR\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"CHECKOUT_SERVICE_ADDR\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"CURRENCY_SERVICE_ADDR\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"ENV_PLATFORM\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"PORT\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"PRODUCT_CATALOG_SERVICE_ADDR\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"RECOMMENDATION_SERVICE_ADDR\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"SHIPPING_SERVICE_ADDR\"}": {".": {}, "f:name": {}, "f:value": {}}}, "f:image": {}, "f:imagePullPolicy": {}, "f:livenessProbe": {".": {}, 
"f:failureThreshold": {}, "f:httpGet": {".": {}, "f:httpHeaders": {}, "f:path": {}, "f:port": {}, "f:scheme": {}}, "f:initialDelaySeconds": {}, "f:periodSeconds": {}, "f:successThreshold": {}, "f:timeoutSeconds": {}}, "f:name": {}, "f:ports": {".": {}, "k:{\"containerPort\":8080,\"protocol\":\"TCP\"}": {".": {}, "f:containerPort": {}, "f:protocol": {}}}, "f:readinessProbe": {".": {}, "f:failureThreshold": {}, "f:httpGet": {".": {}, "f:httpHeaders": {}, "f:path": {}, "f:port": {}, "f:scheme": {}}, "f:initialDelaySeconds": {}, "f:periodSeconds": {}, "f:successThreshold": {}, "f:timeoutSeconds": {}}, "f:resources": {".": {}, "f:limits": {".": {}, "f:cpu": {}, "f:memory": {}}, "f:requests": {".": {}, "f:cpu": {}, "f:memory": {}}}, "f:terminationMessagePath": {}, "f:terminationMessagePolicy": {}}}, "f:dnsPolicy": {}, "f:enableServiceLinks": {}, "f:restartPolicy": {}, "f:schedulerName": {}, "f:securityContext": {}, "f:serviceAccount": {}, "f:serviceAccountName": {}, "f:terminationGracePeriodSeconds": {}}, "f:status": {"f:conditions": {"k:{\"type\":\"ContainersReady\"}": {".": {}, "f:lastProbeTime": {}, "f:lastTransitionTime": {}, "f:status": {}, "f:type": {}}, "k:{\"type\":\"Initialized\"}": {".": {}, "f:lastProbeTime": {}, "f:lastTransitionTime": {}, "f:status": {}, "f:type": {}}, "k:{\"type\":\"Ready\"}": {".": {}, "f:lastProbeTime": {}, "f:lastTransitionTime": {}, "f:status": {}, "f:type": {}}}, "f:containerStatuses": {}, "f:hostIP": {}, "f:phase": {}, "f:podIP": {}, "f:podIPs": {".": {}, "k:{\"ip\":\"10.42.0.42\"}": {".": {}, "f:ip": {}}}, "f:startTime": {}}}}]}, "spec": {"volumes": [{"name": "default-token-mgjm5", "secret": {"secretName": "default-token-mgjm5", "defaultMode": 420}}], "containers": [{"name": "server", "image": "gcr.io/google-samples/microservices-demo/frontend:v0.2.1", "ports": [{"containerPort": 8080, "protocol": "TCP"}], "env": [{"name": "PORT", "value": "8080"}, {"name": "PRODUCT_CATALOG_SERVICE_ADDR", "value": "productcatalogservice:3550"}, 
{"name": "CURRENCY_SERVICE_ADDR", "value": "currencyservice:7000"}, {"name": "CART_SERVICE_ADDR", "value": "cartservice:7070"}, {"name": "RECOMMENDATION_SERVICE_ADDR", "value": "recommendationservice:8080"}, {"name": "SHIPPING_SERVICE_ADDR", "value": "shippingservice:50051"}, {"name": "CHECKOUT_SERVICE_ADDR", "value": "checkoutservice:5050"}, {"name": "AD_SERVICE_ADDR", "value": "adservice:9555"}, {"name": "ENV_PLATFORM", "value": "gcp"}], "resources": {"limits": {"cpu": "200m", "memory": "128Mi"}, "requests": {"cpu": "100m", "memory": "64Mi"}}, "volumeMounts": [{"name": "default-token-mgjm5", "readOnly": true, "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"}], "livenessProbe": {"httpGet": {"path": "/_healthz", "port": 8080, "scheme": "HTTP", "httpHeaders": [{"name": "Cookie", "value": "shop_session-id=x-liveness-probe"}]}, "initialDelaySeconds": 10, "timeoutSeconds": 1, "periodSeconds": 10, "successThreshold": 1, "failureThreshold": 3}, "readinessProbe": {"httpGet": {"path": "/_healthz", "port": 8080, "scheme": "HTTP", "httpHeaders": [{"name": "Cookie", "value": "shop_session-id=x-readiness-probe"}]}, "initialDelaySeconds": 10, "timeoutSeconds": 1, "periodSeconds": 10, "successThreshold": 1, "failureThreshold": 3}, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "imagePullPolicy": "IfNotPresent"}], "restartPolicy": "Always", "terminationGracePeriodSeconds": 30, "dnsPolicy": "ClusterFirst", "serviceAccountName": "default", "serviceAccount": "default", "nodeName": "eecs6446proj1-1", "securityContext": {}, "schedulerName": "default-scheduler", "tolerations": [{"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 300}, {"key": "node.kubernetes.io/unreachable", "operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 300}], "priority": 0, "enableServiceLinks": true, "preemptionPolicy": "PreemptLowerPriority"}, "status": {"phase": "Running", "conditions": 
[{"type": "Initialized", "status": "True", "lastProbeTime": null, "lastTransitionTime": "2021-02-14T04:28:38Z"}, {"type": "Ready", "status": "True", "lastProbeTime": null, "lastTransitionTime": "2021-02-14T04:28:52Z"}, {"type": "ContainersReady", "status": "True", "lastProbeTime": null, "lastTransitionTime": "2021-02-14T04:28:52Z"}, {"type": "PodScheduled", "status": "True", "lastProbeTime": null, "lastTransitionTime": "2021-02-14T04:28:38Z"}], "hostIP": "192.168.23.92", "podIP": "10.42.0.42", "podIPs": [{"ip": "10.42.0.42"}], "startTime": "2021-02-14T04:28:38Z", "containerStatuses": [{"name": "server", "state": {"running": {"startedAt": "2021-02-14T04:28:39Z"}}, "lastState": {}, "ready": true, "restartCount": 0, "image": "gcr.io/google-samples/microservices-demo/frontend:v0.2.1", "imageID": "gcr.io/google-samples/microservices-demo/frontend@sha256:ee13a82435b5031a657d6dbc76327b95186e103d28dce5a47c5d835d040e441b", "containerID": "containerd://ed293f79974f4a24b70edb55d9610c4f5430a0b2403f20d8547bf0df527da177", "started": true}], "qosClass": "Burstable"}}, "runType": "scaler"}
Describe the solution you'd like
I would like to be able to access CPU and memory consumption by default inside the `spec` JSON object.
Describe alternatives you've considered
Looking at examples, I saw some autoscalers calling other pods' `/metrics` endpoint. I suppose this is a solution, but I wish this information could be gathered by the framework via the Kubernetes REST API.
Extra context
metrics.py

```python
import json
import sys


def main():
    # Parse spec into a dict
    spec = json.loads(sys.stdin.read())
    metric(spec)


def metric(spec):
    sys.stderr.write(json.dumps(spec))
    sys.stderr.write("No 'numPods' label on resource being managed")
    exit(1)


if __name__ == "__main__":
    main()
```
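The script above only dumps the spec and exits; for reference, a version that actually pulls values out of the payload shown earlier might look like this. Note this is a sketch, and it can only read the *configured* resource requests/limits, which are present in the dump, not live usage, which is the point of this feature request (`read_limits`, `parse_cpu`, and `parse_memory` are hypothetical helper names):

```python
import json


def parse_cpu(value):
    # Kubernetes CPU quantities: "200m" means 0.2 cores
    if value.endswith("m"):
        return int(value[:-1]) / 1000
    return float(value)


def parse_memory(value):
    # Handles the binary suffixes seen in the dump above (Ki, Mi, Gi)
    units = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)


def read_limits(spec):
    # Extract the first container's resource limits from the CPA spec payload
    container = spec["resource"]["spec"]["containers"][0]
    limits = container["resources"]["limits"]
    return {
        "cpu_limit_cores": parse_cpu(limits["cpu"]),
        "memory_limit_bytes": parse_memory(limits["memory"]),
    }


# Minimal stand-in for the payload shown above
demo = {"resource": {"spec": {"containers": [
    {"resources": {"limits": {"cpu": "200m", "memory": "128Mi"}}}]}}}
print(json.dumps(read_limits(demo)))
```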
cpa.yaml

```yaml
apiVersion: custompodautoscaler.com/v1
kind: CustomPodAutoscaler
metadata:
  name: python-custom-autoscaler-frontend
spec:
  template:
    spec:
      containers:
        - name: python-custom-autoscaler-frontend
          image: danielmapar/python-custom-autoscaler:v1.3
          imagePullPolicy: IfNotPresent
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  config:
    - name: interval
      value: "10000"
```
config.yaml

```yaml
evaluate:
  type: "shell"
  timeout: 2500
  shell:
    entrypoint: "python"
    command:
      - "/evaluate.py"
metric:
  type: "shell"
  timeout: 2500
  shell:
    entrypoint: "python"
    command:
      - "/metric.py"
runMode: "per-pod"
```
Hey, thanks for raising this issue.
At the minute the Custom Pod Autoscaler does not gather any data from the K8s metrics server, but I definitely think this would be a good feature, since I imagine many autoscalers would use this info. Perhaps, as you've suggested, this data could be included in the JSON payload sent to the metric gathering stage of the pipeline.
The Horizontal Pod Autoscaler as a Custom Pod Autoscaler queries the metrics server, so some of the code could be taken from there and adapted for inclusion in the Custom Pod Autoscaler pipeline.
A simplified view of this pipeline would be:
- Gather K8s resource information (Pod/Deployment/StatefulSet etc.).
- Query the metrics server if the metrics server is available using the resource information retrieved.
- If the metrics server is not available, skip this step and continue.
- If the resource information is not available yet in the metrics server, skip this step and continue (for example if there are no metrics for the resource uploaded yet).
- Combine the K8s resource information with any gathered metrics information, serialising into JSON.
- Call the metrics stage, providing the JSON generated in the previous step.
- Get the results of the metrics stage, continue as normal.
- ...
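The steps above could be sketched roughly as follows (hypothetical function names for illustration; the real Custom Pod Autoscaler is written in Go, so this is only a shape, not the implementation):

```python
import json


def gather_and_run_metric_stage(resource, metrics_client, run_metric_stage):
    """Sketch of the proposed pipeline step.

    resource: the K8s resource info already gathered (Pod/Deployment/etc.)
    metrics_client: callable querying the metrics server by namespace/name
    run_metric_stage: callable invoking the user's metric stage with JSON
    """
    payload = {"resource": resource}
    try:
        # Query the metrics server for the managed resource, if reachable
        pod_metrics = metrics_client(resource["metadata"]["namespace"],
                                     resource["metadata"]["name"])
    except Exception:
        # Metrics server unavailable, or no metrics uploaded for the
        # resource yet: skip this step and continue without metrics
        pod_metrics = None
    if pod_metrics is not None:
        payload["metrics"] = pod_metrics
    # Combine resource info with any gathered metrics, serialise to JSON,
    # and call the metric stage with the result
    return run_metric_stage(json.dumps(payload))
```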
This could probably be built on further to introduce some flexibility, with extra configuration options:
- `includeResourceMetrics` = If `true`, will include CPU/memory/metrics server metrics if available; if `false`, will skip and not include them (default `false`).
- `requireResourceMetrics` = If `true`, will require metrics server information; if the server is not available, or the metrics for the resource are not available, it will fail with an appropriate error. If `false`, this will not occur (default `false`). Only applies if `includeResourceMetrics` is `true`.
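With these proposed options, the autoscaler's config.yaml above might grow by two lines, something like this (hypothetical; option names taken from the proposal above, not from a released version):

```yaml
includeResourceMetrics: true
requireResourceMetrics: false
```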
Does this feature sound like it would address your needs? If so I will set it as next on my to-do list for this project.
Yes, that would be perfect. Otherwise, another solution would be a step-by-step guide on how to integrate the metric step (metrics.py) with Prometheus or any popular metrics gatherer.
Based on your answer, I will just call the K8s API inside the metric step and get that information.
Thanks for the quick response, appreciate it.
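For reference, calling the metrics API from inside the metric script could look roughly like this. A sketch, not a tested implementation: it assumes a metrics server is installed and that the autoscaler's service account has been granted `get` on pods in the `metrics.k8s.io` API group; `metrics_url` and `fetch_pod_metrics` are hypothetical helper names:

```python
import json
import ssl
import urllib.request

# Standard in-cluster service account credential paths
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"


def metrics_url(namespace, pod, api_server="https://kubernetes.default.svc"):
    # Resource metrics endpoint served via the metrics.k8s.io API group
    return (f"{api_server}/apis/metrics.k8s.io/v1beta1"
            f"/namespaces/{namespace}/pods/{pod}")


def fetch_pod_metrics(namespace, pod):
    # Authenticate with the pod's service account token
    with open(TOKEN_PATH) as f:
        token = f.read().strip()
    ctx = ssl.create_default_context(cafile=CA_PATH)
    req = urllib.request.Request(
        metrics_url(namespace, pod),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.loads(resp.read())
```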
This is now available in the v1.1.0 release of the Custom Pod Autoscaler. Make sure you use the new v1.1.0 release of the Custom Pod Autoscaler Operator too, since it includes the `roleRequiresMetricsServer` flag for Custom Pod Autoscalers, which should make things a bit easier, as it will handle setting up access to the K8s metrics server for your autoscaler.