cyrildiagne / kuda

Serverless APIs on remote GPUs

AWS: GPU node autoscaling doesn't work

cyrildiagne opened this issue

GPU nodes don't autoscale with the AWS provider.

Because Knative doesn't support nodeSelectors and tolerations (knative/serving#1816), we can't rely on node taints to restrict which workloads are assigned to the GPU nodeGroup.

An immediate workaround could be to use namespace-level annotations: knative/serving#1816 (comment)
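
A minimal sketch of what that workaround might look like, assuming it refers to the Kubernetes `PodNodeSelector` and `PodTolerationRestriction` admission plugins, which read the `scheduler.alpha.kubernetes.io/node-selector` and `scheduler.alpha.kubernetes.io/defaultTolerations` annotations on a namespace. The namespace name, node label and taint used here are hypothetical and not taken from kuda. The script only builds the Namespace manifest and prints it as JSON.

```python
import json

# Hypothetical values for illustration only; none of these come from kuda.
GPU_NAMESPACE = "kuda-gpu"
GPU_NODE_LABEL = ("nodegroup", "gpu")
GPU_TOLERATION = {
    "key": "nvidia.com/gpu",
    "operator": "Exists",
    "effect": "NoSchedule",
}


def gpu_namespace_manifest(name, node_label, toleration):
    """Build a Namespace manifest whose annotations pin pods to GPU nodes.

    Assumes the PodNodeSelector and PodTolerationRestriction admission
    plugins are enabled on the API server; without them the annotations
    are ignored.
    """
    label_key, label_value = node_label
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "annotations": {
                # Read by PodNodeSelector: applied as a nodeSelector to
                # every pod created in the namespace.
                "scheduler.alpha.kubernetes.io/node-selector":
                    f"{label_key}={label_value}",
                # Read by PodTolerationRestriction: a JSON-encoded list of
                # tolerations given to pods in the namespace.
                "scheduler.alpha.kubernetes.io/defaultTolerations":
                    json.dumps([toleration]),
            },
        },
    }


if __name__ == "__main__":
    manifest = gpu_namespace_manifest(
        GPU_NAMESPACE, GPU_NODE_LABEL, GPU_TOLERATION
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`. Whether these admission plugins can be enabled on a managed control plane would need to be checked first, since this approach depends on them entirely.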

This doesn't seem to be a problem on GKE, so it's also worth investigating how GKE handles it.

Out of scope for now