yylt / kcrow

fine-grained control based on NRI, such as Nvidia Driver, kernel driver...

kcrow

Overview

kcrow handles multi-tenant resource management, along with device and runtime initialization. Its current capabilities are:

  • Support for controlling ulimit and cpu/memory cgroup resources

  • Support for configuring resource annotations at multiple levels, including namespace, node, and container

  • Support for priority between levels: when the same resource is set at more than one level, the current precedence is pod > node > namespace
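
The precedence rule above can be sketched in Go as follows. This is a minimal illustration, not kcrow's implementation: `resolve`, the `levels` slice, and the `annos` map are hypothetical names, the node-level values are made up, and the sketch assumes the highest-priority level replaces a lower-level annotation wholesale rather than merging individual fields.

```go
package main

import "fmt"

// levels lists the annotation sources from highest to lowest priority,
// following the order stated in the README: pod > node > namespace.
var levels = []string{"pod", "node", "namespace"}

// resolve returns the value of key from the highest-priority level that
// sets it, plus the name of that level. annos is a hypothetical stand-in
// for the annotations read from each Kubernetes object.
func resolve(annos map[string]map[string]string, key string) (value, level string, ok bool) {
	for _, lv := range levels {
		// Indexing a missing (nil) inner map is safe and reports found == false.
		if v, found := annos[lv][key]; found {
			return v, lv, true
		}
	}
	return "", "", false
}

func main() {
	annos := map[string]map[string]string{
		"namespace": {
			"nofile.rlimit.kcrow.io": `{"hard":65535,"soft":65535}`,
			"cpu.cgroup.kcrow.io":    `{"cpus":"0-2"}`,
		},
		// Illustrative node-level override; the numbers are assumptions.
		"node": {
			"nofile.rlimit.kcrow.io": `{"hard":1024,"soft":1024}`,
		},
	}

	for _, key := range []string{"nofile.rlimit.kcrow.io", "cpu.cgroup.kcrow.io"} {
		if v, lv, ok := resolve(annos, key); ok {
			fmt.Printf("%s: %s (from %s)\n", key, v, lv)
		}
	}
}
```

Here the node-level nofile setting wins over the namespace one, while the cpu setting falls through to the namespace because no higher level defines it.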

For questions about whether and how to configure cgroup and ulimit settings, see the cgroup and ulimit links. More examples will be added later.

Roadmap

Feature              Status
Multi-tenant         Alpha
CPU cgroup           Alpha
Memory cgroup        Alpha
Ulimit               Alpha
NPU/GPU runtime      In-plan
NPU/GPU topology     In-plan

For detailed feature planning, see the roadmap.

Scenarios:

  • Multi-tenant compute resource isolation, controlled through cgroup or ulimit methods

  • Network I/O-intensive applications such as middleware, data storage, log observability, and AI training, which need customized ulimit quotas

  • Enhancing scheduling and runtime capabilities for NPU/GPU in AI base platforms

Quick Start

Prerequisites

  • containerd version greater than 1.7.7
  • NRI enabled and configured. The containerd configuration file is usually at '/etc/containerd/config.toml':
  [plugins."io.containerd.nri.v1.nri"]
    disable = false
    disable_connections = false
    plugin_config_path = "/etc/nri/conf.d"
    plugin_path = "/opt/nri/plugins"
    plugin_registration_timeout = "5s"
    plugin_request_timeout = "2s"
    socket_path = "/var/run/nri/nri.sock"
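
The version prerequisite above can be checked programmatically. The following Go sketch compares a version string against 1.7.7. The helper names `parseVersion` and `newerThan` are hypothetical, the sample version strings are made up, and the sketch assumes a plain MAJOR.MINOR.PATCH version with an optional leading "v" and an optional pre-release or build suffix, which it ignores.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns a string such as "v1.7.13" or "1.7.13-rc.1" into
// its numeric components. Any suffix after '-' or '+' is dropped.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	s = strings.TrimPrefix(strings.TrimSpace(s), "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version component %q: %w", p, err)
		}
		out[i] = n
	}
	return out, nil
}

// newerThan reports whether v is strictly greater than base.
func newerThan(v, base [3]int) bool {
	for i := range v {
		if v[i] != base[i] {
			return v[i] > base[i]
		}
	}
	return false
}

func main() {
	required := [3]int{1, 7, 7} // from the prerequisite above

	// Sample inputs; in practice the string would come from the installed containerd.
	for _, s := range []string{"v1.7.13", "1.7.7", "1.6.30"} {
		v, err := parseVersion(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s newer than 1.7.7: %t\n", s, newerThan(v, required))
	}
}
```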

Install

git clone https://github.com/kcrow-io/kcrow/
# the chart path below is relative to the repository root
cd kcrow
helm install kcrow charts/kcrowdaemon -n kcrow --create-namespace

Example

# namespace
apiVersion: v1
kind: Namespace
metadata:
  name: kcrowtest
  annotations:
    nofile.rlimit.kcrow.io: '{"hard":65535,"soft":65535}'
    cpu.cgroup.kcrow.io: '{"cpus":"0-2"}'

# node
apiVersion: v1
kind: Node
metadata:
  name: node-1
  annotations:
    nofile.rlimit.kcrow.io: '{"hard":65535,"soft":65535}'
    cpu.cgroup.kcrow.io: '{"cpus":"0-2"}'
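
The annotation values in the example above are small JSON documents. The Go sketch below shows one way to decode and sanity-check them. It is not kcrow's own code: the `Rlimit` and `CPUCgroup` types and both parse functions are hypothetical, and only the field names `hard`, `soft`, and `cpus` come from the example. Rejecting an empty `cpus` value is an assumption made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Rlimit mirrors the JSON shape of the nofile.rlimit.kcrow.io annotation.
type Rlimit struct {
	Hard uint64 `json:"hard"`
	Soft uint64 `json:"soft"`
}

// CPUCgroup mirrors the JSON shape of the cpu.cgroup.kcrow.io annotation.
type CPUCgroup struct {
	Cpus string `json:"cpus"`
}

// parseRlimit decodes the annotation value and rejects a soft limit
// above the hard limit, which the kernel would refuse as well.
func parseRlimit(raw string) (Rlimit, error) {
	var r Rlimit
	if err := json.Unmarshal([]byte(raw), &r); err != nil {
		return r, fmt.Errorf("decode rlimit: %w", err)
	}
	if r.Soft > r.Hard {
		return r, fmt.Errorf("soft limit %d exceeds hard limit %d", r.Soft, r.Hard)
	}
	return r, nil
}

// parseCPUCgroup decodes the annotation value. Treating an empty cpus
// string as an error is an assumption of this sketch.
func parseCPUCgroup(raw string) (CPUCgroup, error) {
	var c CPUCgroup
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return c, fmt.Errorf("decode cpu cgroup: %w", err)
	}
	if c.Cpus == "" {
		return c, errors.New("cpus must not be empty")
	}
	return c, nil
}

func main() {
	r, err := parseRlimit(`{"hard":65535,"soft":65535}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("nofile hard:", r.Hard, "soft:", r.Soft)

	c, err := parseCPUCgroup(`{"cpus":"0-2"}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cpus:", c.Cpus)
}
```

Validating the values before applying them keeps a malformed annotation from reaching the container runtime.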

Contributing

Contributions of code and issues are welcome. Please submit an issue or a pull request.

License

This project is licensed under the MIT License. Please see the license file for more details.

Contact

If you have any questions or suggestions, please feel free to contact us. You can find us on GitHub.

About

License:Apache License 2.0

