王天庆's starred repositories

anything-llm

The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.

Language: JavaScript | License: MIT | Stargazers: 16718 | Issues: 136 | Issues: 1176

Megatron-LM

Ongoing research training transformer models at scale

Language: Python | License: NOASSERTION | Stargazers: 9262 | Issues: 158 | Issues: 578

wandb

🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.

Language: Python | License: MIT | Stargazers: 8567 | Issues: 56 | Issues: 3185

kraken

P2P Docker registry capable of distributing TBs of data in seconds

Language: Go | License: Apache-2.0 | Stargazers: 5973 | Issues: 89 | Issues: 105

flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

Language: Go | License: Apache-2.0 | Stargazers: 5171 | Issues: 258 | Issues: 3044

mirrord

Connect your local process and your cloud environment, and run local code in cloud conditions.

Language: Rust | License: MIT | Stargazers: 3585 | Issues: 20 | Issues: 831

nccl

Optimized primitives for collective multi-GPU communication

Language: C++ | License: NOASSERTION | Stargazers: 2951 | Issues: 151 | Issues: 1147

node-problem-detector

This is a place for various problem detectors running on the Kubernetes nodes.

Language: Go | License: Apache-2.0 | Stargazers: 2820 | Issues: 55 | Issues: 361

mergo

Mergo: merging Go structs and maps since 2013

Language: Go | License: BSD-3-Clause | Stargazers: 2793 | Issues: 26 | Issues: 129

k8s-device-plugin

NVIDIA device plugin for Kubernetes

Language: Go | License: Apache-2.0 | Stargazers: 2541 | Issues: 63 | Issues: 449

shell-operator

Shell-operator is a tool for running event-driven scripts in a Kubernetes cluster

Language: Go | License: Apache-2.0 | Stargazers: 2295 | Issues: 35 | Issues: 141

OpenFunction

Cloud Native Function-as-a-Service Platform (CNCF Sandbox Project)

Language: Go | License: Apache-2.0 | Stargazers: 1476 | Issues: 30 | Issues: 168

proton

A streaming SQL engine, a fast and lightweight alternative to ksqlDB and Apache Flink, 🚀 powered by ClickHouse.

Language: C++ | License: Apache-2.0 | Stargazers: 1366 | Issues: 20 | Issues: 427

aistore

AIStore: scalable storage for AI applications

Language: Go | License: MIT | Stargazers: 1152 | Issues: 44 | Issues: 84

qryn

Polyglot Observability Stack. Lightweight & Drop-in compatible with Loki, Prometheus, Tempo, Pyroscope, OpenTelemetry, Datadog & more! WASM powered ⭐️ Star to Support

Language: JavaScript | License: AGPL-3.0 | Stargazers: 1049 | Issues: 14 | Issues: 198

alloy

OpenTelemetry Collector distribution with programmable pipelines

Language: Go | License: Apache-2.0 | Stargazers: 987 | Issues: 94 | Issues: 569

ktunnel

A CLI that exposes your local resources to Kubernetes

Language: Go | License: GPL-3.0 | Stargazers: 906 | Issues: 9 | Issues: 72

awesome-log-analysis

A list of awesome research on log analysis, anomaly detection, fault localization, and AIOps

License: MIT | Stargazers: 686 | Issues: 38 | Issues: 0

rulego

⛓️ RuleGo is a lightweight, high-performance, embedded, and scalable component orchestration rule engine framework based on the Go language.

Language: Go | License: Apache-2.0 | Stargazers: 525 | Issues: 10 | Issues: 13

sriov-network-device-plugin

SRIOV network device plugin for Kubernetes

Language: Go | License: Apache-2.0 | Stargazers: 383 | Issues: 34 | Issues: 216

whereabouts

A CNI IPAM plugin that assigns IP addresses cluster-wide

Language: Go | License: Apache-2.0 | Stargazers: 273 | Issues: 12 | Issues: 142

go-nvml

Go Bindings for the NVIDIA Management Library (NVML)

Language: C | License: Apache-2.0 | Stargazers: 268 | Issues: 18 | Issues: 45

lifecycle-toolkit

Toolkit for cloud-native application lifecycle management

Language: Go | License: Apache-2.0 | Stargazers: 266 | Issues: 9 | Issues: 1008

explore-logs

Repo for the Loki log exploration app

Language: TypeScript | License: AGPL-3.0 | Stargazers: 233 | Issues: 11 | Issues: 223

arishem

A high-performance, lightweight rule engine written in Go.

Language: Go | License: Apache-2.0 | Stargazers: 182 | Issues: 7 | Issues: 4

awesome-AIOps

A curated list of awesome academic research and industrial materials on Artificial Intelligence for IT Operations (AIOps).

License: MIT | Stargazers: 180 | Issues: 8 | Issues: 0

knavigator

knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes.

Language: Go | License: Apache-2.0 | Stargazers: 43 | Issues: 8 | Issues: 0

qryn-view

qryn polyglot user interface to explore logs, metrics and traces 👁️ Grafana Explore alternative compatible with Loki, Prometheus and Tempo

Language: TypeScript | License: AGPL-3.0 | Stargazers: 37 | Issues: 7 | Issues: 187

numalogic-prometheus

AIOps for metrics in Prometheus

Language: Python | License: Apache-2.0 | Stargazers: 30 | Issues: 8 | Issues: 26

aiops-modules

AIOps modules is a collection of reusable Infrastructure as Code (IaC) modules for Machine Learning (ML), Foundation Models (FM), Large Language Models (LLM) and GenAI development and operations on AWS

Language: Python | License: Apache-2.0 | Stargazers: 28 | Issues: 8 | Issues: 8