Ted Chang's repositories
ai-data-workshop
Workshop for Data and AI Sessions
codeflare-common
Common packages for use with the CodeFlare Distributed Workload stack.
codeflare-operator
Operator for installation and lifecycle management of the CodeFlare distributed workload stack, starting with MCAD and InstaScale
codeflare-sdk
An intuitive, easy-to-use Python interface for batch resource requesting, access, job submission, and observation. It simplifies the developer's life while enabling access to high-performance compute resources, either in the cloud or on-prem.
community-operators
The canonical source for Kubernetes Operators that are published on OperatorHub.io and part of the default catalog of the Operator Lifecycle Manager.
community-operators-prod
community-operators metadata backing OpenShift OperatorHub
distributed-workloads
Artifacts for installing the Distributed Workloads stack as part of ODH
flink-kubernetes-operator
Apache Flink Kubernetes Operator
fms-hf-tuning
🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
kfserving
Serverless Inferencing on Kubernetes
kuberay
A toolkit to run Ray applications on Kubernetes
kueue
Kubernetes-native Job Queueing
modelmesh-serving
Controller for ModelMesh
monitor-custom-ml-engine-with-watson-openscale
Deploy a Custom Machine Learning engine and Monitor Payload Logging and Fairness using AI OpenScale
multi-cluster-app-dispatcher
Holistic job manager on Kubernetes
odh-contrib-manifests
Component manifests contributed by community members that integrate with Open Data Hub
odh-manifests
A repository for Open Data Hub Kustomize manifests extending upstream Kubeflow manifests
opendatahub-operator
Open Data Hub operator to manage ODH component integrations
ray
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
torchx
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.
train-conductor
Training job management tool for foundation model service
training-operator
Training operators on Kubernetes.
trustyai-service-operator
Kubernetes operator for the TrustyAI service
watsonx-chatbot-lab.github.io
A template to embed the watsonx Assistant chatbot using GitHub Pages