Romil Bhardwaj's repositories
kube-tutorial
Kubernetes Tutorial for the PS2 group meetings at UC Berkeley
romilphdthesis
My PhD thesis on resource-efficient machine learning
cilantro-workloads
Workloads used in the OSDI 2023 paper "Cilantro: Performance-Aware Resource Allocation for General Objectives via Online Feedback"
scalable-env-demo
An example using SkyPilot to scale environments for agents
ai-infra-landscape
This is a landscape of the infrastructure that powers the generative AI ecosystem
Awesome-LLM
Awesome-LLM: a curated list of Large Language Models
awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
bitsandbytes
8-bit CUDA functions for PyTorch
cloudsuite
A Benchmark Suite for Cloud Services.
DeathStarBench
Open-source benchmark suite for cloud microservices
liveness-bot
A liveness bot that logs the timestamp to a file. Good for checking when your spot instance got killed.
meta-fuse-csi-plugin
A CSI plugin for all FUSE implementations
pytorch-distributed-resnet
Example of PyTorch ResNet distributed training, pulled from https://leimao.github.io/blog/PyTorch-Distributed-Training/
romilbhardwaj.github.io
My Website
sharedfs-pingpong
Example app that demonstrates communication using a shared file system
skycamp-infra
Tool to provision infrastructure for SkyCamp 2022 tutorials
skypilot-playground
Creates a public playground for people to play with SkyPilot
smarter-device-manager
Enables k8s containers to access devices (Linux device drivers) available on nodes