botbw

Singapore

https://botbw.github.io/

Organizations

AoTTG-2

hpcaitech

botbw's starred repositories

alphafold3

AlphaFold 3 inference pipeline.

Language: Python | License: NOASSERTION | 480700

Triton-Puzzles-Lite

Puzzles for learning Triton. Play with minimal environment configuration!

Language: Python | License: Apache-2.0 | 11300

Awesome-ML-SYS-Tutorial

My learning notes and code for ML SYS.

Language: Python | License: Apache-2.0 | 12500

spack

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.

Language: Python | License: NOASSERTION | 440100

mpich

Official MPICH Repository

Language: C | License: NOASSERTION | 56000

ompi

Open MPI main development repository

Language: C | License: NOASSERTION | 216900

nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Language: Python | License: Apache-2.0 | 482300

semiring-einsum

Generic PyTorch implementation of einsum that supports different semirings

Language: Python | License: MIT | 4600

SpringShell

Spring4Shell - Spring Core RCE - CVE-2022-22965

Language: Python | 12700

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | 866800

DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

Language: C++ | License: Apache-2.0 | 41100

AI-System-School

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

License: MIT | 269100

mase

Machine-Learning Accelerator System Exploration Tools

Language: Python | License: NOASSERTION | 12300

Cute-Learning

Examples of CUDA implementations by Cutlass CuTe

Language: Makefile | License: MIT | 9700

models

The best OSS video generation models

Language: Python | License: Apache-2.0 | 200900

pybind11

Seamless operability between C++11 and Python

Language: C++ | License: NOASSERTION | 1577200

OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)

Language: Python | License: Apache-2.0 | 263700

Spring4Shell-POC

Dockerized Spring4Shell (CVE-2022-22965) PoC application and exploit

Language: Python | 31200

CVE-2024-6387

Remote Unauthenticated Code Execution Vulnerability in OpenSSH server (CVE-2024-6387)

Language: Python | License: MIT | 4500

HeCBench

Language: C++ | License: BSD-3-Clause | 21600

ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)

Language: C | License: NOASSERTION | 115500

EasyParallelLibrary

Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

Language: Python | License: Apache-2.0 | 26400

mirage

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

Language: C++ | License: Apache-2.0 | 63200

gemlite

Simple and fast low-bit matmul kernels in CUDA / Triton

Language: Python | License: Apache-2.0 | 14100

how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Language: Cuda | 159100

gpu-benches

Collection of benchmarks to measure basic GPU capabilities

Language: Jupyter Notebook | License: GPL-3.0 | 26500

GPU-Puzzles

Solve puzzles. Learn CUDA.

Language: Jupyter Notebook | License: MIT | 991300

thread-pool

BS::thread_pool: a fast, lightweight, and easy-to-use C++17 thread pool library

Language: C++ | License: MIT | 220700

Awesome-DL-Scheduling-Papers

TensorNVMe

A Python library that transfers PyTorch tensors between CPU and NVMe

Language: C++ | 9800