botbw's starred repositories
alphafold3
AlphaFold 3 inference pipeline.
Triton-Puzzles-Lite
Puzzles for learning Triton; play with minimal environment configuration!
Awesome-ML-SYS-Tutorial
My learning notes and code for ML systems.
semiring-einsum
Generic PyTorch implementation of einsum that supports different semirings
SpringShell
Spring4Shell - Spring Core RCE - CVE-2022-22965
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
AI-System-School
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
Cute-Learning
Examples of CUDA implementations using CUTLASS CuTe
Spring4Shell-POC
Dockerized Spring4Shell (CVE-2022-22965) PoC application and exploit
CVE-2024-6387
Remote Unauthenticated Code Execution Vulnerability in OpenSSH server (CVE-2024-6387)
EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
gpu-benches
Collection of benchmarks to measure basic GPU capabilities
GPU-Puzzles
Solve puzzles. Learn CUDA.
thread-pool
BS::thread_pool: a fast, lightweight, and easy-to-use C++17 thread pool library
TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe