Zhuobin Huang's starred repositories
Self-Hosting-Guide
Self-Hosting Guide. Learn all about locally hosting (on-premises and private web servers) and managing software applications yourself or within your organization, including Cloud, LLMs, WireGuard, Automation, Home Assistant, and Networking.
generative-recommenders
Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
cudaparsers
Parsers for CUDA binary files
Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
quiet-star
Code for Quiet-STaR
k8s-dra-driver
Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes
sgl-learning-materials
Materials for learning SGLang
gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
Liger-Kernel
Efficient Triton Kernels for LLM Training
rocm_bandwidth_test
Bandwidth test for ROCm
Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference acceleration; related works will be added gradually. Contributions welcome!
torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
MIT-6.5940
All homework assignments for MIT 6.5940: TinyML and Efficient Deep Learning Computing, Fall 2023 • https://efficientml.ai
MInference
To speed up long-context LLM inference, MInference computes attention with approximate and dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.