Zhuobin Huang (zobinHuang)

Company: University of Electronic Science and Technology of China

Location: China

Home Page: https://zobinhuang.github.io

Zhuobin Huang's starred repositories

Self-Hosting-Guide

Self-Hosting Guide. Learn all about locally hosting (on premises and on private web servers) and managing software applications by yourself or for your organization, including cloud, LLMs, WireGuard, automation, Home Assistant, and networking.

Language: Dockerfile | Stargazers: 10509 | Issues: 0

generative-recommenders

Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Language: Python | License: Apache-2.0 | Stargazers: 656 | Issues: 0

pyo3

Rust bindings for the Python interpreter

Language: Rust | License: Apache-2.0 | Stargazers: 12008 | Issues: 0

cudaparsers

Parsers for CUDA binary files

Language: Rust | License: Apache-2.0 | Stargazers: 21 | Issues: 0

ecapture

Capturing SSL/TLS plaintext without a CA certificate using eBPF. Supported on Linux/Android kernels for amd64/arm64.

Language: C | License: Apache-2.0 | Stargazers: 12723 | Issues: 0

Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

License: Apache-2.0 | Stargazers: 3447 | Issues: 0

quiet-star

Code for Quiet-STaR

Language: Python | License: Apache-2.0 | Stargazers: 519 | Issues: 0

g1

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Language: Python | License: MIT | Stargazers: 3060 | Issues: 0

Language: Python | License: MIT | Stargazers: 295 | Issues: 0

xla

A machine learning compiler for GPUs, CPUs, and ML accelerators

Language: C++ | License: Apache-2.0 | Stargazers: 2585 | Issues: 0

k8s-dra-driver

Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes

Language: Go | License: Apache-2.0 | Stargazers: 231 | Issues: 0

sgl-learning-materials

Materials for learning SGLang

License: MIT | Stargazers: 9 | Issues: 0

terway

CNI plugin for Alibaba Cloud VPC/ENI

Language: Go | License: Apache-2.0 | Stargazers: 547 | Issues: 0

Language: C | License: GPL-3.0 | Stargazers: 9 | Issues: 0

FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

Language: Python | License: NOASSERTION | Stargazers: 1451 | Issues: 0

TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Language: C++ | License: MIT | Stargazers: 124 | Issues: 0

nnsmith

Automatic DNN generation for fuzzing and more

Language: Python | License: Apache-2.0 | Stargazers: 116 | Issues: 0

CasaOS

CasaOS - A simple, easy-to-use, elegant open-source Personal Cloud system.

Language: Go | License: Apache-2.0 | Stargazers: 25079 | Issues: 0

gpumembench

A GPU benchmark suite for assessing on-chip GPU memory bandwidth

Language: C++ | License: GPL-2.0 | Stargazers: 96 | Issues: 0

HAMi-core

HAMi-core compiles libvgpu.so, which enforces a hard limit on GPU usage inside containers

Language: C | Stargazers: 73 | Issues: 0

astra-sim

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

Language: C++ | License: MIT | Stargazers: 242 | Issues: 0

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python | License: BSD-2-Clause | Stargazers: 2993 | Issues: 0

VideoSys

VideoSys: An easy and efficient system for video generation

Language: Python | License: Apache-2.0 | Stargazers: 1648 | Issues: 0

rocm_bandwidth_test

Bandwidth test for ROCm

Language: C++ | License: MIT | Stargazers: 45 | Issues: 0

Awesome_LLM_System-PaperList

Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference acceleration; related work will be added gradually. Contributions are welcome!

Stargazers: 154 | Issues: 0

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 16 | Issues: 0

torchdynamo

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

Language: Python | License: BSD-3-Clause | Stargazers: 1000 | Issues: 0

MIT-6.5940

All Homeworks for TinyML and Efficient Deep Learning Computing 6.5940 • Fall • 2023 • https://efficientml.ai

Language: Jupyter Notebook | Stargazers: 112 | Issues: 0

citus

Distributed PostgreSQL as an extension

Language: C | License: AGPL-3.0 | Stargazers: 10398 | Issues: 0

MInference

To speed up inference for long-context LLMs, MInference computes attention approximately using dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language: Python | License: MIT | Stargazers: 702 | Issues: 0