Yi Wang's repositories
alpaca.cpp-ios
Locally run an instruction-tuned, chat-style LLM
Adv360-Pro-ZMK
Production repository for the all-new Advantage360 Professional using the ZMK engine
cpuinfo
CPU INFOrmation library (x86/x86-64/ARM/ARM64, Linux/Windows/Android/macOS/iOS)
iree
A retargetable MLIR-based machine learning compiler and runtime toolkit
iree-for-apple-platforms
This project builds the IREE compiler for macOS and the IREE runtime for macOS, iOS, watchOS, and tvOS
jax-triton
jax-triton contains integrations between JAX and OpenAI Triton
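A hedged sketch of the integration pattern, mirroring the project's canonical add-kernel example (exact signatures may drift between releases): a Triton kernel is invoked from JAX via jax_triton.triton_call.

    import jax
    import jax.numpy as jnp
    import jax_triton as jt
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, block_size: tl.constexpr):
        # Each program instance handles one block of elements.
        pid = tl.program_id(axis=0)
        offsets = pid * block_size + tl.arange(0, block_size)
        tl.store(out_ptr + offsets, tl.load(x_ptr + offsets) + tl.load(y_ptr + offsets))

    def add(x, y, block_size=8):
        out_shape = jax.ShapeDtypeStruct(shape=x.shape, dtype=x.dtype)
        return jt.triton_call(x, y, kernel=add_kernel, out_shape=out_shape,
                              grid=(x.size // block_size,), block_size=block_size)

    print(add(jnp.arange(8, dtype=jnp.float32), jnp.arange(8, dtype=jnp.float32)))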
JetStream
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).
lam2s
lam2s = Lean And Mean LAnguage Model Serving
Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
ml_collections
ML Collections is a library of Python collections designed for ML use cases.
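A minimal sketch of the core ConfigDict pattern; the field names here are illustrative.

    from ml_collections import config_dict

    cfg = config_dict.ConfigDict()
    cfg.learning_rate = 3e-4             # fields are added as attributes
    cfg.model = config_dict.ConfigDict()
    cfg.model.hidden_size = 512
    cfg.lock()                           # locking freezes the set of keys
    cfg.learning_rate = 1e-4             # updating an existing field is still allowed
    print(cfg)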
mlx
MLX: An array framework for Apple silicon
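A tiny sketch of the NumPy-like API; MLX arrays are lazily evaluated.

    import mlx.core as mx

    a = mx.array([1.0, 2.0, 3.0])
    b = mx.exp(a) + a    # builds a computation graph lazily
    mx.eval(b)           # forces evaluation
    print(b)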
mlx-examples
Examples in the MLX framework
mlx-lm
Run LLMs with MLX
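A hedged usage sketch; the model repository name below is an illustrative placeholder.

    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
    print(generate(model, tokenizer, prompt="Write a haiku about the sea",
                   max_tokens=64))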
pytorch_memlab
Profiling and inspecting memory in PyTorch
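A short sketch using the MemReporter API; it assumes a CUDA device, and the model is a toy example.

    import torch
    from pytorch_memlab import MemReporter

    model = torch.nn.Linear(1024, 1024).cuda()
    reporter = MemReporter(model)
    loss = model(torch.randn(64, 1024, device="cuda")).sum()
    loss.backward()
    reporter.report()    # per-tensor breakdown of device memory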
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
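A minimal sketch of training and applying a tokenizer model, assuming a local corpus.txt exists.

    import sentencepiece as spm

    spm.SentencePieceTrainer.train(input="corpus.txt", model_prefix="m",
                                   vocab_size=8000)
    sp = spm.SentencePieceProcessor(model_file="m.model")
    print(sp.encode("Hello world", out_type=str))  # subword pieces
    print(sp.encode("Hello world"))                # integer ids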
sglang
SGLang is a fast serving framework for large language models and vision language models.
TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define large language models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for creating Python and C++ runtimes that execute those engines.
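A hedged sketch of the high-level LLM API described above; the model name and sampling parameters are illustrative.

    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds an engine on first use
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    for output in llm.generate(["Hello, my name is"], params):
        print(output.outputs[0].text)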
tensorrtllm_backend
The Triton TensorRT-LLM Backend
torchft
PyTorch per-step fault tolerance (actively under development)
xgrammar
Fast, Flexible and Portable Structured Generation