Sihan Chen's starred repositories
prometheus-fastapi-instrumentator
Instrument your FastAPI app with Prometheus metrics.
GenAIComps
GenAI components at the microservice level; a GenAI service composer to create a mega-service
GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
gemma_pytorch
The official PyTorch implementation of Google's Gemma models
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
FasterTransformer
Transformer-related optimizations, including BERT and GPT
noisereduce
Noise reduction in python using spectral gating (speech, bioacoustics, audio, time-domain signals)
text-to-text-transfer-transformer
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
tensorflow-onnx
Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX
intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
optimization-manual
Contains the source code examples described in the "Intel® 64 and IA-32 Architectures Optimization Reference Manual"
Awesome-Pruning
A curated list of neural network pruning resources.
awesome-ml-model-compression
Awesome machine learning model compression research papers, quantization, tools, and learning material.
oneAPI_course
oneAPI - Data Parallel C++ course for students
streamingbook
Code snippets from the Streaming Systems book (streamingbook.net).
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
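The INT8 quantization listed for neural-compressor can be illustrated, in its simplest symmetric per-tensor form, by the sketch below. It is a minimal sketch under stated assumptions: the input is a plain Python list of floats, and the helper names `quantize_int8` and `dequantize_int8` are hypothetical. This is not neural-compressor's API. Real toolkits add calibration, per-channel scales, and the other low-bit formats named above.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization.
# quantize_int8 / dequantize_int8 are hypothetical helpers, NOT neural-compressor's API.

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale.

    scale = max(|x|) / 127, q = round(x / scale), clamped to the INT8 range.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate floats: x_hat = q * scale."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, scale)
print(restored)
```

Because rounding is to the nearest integer step, each restored value differs from the original by at most half a quantization step (`scale / 2`); that rounding error is what the more elaborate low-bit schemes try to keep small.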