VS (Vui Seng Chua)'s repositories
nncf
PyTorch*-based Neural Network Compression Framework for enhanced OpenVINO™ inference
data-parallel-CPP
Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian (Apress, 2020).
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
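For context, the project's documented quick-start (model id and prompt are the stock README example):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")                 # fp16 pipeline expects a GPU
image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```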
EAGLE
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
ipex
A Python package that extends the official PyTorch to unlock additional performance on Intel platforms
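Its documented entry point is ipex.optimize(), which applies Intel-specific optimizations to an eager-mode model; a minimal inference sketch:

```python
import torch
import intel_extension_for_pytorch as ipex

# Any eval-mode model works; a single Linear keeps the sketch self-contained.
model = torch.nn.Linear(64, 64).eval()
model = ipex.optimize(model)  # returns an optimized copy of the model

with torch.no_grad():
    out = model(torch.randn(1, 64))
```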
ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, ModelScope, etc.
llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
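The core trick, as a simplified sketch: scale weight columns up by a power of their average activation magnitude before quantizing, then fold the inverse scale back, so salient channels lose less precision. Names below are mine, and real AWQ searches the exponent and quantizes group-wise with zero-points:

```python
import torch

def awq_scale_sketch(weight, act_abs_mean, alpha=0.5, n_bits=4):
    # Per-input-channel scales from activation statistics (AWQ grid-searches alpha).
    s = act_abs_mean.clamp(min=1e-5) ** alpha          # [in_features]
    w_scaled = weight * s                              # scale salient columns up
    # Plain symmetric per-tensor quantization of the scaled weights.
    qmax = 2 ** (n_bits - 1) - 1
    step = w_scaled.abs().max() / qmax
    w_q = (w_scaled / step).round().clamp(-qmax, qmax) * step
    # Fold the inverse scale back; at inference AWQ fuses it into the preceding op.
    return w_q / s

W = torch.randn(128, 64)     # [out_features, in_features], hypothetical layer
x_stats = torch.rand(64)     # mean |activation| per input channel
W_deq = awq_scale_sketch(W, x_stats)
```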
lm-evaluation-harness
A framework for few-shot evaluation of language models.
mlperf-inference
Reference implementations of MLPerf™ inference benchmarks
mlperf-v3.0-intel
This repository contains the results and code for the MLPerf™ Inference v3.0 benchmark.
mlperf-v3.1-intel
This repository contains the results and code for the MLPerf™ Inference v3.1 benchmark.
mm_amx
Matrix multiplication (matmul) kernels using Intel® AMX instructions
oneAPI-samples
Samples for Intel® oneAPI Toolkits
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
optimum-intel
Accelerate inference of 🤗 Transformers with Intel optimization tools
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
SparseFinetuning
Repository for Sparse Finetuning of LLMs via a modified version of MosaicML's llmfoundry
Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
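The accept/reject core of the algorithm is small enough to sketch in NumPy; this is an illustrative reconstruction, not the repo's code (the full method also samples a bonus token from the target model when every draft token is accepted):

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(target_probs, draft_probs, draft_tokens):
    # target_probs[i], draft_probs[i]: vocab distributions at draft position i;
    # draft_tokens[i]: the token the draft model actually sampled there.
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i][tok], draft_probs[i][tok]
        if rng.random() < min(1.0, p / q):    # accept with probability min(1, p/q)
            accepted.append(tok)
        else:
            # On rejection, resample from the residual max(0, p - q), renormalized,
            # and discard the rest of the draft.
            residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(residual), p=residual)))
            break
    return accepted

# Toy run: vocab of 5, 3 draft positions, all values made up.
V, K = 5, 3
tp = rng.dirichlet(np.ones(V), size=K)
dp = rng.dirichlet(np.ones(V), size=K)
toks = [int(rng.choice(V, p=dp[i])) for i in range(K)]
print(speculative_step(tp, dp, toks))
```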
SqueezeLLM
SqueezeLLM: Dense-and-Sparse Quantization
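A minimal sketch of the dense-and-sparse decomposition, assuming the usual outlier-splitting formulation (helper name and the outlier fraction are hypothetical; the repo pairs this with non-uniform quantization of the dense part):

```python
import torch

def dense_sparse_split(weight, outlier_frac=0.005):
    # Threshold at the (1 - outlier_frac) quantile of |W|.
    k = int(weight.numel() * (1 - outlier_frac))
    thresh = weight.abs().flatten().kthvalue(k).values
    # Keep the few large-magnitude outliers exact, in a sparse matrix...
    sparse = torch.where(weight.abs() > thresh, weight, torch.zeros_like(weight))
    # ...and leave the well-behaved remainder to be quantized densely.
    dense = weight - sparse
    return dense, sparse.to_sparse()

W = torch.randn(256, 256)
dense, sparse = dense_sparse_split(W)
```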
torch-custom-linear
A custom implementation of PyTorch's linear layer
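A plausible reading of the exercise, via torch.autograd.Function with an explicit backward pass (a sketch of the pattern, not the repo's actual code):

```python
import torch

class CustomLinearFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, bias):
        ctx.save_for_backward(x, weight)
        return x @ weight.t() + bias

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        return (
            grad_out @ weight,       # dL/dx
            grad_out.t() @ x,        # dL/dW
            grad_out.sum(dim=0),     # dL/db
        )

x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(16, 8, requires_grad=True)
b = torch.zeros(16, requires_grad=True)
y = CustomLinearFn.apply(x, w, b)
y.sum().backward()                   # exercises the custom backward
```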
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
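Its signature quick-start is the one-line pipeline API:

```python
from transformers import pipeline

# Downloads a default checkpoint for the task on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Neural network compression is surprisingly effective."))
```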
trl
Train transformer language models with reinforcement learning.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
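Its documented offline-inference API in a few lines (the model id is just the usual example):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")   # any Hugging Face causal LM id
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```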
wanda
A simple and effective LLM pruning approach.
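The metric itself is a one-liner: score each weight by |W| times the L2 norm of its input activation channel, then prune the lowest scores within each output row, with no retraining. A sketch with hypothetical names:

```python
import torch

def wanda_mask(weight, act_norm, sparsity=0.5):
    score = weight.abs() * act_norm              # [out, in] * [in] broadcast
    k = int(weight.shape[1] * sparsity)
    # Prune the k lowest-scoring weights in every output row.
    _, idx = torch.topk(score, k, dim=1, largest=False)
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, idx, False)
    return mask

W = torch.randn(8, 32)      # hypothetical layer weight [out_features, in_features]
x_norm = torch.rand(32)     # per-input-channel ||X||_2 from calibration data
W_pruned = W * wanda_mask(W, x_norm)
```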