Vui Seng Chua's repositories
nncf
PyTorch*-based Neural Network Compression Framework for enhanced OpenVINO™ inference
AMX-TMUL-Code-Samples
Code samples related to Intel(R) AMX
Diff-Pruning
Structural Pruning for Diffusion Models
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
DLAI-LangChain-LLM-App
Course materials for LangChain for LLM Application Development: building essential skills for expanding the use cases and capabilities of language models in application development with the LangChain framework.
EAGLE
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
hf-peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
ipex
A Python package that extends the official PyTorch to deliver improved performance on Intel platforms
ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPUs and GPUs (e.g., a local PC with an iGPU, or a discrete GPU such as Arc, Flex, or Max). A PyTorch LLM library that integrates seamlessly with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, ModelScope, etc.
llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
meta-sam
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
mlperf-inference
Reference implementations of MLPerf™ inference benchmarks
mlperf-v3.0-intel
This repository contains the results and code for the MLPerf™ Inference v3.0 benchmark.
mlperf-v3.1-intel
This repository contains the results and code for the MLPerf™ Inference v3.1 benchmark.
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
optimum-intel
Accelerate inference of 🤗 Transformers with Intel optimization tools
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
smoothquant
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
SqueezeLLM
SqueezeLLM: Dense-and-Sparse Quantization
Teaching-Intel-Intrinsics-for-SIMD-Parallelism
Teaching Vectorization and SIMD using Intel Intrinsics in a Computer Organization and Architecture class
torchinfo
View model summaries in PyTorch!
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
trl
Train transformer language models with reinforcement learning.