Vui Seng Chua's repositories
nncf
PyTorch*-based Neural Network Compression Framework for enhanced OpenVINO™ inference
AMX-TMUL-Code-Samples
Code samples related to Intel(R) AMX
Diff-Pruning
Structural Pruning for Diffusion Models
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
DLAI-LangChain-LLM-App
Course materials for LangChain for LLM Application Development: building essential skills for expanding the use cases and capabilities of language models in application development with the LangChain framework.
EAGLE
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
hf-peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
ipex
A Python package that extends the official PyTorch to deliver improved performance on Intel platforms
ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPUs and GPUs (e.g., a local PC with an iGPU, or a discrete GPU such as Arc, Flex, or Max). A PyTorch LLM library that integrates seamlessly with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, ModelScope, etc.
llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
meta-sam
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
mlperf-inference
Reference implementations of MLPerf™ inference benchmarks
mlperf-v3.0-intel
This repository contains the results and code for the MLPerf™ Inference v3.0 benchmark.
mlperf-v3.1-intel
This repository contains the results and code for the MLPerf™ Inference v3.1 benchmark.
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
optimum-intel
Accelerate inference of 🤗 Transformers with Intel optimization tools
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
smoothquant
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
SqueezeLLM
SqueezeLLM: Dense-and-Sparse Quantization
Teaching-Intel-Intrinsics-for-SIMD-Parallelism
Teaching Vectorization and SIMD using Intel Intrinsics in a Computer Organization and Architecture class
torchinfo
View model summaries in PyTorch!
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
trl
Train transformer language models with reinforcement learning.