Meng, Hengyu's starred repositories
intel-npu-acceleration-library
Intel® NPU Acceleration Library
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms ⚡
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
x86-64-minimal-JIT-compiler-Cpp
Writing a minimal x86-64 JIT compiler in C++
optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
intel-extension-for-tensorflow
Intel® Extension for TensorFlow*
mlir-hello
MLIR Sample dialect
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
oneAPI-samples
Samples for Intel® oneAPI Toolkits
easy-just-in-time
An LLVM optimization that extracts a function, embeds its intermediate representation in the binary, and executes it using the LLVM just-in-time compiler.
onnx2pytorch
Transform an ONNX model into a PyTorch representation
GEMM_Optimization
Optimize GEMM: an 800x improvement using AVX512 and AVX512-BF16.
dpcpp-tutorial
Intel Data Parallel C++ (and SYCL 2020) Tutorial.
ipex_verbose
IPEX (Intel® Extension for PyTorch) verbose toolkit
PySparseConvNet
A Python framework for sparse neural networks
MinkowskiEngine
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors