zhaoyang-star's repositories
test_opencl_image_object
Use OpenCL image objects for NHWC tensors
clpeak
A tool which profiles OpenCL devices to find their peak capacities
code-samples
Source code examples from the Parallel Forall Blog
flash-attention
Fast and memory-efficient exact attention
minimal-opencl-on-windows
Minimal OpenCL program on Windows
MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
OpenCL-CLHPP
Khronos OpenCL-CLHPP
OpenCL-Headers
Khronos OpenCL-Headers
Paddle
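PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, plus cross-platform deployment, for deep learning and machine learning)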
Paddle-Lite
PaddlePaddle's multi-platform, high-performance deep learning inference engine
Paddle-Lite-Demo
lib, demo, model, data
SNPE-UDL-TEST
UDL test for SNPE-1.31.0.522
tensorflow
An Open Source Machine Learning Framework for Everyone
test1
gitskills
threadpool
Fork of a nice threadpool library written by Ronald Kriemann which can be found here: http://www.kriemann.name/Ronald/projects/threadpool/index.en.htm
TransformerCompression
For releasing code related to compression methods for transformers, accompanying our publications
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs