zhaoyang-star

Beijing

zhaoyang-star's repositories

test_opencl_image_object

Use an OpenCL image object for NHWC tensors

Language: C++ | 1 20

Anakin

Language: C++ | License: Apache-2.0 | 000

clbenchmark

Language: C++ | License: Unlicense | 000

clpeak

A tool which profiles OpenCL devices to find their peak capacities

Language: C++ | License: Unlicense | 000

code-samples

Source code examples from the Parallel Forall Blog

Language: HTML | License: BSD-3-Clause | 000

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | 000

minimal-opencl-on-windows

Minimal OpenCL program on Windows

Language: C | License: MIT | 000

MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

Language: C++ | 000

nvidia-opencl-examples

Language: C++ | 000

OpenCL-CLHPP

Khronos OpenCL-CLHPP

Language: C++ | License: Apache-2.0 | 000

OpenCL-Headers

Khronos OpenCL-Headers

Language: C | License: Apache-2.0 | 000

Paddle

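PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)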

License: Apache-2.0 | 000

Paddle-Lite

Multi-platform high-performance deep learning inference engine (PaddlePaddle multi-platform high-performance deep learning inference engine)

Language: C++ | License: Apache-2.0 | 000

Paddle-Lite-Demo

lib, demo, model, data

License: Apache-2.0 | 000

SNPE-UDL-TEST

UDL test for SNPE-1.31.0.522

000

tensorflow

An Open Source Machine Learning Framework for Everyone

Language: C++ | License: Apache-2.0 | 000

test1

gitskills

000

threadpool

Fork of a nice threadpool library written by Ronald Kriemann which can be found here: http://www.kriemann.name/Ronald/projects/threadpool/index.en.htm

Language: C++ | License: NOASSERTION | 010

TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications

Language: Python | License: MIT | 000

tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Language: Python | License: Apache-2.0 | 010

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | 000

zhaoyang-star.github.io

010