Beast code in Giters

yawen_Li's starred repositories

YHs_Sample

Yinghan's Code Sample

Language: Cuda | GPL-3.0 | 24300

llama.cpp

LLM inference in C/C++

Language: C++ | MIT | 5993500

NVIDIA-OpenCL-Samples

Compilable official NVIDIA OpenCL sample code, https://developer.nvidia.com/opencl

Language: C | 200

ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Language: Python | Apache-2.0 | 141500

mperf

mperf is an operator performance tuning toolbox for mobile and embedded platforms

Language: C++ | Apache-2.0 | 16600

A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.

Language: Cuda | Apache-2.0 | 72000

workflow

C++ Parallel Computing and Asynchronous Networking Framework

Language: C++ | Apache-2.0 | 1258400

QualcommOpenCLSDKNote

Notes on the Qualcomm OpenCL SDK

Language: C++ | 2200

sgemm_hsw

This is an implementation of sgemm_kernel on L1d cache.

Language: Assembly | GPL-3.0 | 21400

cpu-cache-test

CPU cache latency experiment

Language: C | 100

OpenCL-correlation-using-local-memory

Correlation demo in OpenCL that uses local memory.

Language: C | 100

memtestCL

OpenCL memory tester for GPUs

Language: C++ | NOASSERTION | 11800

libpag

The official rendering library for PAG (Portable Animated Graphics) files that renders After Effects animations natively across multiple platforms.

Language: C++ | NOASSERTION | 479000

shoc

The SHOC Benchmark Suite

Language: Makefile | NOASSERTION | 23800

FFT

4400

ppl.nn

A primitive library for neural network

Language: C++ | Apache-2.0 | 123600

ArmNeonOptimization

arm-neon

Language: C++ | 8000

CUDA_gemm

A simple high performance CUDA GEMM implementation.

Language: Cuda | 29200

Cplusplus-Concurrency-In-Practice

A Detailed Cplusplus Concurrency Tutorial ("C++ Concurrent Programming Guide")

Language: C++ | MIT | 520400

mmcv

OpenMMLab Computer Vision Foundation

Language: Python | Apache-2.0 | 569400

mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

Language: Python | Apache-2.0 | 324100

mobilenet-ssd-snpe

MobileNet-SSD SNPE demo

Language: C++ | 3900

tinyml

Language: Python | MIT | 71600

TNN

TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop, and server. TNN is distinguished by several outstanding features, including its cross-platform capability, high performance, model compression, and code pruning. Based on ncnn and Rapidnet, TNN further strengthens the support and performance optimization for mobile devices, and also draws on the good extensibility and high performance of existing open source efforts. TNN has been deployed in multiple apps from Tencent, such as Mobile QQ, Weishi, and Pitu. Contributions are welcome: collaborate with us to make TNN a better framework.

Language: C++ | NOASSERTION | 430500

CPlusPlusThings

All things C++

Language: C++ | 3778700

a243845305