Siyuan Li (cowyyy)

Company: UNSW

Siyuan Li's starred repositories

tensorflow

An Open Source Machine Learning Framework for Everyone

Language: C++ | License: Apache-2.0 | Stargazers: 185779 | Issues: 7599 | Issues: 39809

chibicc

A small C compiler

8cc

A Small C Compiler

FasterTransformer

Transformer related optimization, including BERT, GPT

Language: C++ | License: Apache-2.0 | Stargazers: 5800 | Issues: 62 | Issues: 625

SimpleNES

An NES emulator in C++

Language: C++ | License: GPL-3.0 | Stargazers: 4826 | Issues: 97 | Issues: 38

warp-ctc

Fast parallel CTC.

Language: Cuda | License: Apache-2.0 | Stargazers: 4066 | Issues: 355 | Issues: 130

tiny-cuda-nn

Lightning fast C++/CUDA neural network framework

Language: C++ | License: NOASSERTION | Stargazers: 3691 | Issues: 49 | Issues: 386

micronet

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op-adapt (upsample), dynamic_shape.

Language: Python | License: MIT | Stargazers: 2212 | Issues: 40 | Issues: 110

tvm_mlir_learn

A collection of compiler learning resources.

xbyak

A JIT assembler for x86 (IA-32)/x64 (AMD64, x86-64) supporting MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512, provided as a C++ header

Language: C++ | License: BSD-3-Clause | Stargazers: 2032 | Issues: 114 | Issues: 94

awesome-ocr

A curated list of promising OCR resources

caffe_ocr

An experimental project for researching mainstream OCR algorithms; it currently implements the CNN+BLSTM+CTC architecture

gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.

Language: C++ | License: NOASSERTION | Stargazers: 1092 | Issues: 46 | Issues: 169

clstm

A small C++ implementation of LSTM networks, focused on OCR.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 819 | Issues: 102 | Issues: 97

How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.

Language: Cuda | License: Apache-2.0 | Stargazers: 808 | Issues: 13 | Issues: 15

dabnn

dabnn is an accelerated binary neural network inference framework for mobile platforms

Language: C++ | License: NOASSERTION | Stargazers: 768 | Issues: 38 | Issues: 29

libonnx

A lightweight, portable, pure C99 ONNX inference engine for embedded devices with hardware acceleration support.

onnc

Open Neural Network Compiler

Language: C++ | License: BSD-3-Clause | Stargazers: 514 | Issues: 57 | Issues: 68

MegCC

MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port

Language: C++ | License: Apache-2.0 | Stargazers: 468 | Issues: 19 | Issues: 22

EasyCNN

An easy convolutional neural network

SDBI

Simple Dynamic Batching Inference

ez_ISP

This is an easy ISP (ez_ISP) for RAW to RGB conversion.

ClipperDocCN

The documentation of ClipperLib in Chinese

InsNet

InsNet Runs Instance-dependent Neural Networks with Padding-free Dynamic Batching.

Language: C++ | Stargazers: 66 | Issues: 3 | Issues: 0
Language: Scala | License: MIT | Stargazers: 65 | Issues: 2 | Issues: 0

yolo_quantization

Based on the paper "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

ncnn_breakdown

A breakdown of NCNN

Language: C++ | License: MIT | Stargazers: 45 | Issues: 2 | Issues: 2

llvm-project-fork

Fork of LLVM Project containing a Colossus IPU backend implementation

popart

Poplar Advanced Runtime for the IPU

Language: C++ | License: NOASSERTION | Stargazers: 5 | Issues: 2 | Issues: 6