Deepware's repositories
Sextans
An FPGA accelerator for general-purpose Sparse-Matrix Dense-Matrix Multiplication (SpMM).
SparseP
SparseP is the first open-source Sparse Matrix Vector Multiplication (SpMV) software package for real-world Processing-In-Memory (PIM) architectures. [https://arxiv.org/abs/2201.05072]
Serpens
An HBM-based FPGA SpMV accelerator
trans-fat
An FPGA Accelerator for Transformer Inference (BERT)
How_to_optimize_in_GPU
A series of articles on GPU kernel optimization, covering several basic kernels in detail: elementwise, reduce, sgemv, sgemm, etc. The optimized kernels perform at or near the theoretical hardware limit.
EdgeBERT
HW/SW co-design of sentence-level energy optimizations for latency-aware multi-task NLP inference
Paddle-Lite
Multi-platform, high-performance deep learning inference engine for PaddlePaddle (『飞桨』)
SpinalHDL_CNN_Accelerator
CNN accelerator implemented with Spinal HDL
dory
A tool to deploy Deep Neural Networks on PULP-based SoCs
nemo
NEural Minimizer for pytOrch
lenet5_hls
FPGA Accelerator for CNN using Vivado HLS
neural-compressor
Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool) provides unified APIs for network compression techniques such as low precision, sparsity, pruning, and knowledge distillation across different deep learning frameworks, in pursuit of the best inference performance.
openvino_tensorflow
OpenVINO™ integration with TensorFlow
bnna
Binary neural network (BNN) accelerator
FPGA_AcceleratorWrapper
Accelerator wrapper with AXI3 DMA and AXI Lite for control
approximate-spmv-topk
Public repository for the DAC 2021 paper "Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs"
MVU
Neural Network accelerator powered by MVUs and RISC-V.
PE-array-for-LeNet-accelerator-based-on-FPGA
A 4×5 processing-element (PE) array for an FPGA-based LeNet accelerator.
Yolo-Fastest
:zap: An ultra-lightweight, general-purpose object detection algorithm based on YOLO: only 250 MFLOPs of compute and a 666 KB ncnn model; runs at 15+ fps on a Raspberry Pi 3B and 178+ fps on mobile devices.
XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
ara
The PULP Ara is a 64-bit vector unit, compatible with the RISC-V Vector Extension v0.10, that works as a coprocessor to CORE-V's CVA6 core
hci
Heterogeneous Cluster Interconnect to bind special-purpose HW accelerators with general-purpose cluster cores