jundaf2

jundaf2's repositories

INT8-Flash-Attention-FMHA-Quantization

Language:Cuda136 5 4

eigenMHA

Forward and backward attention DNN operators implemented with LibTorch, cuDNN, and Eigen.

Language:C++2500

CUDA-INT8-GEMM

CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API

Language:Cuda16 3 1

DNN-2d-FDTD

Language:Python900

PDE-Net-FDTD

Language:Python900

dnn-test-framework

DNN unit test framework

Language:C++700

RNN-1d-FDTD

Language:Python700

GPU-Tensor-Permute

permute sequence data on GPU with high bandwidth

Language:C++600

Flash-LightSeq

Language:C++200

adaptive-filtering-algorithms

Adaptive Algorithms

Language:MATLAB100

cutlass-b2bgemm

an extension to the cutlass half-precision b2b gemm example

Language:C++100

cutlass-kernel-volta-gemm

volta fp16 gemm kernel

Language:Cuda100

eigenDNN

100

FelixFu520-README

A pupil in the computer world. (Felix Fu)

100

FETD-MFEM

A simple Finite element time domain example built with MFEM

Language:C++100

DNN-Discrete-Hilbert-Transform

Language:Python000

Awesome-Machine-Learning-System-Papers

Apache-2.0000

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

NOASSERTION000

cuda_hook

Hooked CUDA-related dynamic libraries by using automated code generation tools.

000

EasyChatGPT-API

A simple implementation of calling the ChatGPT API using Python and Flask

000

EasyWeChatBot

Build a WeChat chatbot with the ChatGPT API in 1 minute

000

flash-attention

Fast and memory-efficient exact attention

BSD-3-Clause000

gpu-gym

a toy used for keeping all gpus on a machine busy using nccl

000

GPU-Internal-Sorting

Language:C++000

GPU-Philox

cuda philox in a single kernel (easily used in fusion)

Language:CMake000

Heterogeneous-GPUs

Heterogeneous Nvidia (CUDA) and Intel (OpenCL) GPU Programming

Language:C++000

llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.

NOASSERTION000

matxscript

A high-performance, extensible Python AOT compiler.

Apache-2.0000

openCL-devQuery-vecAdd

Language:C++000

UIUC-ECE408-21SP-Project-CNN

Language:Cuda000