jundaf's repositories

eigenMHA

Forward and backward attention DNN operators implemented with LibTorch, cuDNN, and Eigen.

Language: C++ | Stargazers: 25 | Issues: 0

CUDA-INT8-GEMM

CUDA 8-bit Tensor Core matrix multiplication based on the m16n16k16 WMMA API

Language: Python | Stargazers: 9 | Issues: 0

dnn-test-framework

DNN unit test framework

Language: C++ | Stargazers: 7 | Issues: 0

GPU-Tensor-Permute

Permute sequence data on the GPU with high bandwidth.

Language: C++ | Stargazers: 6 | Issues: 0

cutlass-b2bgemm

An extension to the CUTLASS half-precision back-to-back (B2B) GEMM example.

Language: C++ | Stargazers: 1 | Issues: 0

cutlass-kernel-volta-gemm

Volta FP16 GEMM kernel

Language: Cuda | Stargazers: 1 | Issues: 0

FelixFu520-README

A pupil in the computer world (Felix Fu).

Stargazers: 1 | Issues: 0

FETD-MFEM

A simple finite-element time-domain (FETD) example built with MFEM.

Language: C++ | Stargazers: 1 | Issues: 0

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit.

License: NOASSERTION | Stargazers: 0 | Issues: 0

cuda_hook

Hooks CUDA-related dynamic libraries using automated code-generation tools.

Stargazers: 0 | Issues: 0

EasyChatGPT-API

A simple implementation of calling the ChatGPT API with Python and Flask.

Stargazers: 0 | Issues: 0

EasyWeChatBot

Build a WeChat chatbot with the ChatGPT API in one minute.

Stargazers: 0 | Issues: 0

flash-attention

Fast and memory-efficient exact attention

License: BSD-3-Clause | Stargazers: 0 | Issues: 0

gpu-gym

A toy for keeping all GPUs on a machine busy using NCCL.

Stargazers: 0 | Issues: 0

GPU-Philox

CUDA Philox in a single kernel (easy to use in kernel fusion).

Language: CMake | Stargazers: 0 | Issues: 0

Heterogeneous-GPUs

Heterogeneous Nvidia (CUDA) and Intel (OpenCL) GPU Programming

Language: C++ | Stargazers: 0 | Issues: 0

llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept GitHub pull requests at this moment. Please submit your patches at http://reviews.llvm.org.

License: NOASSERTION | Stargazers: 0 | Issues: 0

matxscript

A high-performance, extensible Python AOT compiler.

License: Apache-2.0 | Stargazers: 0 | Issues: 0