ziyu huang (Arsmart123)

ziyu huang's starred repositories

Benchmark_SpGEMM_using_CSR

CSR-based SpGEMM on NVIDIA and AMD GPUs

Language: C++ | License: MIT | Stargazers: 45 | Issues: 0

CUDA_gemm

A simple high performance CUDA GEMM implementation.

Language: Cuda | Stargazers: 308 | Issues: 0

MegEngine

MegEngine is a fast, scalable, and easy-to-use deep learning framework with support for automatic differentiation

Language: C++ | License: Apache-2.0 | Stargazers: 4748 | Issues: 0

RoIAlign.pytorch

RoIAlign & crop_and_resize for PyTorch

Language: C++ | Stargazers: 553 | Issues: 0

BERT-pytorch

Google AI 2018 BERT PyTorch implementation

Language: Python | License: Apache-2.0 | Stargazers: 6134 | Issues: 0

YHs_Sample

Yinghan's Code Sample

Language: Cuda | License: GPL-3.0 | Stargazers: 266 | Issues: 0

IT5007_Project_Spark-Tok

This is the repository containing the source code of our IT5007 Project - Spark Tok

Language: JavaScript | Stargazers: 1 | Issues: 0

maxas

Assembler for NVIDIA Maxwell architecture

Language: Sass | License: MIT | Stargazers: 937 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 34447 | Issues: 0

ParsiMoNe

Parallel Construction of Module Networks

Language: C++ | Stargazers: 5 | Issues: 0

acrotensor

A C++ library for computing large scale tensor contractions.

Language: C++ | License: MIT | Stargazers: 36 | Issues: 0

ttpy

Python implementation of the TT-Toolbox

Language: Python | License: MIT | Stargazers: 235 | Issues: 0

how-to-optimize-gemm

row-major matmul optimization

Language: C++ | License: GPL-3.0 | Stargazers: 580 | Issues: 0

code-samples

Source code examples from the Parallel Forall Blog

Language: HTML | License: BSD-3-Clause | Stargazers: 1220 | Issues: 0

FasterTransformer

Transformer related optimization, including BERT, GPT

Language: C++ | License: Apache-2.0 | Stargazers: 5728 | Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | Stargazers: 5218 | Issues: 0

MSplitGEMM

Large matrix multiplication in CUDA

Language: Cuda | Stargazers: 14 | Issues: 0

SGEMM-Implementation-and-Optimization

Some source code about matrix multiplication implementation on CUDA

Language: Cuda | Stargazers: 36 | Issues: 0

matrix-cuda

matrix multiplication in CUDA

Language: Cuda | License: MIT | Stargazers: 113 | Issues: 0

optimizing-matrix-multiplication-examples

Examples of optimizing matrix multiplication.

Language: C++ | Stargazers: 2 | Issues: 0

NVIDIA_SGEMM_PRACTICE

Step-by-step optimization of CUDA SGEMM

Language: Cuda | Stargazers: 199 | Issues: 0

CUDA-Programming-with-Python

Python versions of the sample code for the book CUDA Programming, using the pycuda module

Language: Python | License: GPL-3.0 | Stargazers: 227 | Issues: 0

CUDA-Programming

Sample codes for my CUDA programming book

Language: Cuda | License: GPL-3.0 | Stargazers: 1496 | Issues: 0

extension-cpp

C++ extensions in PyTorch

Language: Python | Stargazers: 978 | Issues: 0

pytorch-extension

an example of a CUDA extension for PyTorch using CuPy which computes the Hadamard product of two tensors

Language: Python | License: GPL-3.0 | Stargazers: 117 | Issues: 0

cuda_accelerate

An example of accelerating neural networks with C++ and CUDA (implements matrix addition and matrix multiplication)

Language: Python | Stargazers: 52 | Issues: 0

Language: Cuda | Stargazers: 2068 | Issues: 0

tensorly-notebooks

Tensor methods in Python with TensorLy

Language: Jupyter Notebook | Stargazers: 423 | Issues: 0