carlushuang

Company: AMD

Location: Shanghai


Organizations
ROCmSoftwarePlatform

carlushuang's repositories

cpu_gemm_opt

How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.

Language: C++ | License: MIT | Stargazers: 60 | Issues: 7 | Issues: 1

gcnasm

AMDGPU example code in HIP/assembly

Language: C++ | Stargazers: 9 | Issues: 3 | Issues: 0

avx_flops

Benchmark CPU FLOPS using AVX instructions

Language: C | Stargazers: 5 | Issues: 4 | Issues: 0

FFT_implement

fft/ifft, r2c/c2r, 2d_r2c/2d_c2r, convolve, correlation, tiling fft, srfft, pfa, radix-2/3/5

Language: C++ | Stargazers: 3 | Issues: 2 | Issues: 0

deepcore_source_code

Partial source code of deepcore v0.7

Language: C | Stargazers: 1 | Issues: 2 | Issues: 0
Language: C | Stargazers: 1 | Issues: 0 | Issues: 0
Language: C++ | Stargazers: 0 | Issues: 3 | Issues: 0

amdgpu-jit

Test project for AMDGPU code generation

Stargazers: 0 | Issues: 2 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

auto_gen

auto gen

Language: C++ | Stargazers: 0 | Issues: 2 | Issues: 0

binutils-gdb

Unofficial mirror of sourceware binutils-gdb repository. Updated daily.

Language: C | License: GPL-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

CWBVH

An implementation of NVIDIA's paper "Efficient Incoherent Ray Traversal on GPUs Through Compressed Wide BVHs"

Stargazers: 0 | Issues: 0 | Issues: 0

D3D12nBodyGravity_clang

D3D12nBodyGravity example built with Clang

Language: C | Stargazers: 0 | Issues: 0 | Issues: 0

HIP

HIP: Convert CUDA to Portable C++ Code

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 2 | Issues: 0

HIP-Examples

Examples for HIP

Stargazers: 0 | Issues: 0 | Issues: 0

hipBLAS

ROCm BLAS marshalling library

Language: C++ | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0
Language: C++ | Stargazers: 0 | Issues: 0 | Issues: 0
Language: C++ | Stargazers: 0 | Issues: 0 | Issues: 0
Language: C++ | Stargazers: 0 | Issues: 2 | Issues: 0

Mandelbrot-Set

Mandelbrot set

Language: Python | Stargazers: 0 | Issues: 2 | Issues: 0

miopen-benchmark

Benchmarking MIOpen

Language: C++ | License: BSD-3-Clause | Stargazers: 0 | Issues: 3 | Issues: 0

mlir

"Multi-Level Intermediate Representation" Compiler Infrastructure

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Paddle

PArallel Distributed Deep LEarning

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

rocBLAS

Next generation BLAS implementation for ROCm platform

Language: C++ | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

rocm-recipes

Recipes for ROCm

Language: CMake | Stargazers: 0 | Issues: 0 | Issues: 0

Tensile

Stretching GPU performance for GEMMs and tensor contractions.

Language: Python | License: MIT | Stargazers: 0 | Issues: 3 | Issues: 0

tsm2x-imp

Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA

Language: Cuda | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0
Language: Python | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

xbyak

a JIT assembler for x86(IA-32)/x64(AMD64, x86-64) MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512 by C++ header

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0