j4yan

Jianfeng Yan's repositories

amgcl

C++ library for solving large sparse linear systems with the algebraic multigrid method

Language: C++; License: MIT

aviation2017_talk

aviation2017 talk

Language: TeX

blislab

BLISlab: A Sandbox for Optimizing GEMM

Language: C

cmake-examples

Useful CMake Examples

Language: CMake; License: MIT

COSMA

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

Language: C++; License: BSD-3-Clause

cpu_gemm_opt

How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.

Language: C++; License: MIT

csci6963_hw

Language: C

DeepLearningSystem

Deep Learning System core principles introduction.

Language: Jupyter Notebook; License: Apache-2.0

HIP-Performance-Optmization-on-VEGA64

14 basic topics for VEGA64 performance optimization

Language: C++

how-to-optimize-gemm

Language: C

How_to_optimize_in_GPU

A series of GPU optimization topics that explains in detail how to optimize programs on the GPU. The reduce optimization is complete. The CUDA C code for the GEMM optimization is also complete; the code is currently being tuned with the assembler and will be released later.

License: Apache-2.0

howto

Build recipes and other howtos

jacobi-svd

Numerical experiments on Jacobi SVD algorithm

Language: MATLAB

llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

License: Apache-2.0

llm.c

LLM training in simple, raw C/CUDA

mane4280

Language: TeX

Modern-CPP-Programming

Modern C++ Programming Course (C++11/14/17/20)

Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F

Stepwise optimizations of DGEMM on CPU, eventually reaching performance faster than Intel MKL.

License: GPL-3.0

Optimizing-SGEMM-on-NVIDIA-Turing-GPUs

Optimizing SGEMM kernel functions on RTX 2080 Super to close-to-cuBLAS performance.

Language: Cuda; License: GPL-3.0

Optimizing-SGEMV-on-NVIDIA-GPUs

An implementation of SGEMV with performance comparable to cuBLAS.

Language: Cuda; License: GPL-3.0

siam_cse2017_poster

poster_SAT_for_2nd_PDE

Language: TeX

ulmBLAS

ulmBLAS

Language: Fortran; License: NOASSERTION

wmma_extension

An extension library of WMMA API (Tensor Core API)

License: MIT