Jianfeng Yan (j4yan)

Jianfeng Yan's repositories

amgcl

C++ library for solving large sparse linear systems with the algebraic multigrid method

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0

aviation2017_talk

aviation2017 talk

Language: TeX | Stargazers: 0 | Issues: 0

blislab

BLISlab: A Sandbox for Optimizing GEMM

Language: C | Stargazers: 0 | Issues: 0

cmake-examples

Useful CMake Examples

Language: CMake | License: MIT | Stargazers: 0 | Issues: 0

COSMA

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

Language: C++ | License: BSD-3-Clause | Stargazers: 0 | Issues: 0

cpu_gemm_opt

How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0
Language: C | Stargazers: 0 | Issues: 0

DeepLearningSystem

An introduction to the core principles of deep learning systems.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 0 | Issues: 0

HIP-Performance-Optmization-on-VEGA64

14 basic topics for VEGA64 performance optimization

Language: C++ | Stargazers: 0 | Issues: 0
Language: C | Stargazers: 0 | Issues: 0

How_to_optimize_in_GPU

A series of GPU optimization topics that explains in detail how to optimize programs on the GPU. The reduce optimization is complete. For GEMM, the CUDA C code is complete; the code is currently being tuned in assembly and will be released later.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

howto

Build recipes and other howtos

Stargazers: 0 | Issues: 0

jacobi-svd

Numerical experiments on Jacobi SVD algorithm

Language: MATLAB | Stargazers: 0 | Issues: 0

llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Stargazers: 0 | Issues: 0
Language: TeX | Stargazers: 0 | Issues: 0

Modern-CPP-Programming

Modern C++ Programming Course (C++11/14/17/20)

Stargazers: 0 | Issues: 0

Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F

Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL.

License: GPL-3.0 | Stargazers: 0 | Issues: 0

Optimizing-SGEMM-on-NVIDIA-Turing-GPUs

Optimizing SGEMM kernel functions on the RTX 2080 Super to reach close-to-cuBLAS performance.

Language: Cuda | License: GPL-3.0 | Stargazers: 0 | Issues: 0

Optimizing-SGEMV-on-NVIDIA-GPUs

An implementation of SGEMV with performance comparable to cuBLAS.

Language: Cuda | License: GPL-3.0 | Stargazers: 0 | Issues: 0

siam_cse2017_poster

poster_SAT_for_2nd_PDE

Language: TeX | Stargazers: 0 | Issues: 0

ulmBLAS

ulmBLAS

Language: Fortran | License: NOASSERTION | Stargazers: 0 | Issues: 0

wmma_extension

An extension library of WMMA API (Tensor Core API)

License: MIT | Stargazers: 0 | Issues: 0