Hongwei Chen's repositories
neural_network_quantum_state
Neural Network Quantum State
ising-model-gpu
Accelerating Monte Carlo simulations of the 2D Ising model using Nvidia GPUs
Lanczos_Neural_Network_Quantum_State
Supporting code for "Systematic improvement of neural network quantum states using Lanczos" (NeurIPS 2022)
Optimize_DGEMM_on_Intel_CPU
Implementations of the DGEMM algorithm on Intel CPUs using different tricks to optimize performance.
Awesome-System-for-Machine-Learning
A curated list of research in machine learning systems (MLSys). Paper notes are also provided.
Optimize_SGEMM_on_Nvidia_GPU
Implementations of the SGEMM algorithm on Nvidia GPUs using different tricks to optimize performance.
resnet_food101_cifar10_pytorch
ResNet50 implementation for Food101 and ResNet9 model for CIFAR10 in PyTorch
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
DeepLearningExamples
Deep Learning Examples
flash-attention
Fast and memory-efficient exact attention
flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
Linear-Algebra-and-Learning-from-Data
Solutions to the problems in the book: Linear Algebra and Learning from Data by Gilbert Strang, MIT
MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
numpy-ml
Machine learning, in numpy
TheArtofHPC_pdfs
All PDFs of Victor Eijkhout's Art of HPC books and courses
tiny-flash-attention
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS