hwchen2017

Hongwei Chen's repositories

neural_network_quantum_state

Neural Network Quantum State

Language: Jupyter Notebook

ising-model-gpu

Accelerating Monte Carlo simulations of the 2D Ising model using NVIDIA GPUs

Language: Cuda
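A common way to parallelize Metropolis updates of the 2D Ising model on a GPU is checkerboard (red/black) decomposition. Sites of one colour have no same-colour neighbours, so each half-sweep can update all of them at once, for example with one thread per site. The following is a minimal CPU sketch of that update pattern, not code from the repository. It assumes coupling J = 1, no external field, periodic boundaries, and an even lattice size, and the function names are hypothetical.

```python
import math
import random

def energy(spins):
    """Total energy E = -J * sum over nearest-neighbour pairs (J = 1, periodic boundaries)."""
    n = len(spins)
    e = 0
    for i in range(n):
        for j in range(n):
            # Count each bond once via the right and down neighbours.
            e -= spins[i][j] * (spins[i][(j + 1) % n] + spins[(i + 1) % n][j])
    return e

def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep: update every 'black' site, then every 'white' site.

    Within a half-sweep no two updated sites are neighbours, so the inner
    loops could run in parallel (e.g. one GPU thread per site).
    """
    n = len(spins)
    assert n % 2 == 0, "checkerboard colouring needs an even lattice with periodic boundaries"
    for colour in (0, 1):
        for i in range(n):
            for j in range(n):
                if (i + j) % 2 != colour:
                    continue
                nb = (spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
                      + spins[i][(j - 1) % n] + spins[i][(j + 1) % n])
                d_e = 2 * spins[i][j] * nb  # energy change if this spin is flipped
                # Metropolis acceptance: always accept downhill, else with prob exp(-beta * dE).
                if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                    spins[i][j] = -spins[i][j]
```

For example, start from `spins = [[1] * 16 for _ in range(16)]`, call `checkerboard_sweep(spins, 1.0, random.Random(0))` repeatedly, and track `abs(sum(map(sum, spins))) / 16**2` as the magnetization per site.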

Lanczos_Neural_Network_Quantum_State

Supporting code for "Systematic improvement of neural network quantum states using Lanczos" (NeurIPS 2022)

Language: C++

Optimize_DGEMM_on_Intel_CPU

Implementations of the DGEMM algorithm using different techniques to optimize performance.

Language: C
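Loop tiling (cache blocking) is one of the standard techniques for speeding up DGEMM, which computes C = alpha * A * B + beta * C. The sketch below contrasts a naive triple loop with a tiled version that works on bs x bs blocks, so each tile is reused while it is still in cache. It is written in Python only to make the loop structure easy to read, so it will not be fast. It is not the repository's C code, and the row-major flat-list layout, function names, and default block size are assumptions.

```python
def dgemm_naive(m, n, k, alpha, a, b, beta, c):
    """C = alpha * A @ B + beta * C; A is m x k, B is k x n, C is m x n, all row-major flat lists."""
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            # As in BLAS, C is not read when beta == 0.
            c[i * n + j] = alpha * acc + (beta * c[i * n + j] if beta != 0.0 else 0.0)

def dgemm_blocked(m, n, k, alpha, a, b, beta, c, bs=64):
    """Same result, computed tile by tile (loop tiling / cache blocking)."""
    # Scale C once up front so every tile can simply accumulate into it.
    for idx in range(m * n):
        c[idx] = c[idx] * beta if beta != 0.0 else 0.0
    for i0 in range(0, m, bs):
        for p0 in range(0, k, bs):
            for j0 in range(0, n, bs):
                # Multiply one bs x bs tile of A by one bs x bs tile of B.
                for i in range(i0, min(i0 + bs, m)):
                    for p in range(p0, min(p0 + bs, k)):
                        a_ip = alpha * a[i * k + p]
                        row_b, row_c = p * n, i * n
                        # Innermost loop walks contiguous rows of B and C.
                        for j in range(j0, min(j0 + bs, n)):
                            c[row_c + j] += a_ip * b[row_b + j]
```

The i-p-j order inside each tile keeps the innermost accesses to B and C contiguous. In a compiled implementation, this is the loop that register blocking and SIMD vectorization are usually applied to next.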

ASD-kernel-fusion

Language: C

Awesome-System-for-Machine-Learning

A curated list of research in machine learning systems (MLSys). Paper notes are also provided.

License: MIT

Optimize_SGEMM_on_Nvidia_GPU

Implementations of the SGEMM algorithm on NVIDIA GPUs using different techniques to optimize performance.

Language: Cuda

resnet_food101_cifar10_pytorch

ResNet50 implementation for Food101 and ResNet9 model for CIFAR10 in PyTorch

Language: Jupyter Notebook

cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

License: MIT

cute-gemm

Language: C++

Cute-Gemm-Optimization

License: MIT

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ · License: NOASSERTION

DeepLearningExamples

Deep Learning Examples

flash-attention

Fast and memory-efficient exact attention

License: BSD-3-Clause
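FlashAttention computes exact attention without materializing the full score matrix. It streams over blocks of keys and values and keeps a running maximum, a running softmax normalizer, and an unnormalized output accumulator, rescaling them whenever the maximum grows ("online softmax"). The following is a minimal single-query Python sketch of that rescaling, not the library's kernel. It assumes no masking, no dropout, and scaling by 1/sqrt(d), and the function names are hypothetical. Real kernels also tile over queries and keep the blocks in on-chip SRAM.

```python
import math

def attention_naive(q, keys, values):
    """Reference: softmax(q . k_i / sqrt(d)) weighted sum of v_i, storing every score."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    mx = max(scores)
    w = [math.exp(s - mx) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))]

def attention_online(q, keys, values, block=2):
    """Same output, visiting keys/values block by block with online-softmax rescaling."""
    d, dv = len(q), len(values[0])
    m = -math.inf      # running max of scores seen so far
    l = 0.0            # running sum of exp(score - m)
    acc = [0.0] * dv   # running sum of exp(score - m) * v
    for start in range(0, len(keys), block):
        ks, vs = keys[start:start + block], values[start:start + block]
        s = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
        m_new = max(m, max(s))
        scale = math.exp(m - m_new)  # shrinks old partial sums; exp(-inf) == 0.0 on the first block
        p = [math.exp(si - m_new) for si in s]
        l = l * scale + sum(p)
        acc = [a * scale + sum(pi * v[j] for pi, v in zip(p, vs)) for j, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]  # normalize once at the end
```

Normalizing only once at the end, rather than after every block, keeps the per-block work to a single rescale of `l` and `acc`.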

flash-attention-v100

flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

License: MIT

how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

how-to-optimize-gemm

Linear-Algebra-and-Learning-from-Data

Solutions to the problems in the book: Linear Algebra and Learning from Data by Gilbert Strang, MIT

MatmulTutorial

An easy-to-understand TensorOp matmul tutorial

Language: C++ · License: Apache-2.0

multi-GPU-comm-bench

multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

License: BSD-3-Clause

numpy-ml

Machine learning, in numpy

Language: Python · License: GPL-3.0

oneDNN

oneAPI Deep Neural Network Library (oneDNN)

Language: C++ · License: Apache-2.0

oneMKL

oneAPI Math Kernel Library (oneMKL) Interfaces

Language: C++ · License: Apache-2.0

physics_codes_publications

License: MIT

Quantum

Language: Jupyter Notebook · License: NOASSERTION

TheArtofHPC_pdfs

All pdfs of Victor Eijkhout's Art of HPC books and courses

tiny-flash-attention

FlashAttention tutorial written in Python, Triton, CUDA, and CUTLASS

varbench

Language: Python · License: Apache-2.0