Hongwei Chen (hwchen2017)

Company: Northeastern University

Hongwei Chen's repositories

neural_network_quantum_state

Neural Network Quantum State

Language: Jupyter Notebook | Stargazers: 6 | Issues: 2 | Issues: 0

ising-model-gpu

Accelerating Monte Carlo simulations of 2D Ising Model using Nvidia GPU

Language: Cuda | Stargazers: 3 | Issues: 2 | Issues: 0

Lanczos_Neural_Network_Quantum_State

Supporting code for "Systematic improvement of neural network quantum states using Lanczos (NeurIPS 2022)"

Language: C++ | Stargazers: 2 | Issues: 3 | Issues: 0

Optimize_DGEMM_on_Intel_CPU

Implementations of the DGEMM algorithm using different tricks to optimize performance.

Language: C | Stargazers: 2 | Issues: 2 | Issues: 0

Awesome-System-for-Machine-Learning

A curated list of research in machine learning systems (MLSys). Paper notes are also provided.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Optimize_SGEMM_on_Nvidia_GPU

Implementations of the SGEMM algorithm on Nvidia GPU using different tricks to optimize performance.

Language: Cuda | Stargazers: 0 | Issues: 2 | Issues: 0

resnet_food101_cifar10_pytorch

ResNet50 implementation for Food101 and ResNet9 model for CIFAR10 in PyTorch

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0

cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

DeepLearningExamples

Deep Learning Examples

Stargazers: 0 | Issues: 0 | Issues: 0

flash-attention

Fast and memory-efficient exact attention

License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Stargazers: 0 | Issues: 0 | Issues: 0

Linear-Algebra-and-Learning-from-Data

Solutions to the problems in the book: Linear Algebra and Learning from Data by Gilbert Strang, MIT

Stargazers: 0 | Issues: 0 | Issues: 0

MatmulTutorial

An Easy-to-understand TensorOp Matmul Tutorial

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

numpy-ml

Machine learning, in numpy

Language: Python | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

oneDNN

oneAPI Deep Neural Network Library (oneDNN)

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

oneMKL

oneAPI Math Kernel Library (oneMKL) Interfaces

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

TheArtofHPC_pdfs

All pdfs of Victor Eijkhout's Art of HPC books and courses

Stargazers: 0 | Issues: 0 | Issues: 0

tiny-flash-attention

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

Stargazers: 0 | Issues: 0 | Issues: 0