Hiki's starred repositories
tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
transfer.sh
Easy and fast file sharing from the command-line.
flash-attention
Fast and memory-efficient exact attention
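For context, the "exact attention" FlashAttention computes is ordinary softmax attention; a minimal NumPy reference (all names here are illustrative) makes the memory cost it optimizes away concrete:

```python
import numpy as np

def exact_attention(q, k, v):
    """Reference softmax attention: softmax(Q K^T / sqrt(d)) V.

    This naive version materializes the full (n, n) score matrix --
    exactly the quadratic memory that FlashAttention's tiled kernels
    avoid while still returning the same (exact, not approximate) result.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (n, n) attention scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (n, d) output

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = exact_attention(q, k, v)
```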
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
annotated-transformer
An annotated implementation of the Transformer paper.
speedscope
🔬 A fast, interactive web-based viewer for performance profiles.
lolcommits
:camera: git-based selfies for software developers
transformer
Transformer: PyTorch Implementation of "Attention Is All You Need"
Awesome-System-for-Machine-Learning
A curated list of research in machine learning systems (MLSys). Paper notes are also provided.
CUDA-Learn-Notes
🎉 CUDA Learn Notes with PyTorch: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, hist, etc.
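Several of the kernels listed (warp/block reduce, softmax, layernorm) rest on the same tree-reduction idea. As a rough sketch, here is the offset-halving pattern behind a CUDA warp shuffle-down sum, simulated in plain Python (the function name and setup are illustrative, not from any of these repos):

```python
def warp_shuffle_reduce(lane_values):
    """Simulate a warp-level shuffle-down sum reduction.

    Mirrors the common CUDA idiom
        for (int offset = 16; offset > 0; offset /= 2)
            val += __shfl_down_sync(0xffffffff, val, offset);
    Each step, every lane adds the value held `offset` lanes above it;
    after log2(width) steps, lane 0 holds the total.
    """
    vals = list(lane_values)
    width = len(vals)          # warp width, e.g. 32
    offset = width // 2
    while offset > 0:
        # Ascending lane order means vals[lane + offset] is still the
        # pre-step value when we read it, matching shuffle semantics.
        for lane in range(width):
            src = lane + offset
            if src < width:
                vals[lane] += vals[src]
        offset //= 2
    return vals[0]             # lane 0 holds the reduced sum

total = warp_shuffle_reduce(range(32))  # sum of lane IDs 0..31
```

Only the lanes below the current `offset` produce meaningful partial sums at each step, which is why real kernels read the final result from lane 0.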
resource-stream
CUDA related news and material links
Loser-HomeWork
Homework showcase for the "losers" group, with answer explanations and some C++ knowledge
HPC-Learning-Notes
Study notes on high-performance computing, including notes and code demos for related topics; continuously being improved. If it helps you, please give it a Star — it means a lot to the author, thanks!
CUDA-Optimization-Guide
Xiao's CUDA Optimization Guide [actively adding new content]
cuda_learning
Learning how CUDA works
CUDA-From-Correctness-To-Performance-Code
Codes & examples for "CUDA - From Correctness to Performance"
getdataset-from-hobby.lkszj.info
Data scraping for the hobby.lkszj.info website
cs229s-nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.