crapromer's starred repositories
rssbox-android
An RSS reader for Android, based on Rust and Slint-ui.
cudnn-memo
Example code for cuDNN.
gpu-tensor-core
A set of programs for testing CUDA Tensor Core performance.
llama3-from-scratch
A llama3 implementation, one matrix multiplication at a time.
CUDA-Learn-Notes
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
mlops-coding-course
Learn how to create, develop, and maintain a state-of-the-art MLOps code base
PyTorch-GAN
PyTorch implementations of Generative Adversarial Networks.
How_to_optimize_in_GPU
This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. I introduce several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
mlmodelscope
MLModelScope is an open source, extensible, and customizable platform to facilitate evaluation and measurement of ML models within AI pipelines.
cuda_gemm_benchmark
Based on gtest/benchmark, with reference to https://github.com/Liu-xiandong/How_to_optimize_in_GPU
CUDA_Bench
CUDA GPU Benchmark
ABigSurveyOfLLMs
A collection of 150+ surveys on LLMs
Fast-TransX
An efficient implementation of TransE and its extended models for Knowledge Representation Learning
Chinese-LLaMA-Alpaca-2
Phase two of the Chinese LLaMA-2 & Alpaca-2 LLM project, with 64K long context models