irasin

Sin's starred repositories

rich

Rich is a Python library for rich text and beautiful formatting in the terminal.

Language: Python | License: MIT | 49150 537 1320

CS-Base

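Illustrated guides to computer networking, operating systems, computer organization, and databases: 1,000 diagrams plus 500,000 words that demystify obscure computer fundamentals, so that no stock interview question is hard to understand! 🚀 Read online: https://xiaolincoding.com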

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Language: SystemVerilog | 6991 68 23

llm-applications

A comprehensive guide to building RAG-based LLM applications for production.

Language: Jupyter Notebook | License: CC-BY-4.0 | 1684 17 12

cpp

C++ Tip Of The Week

Language: Python | 1553 141 4

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda | License: MIT | 1529 24 26

CUDA-Learn-Notes

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Language: Cuda | License: GPL-3.0 | 1255 14 6

ut

C++20 μ(micro)/Unit Testing Framework

Language: C++ | License: BSL-1.0 | 1252 29 155

gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.

Language: C++ | License: NOASSERTION | 1101 46 169

random

Random for modern C++ with convenient API

Language: C++ | License: MIT | 905 33 16

Awesome-LLM-RAG-Application

Resources on applications built on LLMs with the RAG pattern

clang-tutor

A collection of out-of-tree Clang plugins for teaching and learning

Language: C++ | License: Unlicense | 695 20 17

Learn-LLVM-12

Learn LLVM 12, published by Packt

Language: C++ | License: MIT | 474 12 17

qwen-vllm

Demo of inference deployment for Tongyi Qianwen (Qwen) with vLLM

Language: Python | 421 5 8

CUDA_gemm

A simple high performance CUDA GEMM implementation.

Language: Cuda | 327 5 3

nvbandwidth

A tool for bandwidth measurements on NVIDIA GPUs.

Language: C++ | License: Apache-2.0 | 300 13 16

MatmulTutorial

An easy-to-understand TensorOp Matmul tutorial

Language: C++ | License: Apache-2.0 | 280 8 11

cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Language: Cuda | License: MIT | 278 4 12

RAG_langchain

A simple example of implementing RAG with langchain

Language: Jupyter Notebook | 266 2 1

gpu-benches

A collection of benchmarks to measure basic GPU capabilities

Language: Jupyter Notebook | License: GPL-3.0 | 256 8 11

portBLAS

An implementation of BLAS using the SYCL open standard.

Language: C++ | License: Apache-2.0 | 254 24 47

CppProjectTemplate

C++ project template with unit-tests, documentation, ci-testing and workflows.

Language: CMake | License: MIT | 217 15 4

llvm-tutorial

Documentation, translations, and code repository for llvm-tutorial

Language: C++ | License: Apache-2.0 | 155 6 1

TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Language: C++ | License: MIT | 139 3 58

wmma_tensorcore_sample

Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)

Language: Cuda | License: MIT | 110 4 2

cuda-tensorcore-hgemm

Language: Cuda | 101 50

Hands-on-GEMM

Language: Cuda | License: GPL-3.0 | 92 2 3

wmma_extension

An extension library of WMMA API (Tensor Core API)

Language: Cuda | License: MIT | 81 8 4

CUDA-PPT

License: Apache-2.0 | 78 20

online-softmax

Benchmark code for the "Online normalizer calculation for softmax" paper

Language: Cuda | License: BSD-3-Clause | 55 60