Sin's starred repositories

rich

Rich is a Python library for rich text and beautiful formatting in the terminal.

Language: Python | License: MIT | Stargazers: 49150 | Issues: 537 | Issues: 1320

CS-Base

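Illustrated guides to computer networks, operating systems, computer organization, and databases: 1,000 diagrams and 500,000 words in total, demystifying obscure computer science fundamentals so that no rote interview question stays hard to understand! 🚀 Read online: https://xiaolincoding.com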

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Language: SystemVerilog | Stargazers: 6991 | Issues: 68 | Issues: 23

llm-applications

A comprehensive guide to building RAG-based LLM applications for production.

Language: Jupyter Notebook | License: CC-BY-4.0 | Stargazers: 1684 | Issues: 17 | Issues: 12

cpp

C++ Tip Of The Week

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda | License: MIT | Stargazers: 1529 | Issues: 24 | Issues: 26

CUDA-Learn-Notes

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Language: Cuda | License: GPL-3.0 | Stargazers: 1255 | Issues: 14 | Issues: 6

ut

C++20 μ(micro)/Unit Testing Framework

Language: C++ | License: BSL-1.0 | Stargazers: 1252 | Issues: 29 | Issues: 155

gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.

Language: C++ | License: NOASSERTION | Stargazers: 1101 | Issues: 46 | Issues: 169

random

Random for modern C++ with convenient API

Language: C++ | License: MIT | Stargazers: 905 | Issues: 33 | Issues: 16

Awesome-LLM-RAG-Application

Resources on applications built on LLMs with the RAG pattern

clang-tutor

A collection of out-of-tree Clang plugins for teaching and learning

Language: C++ | License: Unlicense | Stargazers: 695 | Issues: 20 | Issues: 17

Learn-LLVM-12

Learn LLVM 12, published by Packt

Language: C++ | License: MIT | Stargazers: 474 | Issues: 12 | Issues: 17

qwen-vllm

Demo of deploying Tongyi Qianwen (Qwen) inference with vLLM

CUDA_gemm

A simple high performance CUDA GEMM implementation.

nvbandwidth

A tool for bandwidth measurements on NVIDIA GPUs.

Language: C++ | License: Apache-2.0 | Stargazers: 300 | Issues: 13 | Issues: 16

MatmulTutorial

An easy-to-understand TensorOp Matmul tutorial

Language: C++ | License: Apache-2.0 | Stargazers: 280 | Issues: 8 | Issues: 11

cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Language: Cuda | License: MIT | Stargazers: 278 | Issues: 4 | Issues: 12

RAG_langchain

A simple example of implementing RAG with langchain

Language: Jupyter Notebook | Stargazers: 266 | Issues: 2 | Issues: 1

gpu-benches

Collection of benchmarks to measure basic GPU capabilities

Language: Jupyter Notebook | License: GPL-3.0 | Stargazers: 256 | Issues: 8 | Issues: 11

portBLAS

An implementation of BLAS using the SYCL open standard.

Language: C++ | License: Apache-2.0 | Stargazers: 254 | Issues: 24 | Issues: 47

CppProjectTemplate

C++ project template with unit-tests, documentation, ci-testing and workflows.

Language: CMake | License: MIT | Stargazers: 217 | Issues: 15 | Issues: 4

llvm-tutorial

Repository for the llvm-tutorial documentation, translation, and code

Language: C++ | License: Apache-2.0 | Stargazers: 155 | Issues: 6 | Issues: 1

TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Language: C++ | License: MIT | Stargazers: 139 | Issues: 3 | Issues: 58

wmma_tensorcore_sample

Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)

Language: Cuda | License: MIT | Stargazers: 110 | Issues: 4 | Issues: 2

Language: Cuda | License: GPL-3.0 | Stargazers: 92 | Issues: 2 | Issues: 3

wmma_extension

An extension library of WMMA API (Tensor Core API)

Language: Cuda | License: MIT | Stargazers: 81 | Issues: 8 | Issues: 4

License: Apache-2.0 | Stargazers: 78 | Issues: 2 | Issues: 0

online-softmax

Benchmark code for the "Online normalizer calculation for softmax" paper

Language: Cuda | License: BSD-3-Clause | Stargazers: 55 | Issues: 6 | Issues: 0