pengwubj's repositories

catapult

Catapult

Language:HTMLLicense:BSD-3-ClauseStargazers:1Issues:2Issues:0

.tmux

🇫🇷 Oh My Tmux! My pretty + versatile tmux configuration that just works (imho the best tmux configuration)

License:MITStargazers:0Issues:0Issues:0

CuAssembler

An unofficial cuda assembler, for all generations of SASS, hopefully :)

License:MITStargazers:0Issues:0Issues:0

CUDA-Learn-Notes

🎉 CUDA notes / hand-written CUDA for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

License:GPL-3.0Stargazers:0Issues:0Issues:0

CUDAsmith

A CUDA compiler fuzzer

Stargazers:0Issues:0Issues:0

deepfloat

An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.

Language:SystemVerilogLicense:NOASSERTIONStargazers:0Issues:2Issues:0

DeepLearningSystem

Deep Learning System core principles introduction.

License:Apache-2.0Stargazers:0Issues:0Issues:0

e200_opensource

The Ultra-Low Power RISC Core

Language:VerilogLicense:Apache-2.0Stargazers:0Issues:2Issues:0

flash-attention

Fast and memory-efficient exact attention

License:BSD-3-ClauseStargazers:0Issues:0Issues:0

Fractional-GPUs

Splits a single NVIDIA GPU into multiple partitions with complete compute and memory isolation (with respect to performance) between the partitions

Stargazers:0Issues:0Issues:0

gpu-benches

collection of benchmarks to measure basic GPU capabilities

License:GPL-3.0Stargazers:0Issues:0Issues:0

how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Stargazers:0Issues:0Issues:0

leetcode

LeetCode Problems' Solutions

Language:C++Stargazers:0Issues:0Issues:0

models

Pre-trained and Reproduced Deep Learning Models (classic reproduced models)

Language:PythonLicense:Apache-2.0Stargazers:0Issues:0Issues:0

NBAssembler

Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.

Language:PythonLicense:MITStargazers:0Issues:0Issues:0

netron

Visualizer for deep learning and machine learning models

Language:JavaScriptLicense:MITStargazers:0Issues:0Issues:0

nsight-training

Training material for Nsight developer tools

License:NOASSERTIONStargazers:0Issues:0Issues:0

one-key-hidpi

Enable macOS HiDPI and have a native setting.

Language:ShellStargazers:0Issues:0Issues:0

open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

License:NOASSERTIONStargazers:0Issues:0Issues:0

Project-Zipline

Defines a lossless compressed data format that is independent of CPU type, operating system, file system, and character set, and is suitable for compression using the XP10 algorithm.

Language:VerilogLicense:NOASSERTIONStargazers:0Issues:1Issues:0

riscv-profiles

RISC-V Architecture Profiles

License:CC-BY-4.0Stargazers:0Issues:0Issues:0

riscv-soc-book

Everything you need to know about RISC-V

Stargazers:0Issues:0Issues:0

spf13-vim

The ultimate vim distribution

Language:Vim scriptLicense:Apache-2.0Stargazers:0Issues:0Issues:0

swerv_eh1

A directory of Western Digital’s RISC-V SweRV Cores

Language:SystemVerilogLicense:Apache-2.0Stargazers:0Issues:2Issues:0

tensor-cores-numerical-behavior

Test suite for probing the numerical behavior of NVIDIA tensor cores

License:GPL-2.0Stargazers:0Issues:0Issues:0

transformers-benchmarks

real Transformer TeraFLOPS on various GPUs

License:Apache-2.0Stargazers:0Issues:0Issues:0