wong_hs's repositories
riscv-iommu
IOMMU IP compliant with the RISC-V IOMMU Specification v1.0
mgpusim
A highly-flexible GPU simulator for AMD GPUs.
mlc-llm
Universal LLM Deployment Engine with ML Compilation
vulkan-sim
Vulkan-Sim is a GPU architecture simulator for Vulkan ray tracing based on GPGPU-Sim and Mesa.
ramulator2
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM standards, emerging RowHammer mitigation techniques). Described in our paper https://people.inf.ethz.ch/omutlu/pub/Ramulator2_arxiv23.pdf
tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
ROCm
AMD ROCm™ Software - GitHub Home
brpc
brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertising, and recommendation. "brpc" means "better RPC".
iob-cache
Verilog Configurable Cache
FasterTransformer
Transformer related optimization, including BERT, GPT
esp
Embedded Scalable Platforms: Heterogeneous SoC architecture and IP integration made easy
MegCC
MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port
triton
Development repository for the Triton language and compiler
iDMA
A modular, parametrizable, and highly flexible Data Movement Accelerator (DMA)
opentitan
OpenTitan: Open source silicon root of trust
Coyote
Framework providing operating system abstractions and a range of shared networking (RDMA, TCP/IP) and memory services to common modern heterogeneous platforms.
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
gloo
Collective communications library with various primitives for multi-machine training.
start-ai-compiler
Start AI Compiler
lbt
Develop a toolchain based on LLVM for the Cpu0 processor
openmlsys-zh
"Machine Learning Systems: Design and Implementation" - Chinese Version
ML-Accelerators
Topics in Machine Learning Accelerator Design
DeepLearningSystem
An introduction to the core principles of deep learning systems.
NOCulator
NOCulator is a network-on-chip simulator providing cycle-accurate performance models for a wide variety of networks (mesh, torus, ring, hierarchical ring, flattened butterfly) and routers (buffered, bufferless, Adaptive Flow Control, minBD, HiRD).
Ripes
A graphical processor simulator and assembly editor for the RISC-V ISA
CUDA-Programming-Guide-in-Chinese
This is a Chinese translation of the CUDA programming guide
book
Backup of some books
AI-Chip
A list of ICs and IPs for AI, Machine Learning and Deep Learning.