xiaoyu1004's repositories
chisel-template
A self-built Chisel project template
FPGA-DDR-SDRAM
An AXI4-based DDR1 controller that provides cheap, high-capacity memory for low-end FPGA embedded systems.
FPGA-UART
3 modules: UART receiver, UART transmitter, and UART-to-AXI4 master (an interactive debugger).
gemm-optimize
optimize gemm
gpgpu-simx
a Cycle-Approximate Simulator
how-to-optimize-gemm
RowMajor sgemm optimization
how-to-optimize-gemm-in-cpu
A gemm compute library
how_to_optimize_convolution_in_CPU
how_to_optimize_convolution_in_CPU
How_to_optimize_in_GPU
A series of GPU optimization topics that shows in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
ics-pa
The wrapper repo for NJU ICS PA.
juliuscblas
a simple blas library
kompute
General-purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.
mtensor
A C++ CUDA tensor lazy computing library
NyuziProcessor
GPGPU microprocessor architecture
ROCm-ComputeABI-Doc
ROCm - AMDGPU Compute Application Binary Interface
RV32ISC
A RISC-V RV32I ISA Single Cycle CPU
rvcc
A C compiler
rvemu-singlecycle
A single-cycle RISC-V simulator
VeriGPU
Open-source GPU in Verilog, loosely based on the RISC-V ISA