Syed Tousif Ahmed's repositories
benchmark-rngs
C++ benchmark for RNG headers
fomu-workshop
Support files for participating in a Fomu workshop
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
AITemplate
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
cpplinks
A categorized list of C++ resources.
darknet
Convolutional Neural Networks
deepfloat
An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
hydra
Hydra is a framework for elegantly configuring complex applications
kv260-vitis
Kria KV260 Vitis platforms and overlays
Light-HLS
Fast, Accurate and Convenient Light-Weight HLS Framework for Academic Exploration and Evaluation.
nccl
Optimized primitives for collective multi-GPU communication
neat-matrix-library
nml is a "simple" matrix/numerical analysis library written in pure C. The scope of the library is to highlight various algorithm implementations related to matrices. Code readability was a major concern.
scs
Splitting Conic Solver
symbiflow-arch-defs
FOSS architecture definitions of FPGA hardware useful for doing PnR device generation.
TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
torchrec
PyTorch domain library for recommendation systems
vivado-hacks
Miscellaneous hacks surrounding Xilinx's Vivado backend
vtr-verilog-to-routing
Verilog to Routing -- Open Source CAD Flow for FPGA Research