mit10000's repositories
ai_and_memory_wall
AI and Memory Wall blog post
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
awesome-real-time-AI
A list of awesome edge-AI inference-related papers.
bert4torch
An elegant PyTorch implementation of transformers
BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
calm
C(UDA) accelerated language model inference
Computer-Science-Textbooks
A collection of CS textbooks for learning.
cuda_learning
Learning how CUDA works
DeepLearningSystem
An introduction to the core principles of deep learning systems.
gemmini
Berkeley's Spatial Array Generator
gpu-benches
A collection of benchmarks to measure basic GPU capabilities
llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
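As a back-of-the-envelope illustration of the kind of memory analysis such tools perform, here is the standard KV-cache size formula. The model dimensions below are hypothetical (Llama-2-7B-like figures, fp16 assumed), not values taken from llm-analysis itself:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # K and V each store one head_dim vector per layer, per KV head, per token,
    # hence the leading factor of 2; bytes_per_elem=2 corresponds to fp16/bf16.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 7B-class model: 32 layers, 32 KV heads, head_dim 128
gib = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=1) / 2**30
print(f"{gib:.1f} GiB")  # 2.0 GiB
```

At batch 1 and a 4096-token context this already costs 2 GiB, which is why KV-cache size, not weights, often dominates serving memory at long contexts.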
LLM-Viewer
Analyze the inference of large language models (LLMs) — computation, storage, transmission, and the hardware roofline model — in a user-friendly interface.
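The roofline model mentioned above can be sketched in a few lines: attainable throughput is the minimum of peak compute and memory bandwidth times operational intensity. The hardware numbers below are illustrative A100-like figures, not values produced by LLM-Viewer:

```python
def roofline_attainable_flops(peak_flops, mem_bandwidth, intensity):
    """Roofline model: performance is capped either by peak compute
    or by memory bandwidth * operational intensity (FLOP/byte)."""
    return min(peak_flops, mem_bandwidth * intensity)

peak = 312e12   # hypothetical peak: 312 TFLOP/s
bw = 1.55e12    # hypothetical HBM bandwidth: 1.55 TB/s
ridge = peak / bw  # intensity above which kernels become compute-bound

# Memory-bound case, e.g. a decode-phase GEMV at ~1 FLOP/byte
print(roofline_attainable_flops(peak, bw, 1.0))    # limited to 1.55e12 FLOP/s
# Compute-bound case: intensity well above the ridge point
print(roofline_attainable_flops(peak, bw, 300.0))  # limited to 312e12 FLOP/s
```

This is why LLM decode is typically bandwidth-bound while prefill can approach peak compute: the two phases sit on opposite sides of the ridge point.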
llm_profiler
A theoretical LLM performance analysis tool supporting parameter-count, FLOPs, memory, and latency analysis.
llmperf
LLMPerf is a library for validating and benchmarking LLMs
mixbench
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
model_analyzer
Triton Model Analyzer is a CLI tool that helps users better understand the compute and memory requirements of Triton Inference Server models.
nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
PatchTST
The official implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers." (ICLR 2023) https://arxiv.org/abs/2211.14730
pdfs
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
pytorch-benchmark
Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory, and energy consumption
scale-sim-v2
Repository to host and maintain scale-sim-v2 code
sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
wanda
A simple and effective LLM pruning approach.
zigzag
HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators