mit10000's repositories
ai_and_memory_wall
AI and Memory Wall blog post
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
awesome-real-time-AI
A list of awesome edge-AI inference-related papers.
bert4torch
An elegant PyTorch implementation of transformers
BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
calm
C(UDA) accelerated language model inference
Computer-Science-Textbooks
A collection of CS textbooks for learning.
cuda_learning
Learning how CUDA works
DeepLearningSystem
An introduction to the core principles of deep learning systems.
gemmini
Berkeley's Spatial Array Generator
gpu-benches
A collection of benchmarks to measure basic GPU capabilities
llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
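As a back-of-the-envelope illustration of the kind of memory analysis such tools perform, here is the standard KV-cache size formula. The model dimensions below are hypothetical (Llama-2-7B-like figures, fp16 assumed), not values taken from llm-analysis itself:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # K and V each store one head_dim vector per layer, per KV head, per token,
    # hence the leading factor of 2; bytes_per_elem=2 corresponds to fp16/bf16.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 7B-class model: 32 layers, 32 KV heads, head_dim 128
gib = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=1) / 2**30
print(f"{gib:.1f} GiB")  # 2.0 GiB
```

At batch 1 and a 4096-token context this already costs 2 GiB, which is why KV-cache size, not weights, often dominates serving memory at long contexts.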
LLM-Viewer
Analyze the inference of large language models (LLMs) — computation, storage, transmission, and the hardware roofline model — in a user-friendly interface.
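The roofline model mentioned above can be sketched in a few lines: attainable throughput is the minimum of peak compute and memory bandwidth times operational intensity. The hardware numbers below are illustrative A100-like figures, not values produced by LLM-Viewer:

```python
def roofline_attainable_flops(peak_flops, mem_bandwidth, intensity):
    """Roofline model: performance is capped either by peak compute
    or by memory bandwidth * operational intensity (FLOP/byte)."""
    return min(peak_flops, mem_bandwidth * intensity)

peak = 312e12   # hypothetical peak: 312 TFLOP/s
bw = 1.55e12    # hypothetical HBM bandwidth: 1.55 TB/s
ridge = peak / bw  # intensity above which kernels become compute-bound

# Memory-bound case, e.g. a decode-phase GEMV at ~1 FLOP/byte
print(roofline_attainable_flops(peak, bw, 1.0))    # limited to 1.55e12 FLOP/s
# Compute-bound case: intensity well above the ridge point
print(roofline_attainable_flops(peak, bw, 300.0))  # limited to 312e12 FLOP/s
```

This is why LLM decode is typically bandwidth-bound while prefill can approach peak compute: the two phases sit on opposite sides of the ridge point.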
llm_profiler
A theoretical LLM performance analysis tool supporting parameter-count, FLOPs, memory, and latency analysis.
llmperf
LLMPerf is a library for validating and benchmarking LLMs
mixbench
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
model_analyzer
Triton Model Analyzer is a CLI tool that helps users better understand the compute and memory requirements of Triton Inference Server models.
nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
PatchTST
The official implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers." (ICLR 2023) https://arxiv.org/abs/2211.14730
pdfs
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
pytorch-benchmark
Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory, and energy consumption
scale-sim-v2
Repository to host and maintain scale-sim-v2 code
sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
wanda
A simple and effective LLM pruning approach.
zigzag
HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators