XXZH's starred repositories

llama.cpp

LLM inference in C/C++

llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

Language: LLVM | License: NOASSERTION | Stargazers: 27521 | Issues: 587 | Issues: 74749

triton

Development repository for the Triton language and compiler

FlexGen

Running large language models on a single GPU for throughput-oriented scenarios.

Language: Python | License: Apache-2.0 | Stargazers: 9107 | Issues: 110 | Issues: 81

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python | License: MIT | Stargazers: 4245 | Issues: 33 | Issues: 438

chisel

Chisel: A Modern Hardware Design Language

Language: Scala | License: Apache-2.0 | Stargazers: 3876 | Issues: 150 | Issues: 1040

CUDALibrarySamples

CUDA Library Samples

Language: Cuda | License: NOASSERTION | Stargazers: 1475 | Issues: 31 | Issues: 182

stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Language: Python | License: MIT | Stargazers: 1113 | Issues: 18 | Issues: 117

firrtl

Flexible Intermediate Representation for RTL

Language: Scala | License: Apache-2.0 | Stargazers: 716 | Issues: 61 | Issues: 660

Pyverilog

Python-based Hardware Design Processing Toolkit for Verilog HDL

Language: Python | License: Apache-2.0 | Stargazers: 607 | Issues: 41 | Issues: 92

buddy-mlir

An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).

Language: C++ | License: Apache-2.0 | Stargazers: 462 | Issues: 13 | Issues: 51

nvbench

CUDA Kernel Benchmarking Library

Language: Cuda | License: Apache-2.0 | Stargazers: 461 | Issues: 18 | Issues: 91

timeloop

Timeloop performs modeling, mapping, and code generation for tensor algebra workloads on various accelerator architectures.

Language: C++ | License: BSD-3-Clause | Stargazers: 313 | Issues: 21 | Issues: 177

nimble

Lightweight and Parallel Deep Learning Framework

Language: C++ | License: NOASSERTION | Stargazers: 258 | Issues: 9 | Issues: 23

soDLA

Chisel implementation of the NVIDIA Deep Learning Accelerator (NVDLA), with self-driving accelerated

Language: Verilog | License: NOASSERTION | Stargazers: 218 | Issues: 21 | Issues: 19

dsptools

A Library of Chisel3 Tools for Digital Signal Processing

Language: Scala | License: Apache-2.0 | Stargazers: 217 | Issues: 37 | Issues: 75

inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

Language: C++ | License: MIT | Stargazers: 189 | Issues: 8 | Issues: 24

dora-from-scratch

LoRA and DoRA from Scratch Implementations

Language: Jupyter Notebook | License: MIT | Stargazers: 175 | Issues: 3 | Issues: 1

treadle

Chisel/Firrtl execution engine

Language: Scala | License: Apache-2.0 | Stargazers: 151 | Issues: 29 | Issues: 59

minimalloc

A lightweight memory allocator for hardware-accelerated machine learning

Language: C++ | License: Apache-2.0 | Stargazers: 108 | Issues: 6 | Issues: 5

bit

Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer"

Language: Python | License: NOASSERTION | Stargazers: 97 | Issues: 16 | Issues: 7

zigzag

HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators

Language: C++ | License: MIT | Stargazers: 96 | Issues: 7 | Issues: 34

cutlass_fpA_intB_gemm

A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer

Language: C++ | License: Apache-2.0 | Stargazers: 81 | Issues: 20 | Issues: 6

accelergy-timeloop-infrastructure

Linux docker for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop

Language: Dockerfile | License: MIT | Stargazers: 38 | Issues: 6 | Issues: 7

EventQueue

EQueue Dialect

Language: C++ | License: NOASSERTION | Stargazers: 32 | Issues: 0 | Issues: 0
Language: C++ | License: Apache-2.0 | Stargazers: 26 | Issues: 1 | Issues: 1
Language: Scala | License: NOASSERTION | Stargazers: 23 | Issues: 7 | Issues: 10