Beast code in Giters

XXZH's starred repositories

llama.cpp

LLM inference in C/C++

Language: C++ · License: MIT · Stars: 63688 · Watchers: 530 · Issues: 3593

llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

Language: LLVM · License: NOASSERTION · Stars: 27521 · Watchers: 587 · Issues: 74749

triton

Development repository for the Triton language and compiler

Language: C++ · License: MIT · Stars: 12264 · Watchers: 183 · Issues: 1362

FlexGen

Running large language models on a single GPU for throughput-oriented scenarios.

Language: Python · License: Apache-2.0 · Stars: 9107 · Watchers: 110 · Issues: 81

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · Stars: 4245 · Watchers: 33 · Issues: 438

chisel

Chisel: A Modern Hardware Design Language

Language: Scala · License: Apache-2.0 · Stars: 3876 · Watchers: 150 · Issues: 1040

CUDALibrarySamples

CUDA Library Samples

Language: Cuda · License: NOASSERTION · Stars: 1475 · Watchers: 31 · Issues: 182

stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Language: Python · License: MIT · Stars: 1113 · Watchers: 18 · Issues: 117

firrtl

Flexible Intermediate Representation for RTL

Language: Scala · License: Apache-2.0 · Stars: 716 · Watchers: 61 · Issues: 660

Pyverilog

Python-based Hardware Design Processing Toolkit for Verilog HDL

Language: Python · License: Apache-2.0 · Stars: 607 · Watchers: 41 · Issues: 92

buddy-mlir

An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).

Language: C++ · License: Apache-2.0 · Stars: 462 · Watchers: 13 · Issues: 51

nvbench

CUDA Kernel Benchmarking Library

Language: Cuda · License: Apache-2.0 · Stars: 461 · Watchers: 18 · Issues: 91

timeloop

Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures.

Language: C++ · License: BSD-3-Clause · Stars: 313 · Watchers: 21 · Issues: 177

nimble

Lightweight and Parallel Deep Learning Framework

Language: C++ · License: NOASSERTION · Stars: 258 · Watchers: 9 · Issues: 23

soDLA

Chisel implementation of the NVIDIA Deep Learning Accelerator (NVDLA), with self-driving accelerated

Language: Verilog · License: NOASSERTION · Stars: 218 · Watchers: 21 · Issues: 19

dsptools

A library of Chisel3 tools for digital signal processing

Language: Scala · License: Apache-2.0 · Stars: 217 · Watchers: 37 · Issues: 75

inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

Language: C++ · License: MIT · Stars: 189 · Watchers: 8 · Issues: 24

dora-from-scratch

From-scratch implementations of LoRA and DoRA

Language: Jupyter Notebook · License: MIT · Stars: 175 · Watchers: 3 · Issues: 1

treadle

Chisel/Firrtl execution engine

Language: Scala · License: Apache-2.0 · Stars: 151 · Watchers: 29 · Issues: 59

minimalloc

A lightweight memory allocator for hardware-accelerated machine learning

Language: C++ · License: Apache-2.0 · Stars: 108 · Watchers: 6 · Issues: 5

bit

Code repository for the paper "BiT: Robustly Binarized Multi-distilled Transformer"

Language: Python · License: NOASSERTION · Stars: 97 · Watchers: 16 · Issues: 7

zigzag

HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators

Language: C++ · License: MIT · Stars: 96 · Watchers: 7 · Issues: 34

cutlass_fpA_intB_gemm

A standalone GEMM kernel for fp16 activations and quantized weights, extracted from FasterTransformer

Language: C++ · License: Apache-2.0 · Stars: 81 · Watchers: 20 · Issues: 6

accelergy-timeloop-infrastructure

A Linux Docker environment for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop

Language: Dockerfile · License: MIT · Stars: 38 · Watchers: 6 · Issues: 7

EventQueue

EQueue Dialect

Language: MLIR · Stars: 37 · Watchers: 7 · Issues: 4

Open-Source-IPs

Language: C++ · License: NOASSERTION · Stars: 32 · Watchers: 0 · Issues: 0

ksim

Language: C++ · License: Apache-2.0 · Stars: 26 · Watchers: 1 · Issues: 1

chisel-formal

Language: Scala · License: NOASSERTION · Stars: 23 · Watchers: 7 · Issues: 10

open-source-formal-verification-for-chisel

Language: Jupyter Notebook · Stars: 12 · Watchers: 3 · Issues: 0

opencv-samples-perf-analysis

Language: Shell · Stars: 2 · Watchers: 7 · Issues: 0