Saravana Periyasamy's starred repositories
papers-we-love
Papers from the computer science community to read and discuss.
skywalking
Application performance monitoring (APM) system
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
rancher-desktop
Container Management and Kubernetes on the Desktop
flashinfer
FlashInfer: Kernel Library for LLM Serving
nxs-universal-chart
A Helm chart for installing any of your applications into Kubernetes/OpenShift
k8s-dra-driver
Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes