NVIDIA Corporation's repositories
Megatron-LM
Ongoing research training transformer models at scale
TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
cuda-python
CUDA Python: Performance meets Productivity
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
nv-ingest
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
gpu-operator
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
TensorRT-Model-Optimizer
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
NeMo-Agent-Toolkit
The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
KAI-Scheduler
KAI Scheduler is an open-source, Kubernetes-native scheduler for AI workloads at large scale
cuda-quantum
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
NeMo-Skills
A project to improve the skills of large language models
bionemo-framework
BioNeMo Framework: For building and adapting AI models in drug discovery at scale
JAX-Toolbox
JAX-Toolbox
mig-parted
MIG Partition Editor for NVIDIA GPUs
nim-deploy
A collection of YAML files, Helm Charts, Operator code, and guides to act as an example reference implementation for NVIDIA NIM deployment.
recsys-examples
Examples for recommender systems that are easy to train and deploy on accelerated infrastructure.
vgpu-device-manager
NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes
NV-Kernels
Ubuntu kernels which are optimized for NVIDIA server systems
doca-platform
DOCA Platform manages provisioning and service orchestration for BlueField DPUs
spark-rapids-jni
RAPIDS Accelerator JNI For Apache Spark
cloud-native-docs
Documentation repository for NVIDIA Cloud Native Technologies
doca-sosreport
A unified tool for collecting system logs and other debug information