lcskrishna

Chaitanya Sri Krishna Lolla's starred repositories

segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | 46509 307 659

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python | License: MIT | 36060 367 312

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | 25927 222 4267

awesome-robotic-tooling

Tooling for professional robotic development in C++ and Python with a touch of ROS, autonomous driving and aerospace.

License: CC0-1.0 | 3185 113 4

pytorchviz

A small package to create visualizations of PyTorch execution graphs

Language: Jupyter Notebook | License: MIT | 3159 31 63

fairscale

PyTorch extensions for high performance and large scale training.

Language: Python | License: NOASSERTION | 3133 45 359

ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

Language: C++ | License: MIT | 2794 232 1087

flops-counter.pytorch

Flops counter for convolutional networks in pytorch framework

Language: Python | License: MIT | 2766 15 96

neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python | License: Apache-2.0 | 2134 35 197

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python | License: Apache-2.0 | 1780 34 305

intel-extension-for-pytorch

A Python package for extending the official PyTorch that can easily obtain performance on Intel platform

Language: Python | License: Apache-2.0 | 1540 36 517

open-gpu-doc

Documentation of NVIDIA chip/hardware interfaces

Language: C | License: MIT | 1236 970

brevitas

Brevitas: neural network quantization in PyTorch

Language: Python | License: NOASSERTION | 1141 34 430

DeepBench

Benchmarking Deep Learning operations on different hardware

Language: C++ | License: Apache-2.0 | 1065 110 71

DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.

Language: Python | License: CC-BY-4.0 | 1054 49 146

resource-stream

CUDA related news and material links

License: MIT | 1052 37 2

pytorch_memlab

Profiling and inspecting memory in pytorch

Language: Python | License: MIT | 1011 13 35

gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

Language: C++ | License: MIT | 845 55 183

kineto

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.

Language: HTML | License: NOASSERTION | 684 29 211

oneMKL

oneAPI Math Kernel Library (oneMKL) Interfaces

Language: C++ | License: Apache-2.0 | 605 47 177

multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Language: Cuda | License: BSD-3-Clause | 521 27 10

PyProf

A GPU performance profiling tool for PyTorch models

Language: Python | License: Apache-2.0 | 490 200

cuda-quantum

C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows

Language: C++ | License: NOASSERTION | 468 22 652

ort

Accelerate PyTorch models with ONNX Runtime

Language: Python | License: MIT | 353 24 37

contiguous_pytorch_params

Accelerate training by storing parameters in one contiguous chunk of memory.

Language: Python | 291 6 7

NVTX

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

Language: C | License: Apache-2.0 | 275 11 35

Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Language: C++ | License: NOASSERTION | 245 17 610

varuna

Language: Python | 230 8 11

pytorch-docker-armv7

pytorch for RaspberryPi

Language: Dockerfile | License: GPL-3.0 | 4 30