Chaitanya Sri Krishna Lolla (lcskrishna)

Company: AMD

Location: Santa Clara, California

Home Page: https://lcskrishna.github.io/

Twitter: @chaitan15444482

Chaitanya Sri Krishna Lolla's starred repositories

segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 45797 | Issues: 303 | Issues: 658

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python | License: MIT | Stargazers: 35222 | Issues: 357 | Issues: 306

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs.

Language: Python | License: Apache-2.0 | Stargazers: 23848 | Issues: 221 | Issues: 3664

awesome-robotic-tooling

Tooling for professional robotic development in C++ and Python with a touch of ROS, autonomous driving and aerospace.

pytorchviz

A small package to create visualizations of PyTorch execution graphs

Language: Jupyter Notebook | License: MIT | Stargazers: 3117 | Issues: 31 | Issues: 63

fairscale

PyTorch extensions for high performance and large scale training.

Language: Python | License: NOASSERTION | Stargazers: 3087 | Issues: 45 | Issues: 358

ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

flops-counter.pytorch

FLOPs counter for convolutional networks in the PyTorch framework

Language: Python | License: MIT | Stargazers: 2750 | Issues: 16 | Issues: 94

neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python | License: Apache-2.0 | Stargazers: 2102 | Issues: 36 | Issues: 192

Language: Python | License: NOASSERTION | Stargazers: 2022 | Issues: 84 | Issues: 18

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python | License: Apache-2.0 | Stargazers: 1687 | Issues: 37 | Issues: 275

intel-extension-for-pytorch

A Python package that extends official PyTorch to make it easier to obtain good performance on Intel platforms

Language: Python | License: Apache-2.0 | Stargazers: 1492 | Issues: 35 | Issues: 503

open-gpu-doc

Documentation of NVIDIA chip/hardware interfaces

Language: C | License: MIT | Stargazers: 1226 | Issues: 98 | Issues: 0

brevitas

Brevitas: neural network quantization in PyTorch

Language: Python | License: NOASSERTION | Stargazers: 1135 | Issues: 35 | Issues: 425

DeepBench

Benchmarking Deep Learning operations on different hardware

Language: C++ | License: Apache-2.0 | Stargazers: 1062 | Issues: 110 | Issues: 71

DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.

Language: Python | License: CC-BY-4.0 | Stargazers: 1034 | Issues: 49 | Issues: 145

pytorch_memlab

Profiling and inspecting memory in PyTorch

Language: Python | License: MIT | Stargazers: 1002 | Issues: 13 | Issues: 34

gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

Language: C++ | License: MIT | Stargazers: 824 | Issues: 56 | Issues: 180

kineto

A CPU+GPU profiling library that provides access to timeline traces and hardware performance counters.

Language: HTML | License: NOASSERTION | Stargazers: 664 | Issues: 28 | Issues: 208

oneMKL

oneAPI Math Kernel Library (oneMKL) Interfaces

Language: C++ | License: Apache-2.0 | Stargazers: 596 | Issues: 48 | Issues: 170

multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Language: Cuda | License: BSD-3-Clause | Stargazers: 492 | Issues: 28 | Issues: 10

PyProf

A GPU performance profiling tool for PyTorch models

Language: Python | License: Apache-2.0 | Stargazers: 490 | Issues: 20 | Issues: 0

cuda-quantum

C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows

Language: C++ | License: NOASSERTION | Stargazers: 441 | Issues: 21 | Issues: 608

ort

Accelerate PyTorch models with ONNX Runtime

Language: Python | License: MIT | Stargazers: 350 | Issues: 23 | Issues: 37

contiguous_pytorch_params

Accelerate training by storing parameters in one contiguous chunk of memory.

NVTX

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

Language: C | License: Apache-2.0 | Stargazers: 262 | Issues: 12 | Issues: 33

Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Language: C++ | License: NOASSERTION | Stargazers: 239 | Issues: 19 | Issues: 582

Language: Python | License: Apache-2.0 | Stargazers: 27 | Issues: 8 | Issues: 8

pytorch-docker-armv7

PyTorch for Raspberry Pi

Language: Dockerfile | License: GPL-3.0 | Stargazers: 4 | Issues: 3 | Issues: 0