mananshah99

@kumo-ai

San Francisco, CA

Manan Shah's starred repositories

imessage-exporter

Export iMessage data + run iMessage Diagnostics

Language: Rust | GPL-3.0 | 269000

card-web

The web app behind thecompendium.cards

Language: TypeScript | Apache-2.0 | 4600

py-spy

Sampling profiler for Python programs

Language: Rust | MIT | 1223100

diagrams

Diagram as Code for prototyping cloud system architectures

Language: Python | MIT | 3572500

libbacktrace

A C library that may be linked into a C/C++ program to produce symbolic backtraces

Language: C | NOASSERTION | 92300

anynp

Proof-of-concept of global switching between numpy/jax/pytorch in a library.

Language: Python | 1600

unitycatalog

Open, Multi-modal Catalog for Data & AI

Language: Java | Apache-2.0 | 200100

kubectx

Faster way to switch between clusters and namespaces in kubectl

Language: Go | Apache-2.0 | 1731000

kubernetes

Production-Grade Container Scheduling and Management

Language: Go | Apache-2.0 | 10872000

color

Language: Go | 900

engineering-blogs

A curated list of engineering blogs

Language: Ruby | 3038800

How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.

Language: Cuda | Apache-2.0 | 76600

llama_duo

asynchronous/distributed speculative evaluation for llama3

Language: C++ | MIT | 3300

go

The Go programming language

Language: Go | BSD-3-Clause | 12152200

ucall

Web Serving and Remote Procedure Calls at 50x lower latency and 70x higher bandwidth than FastAPI, implementing JSON-RPC & REST over io_uring ☎️

Language: C | Apache-2.0 | 109900

liburing

Library providing helpers for the Linux kernel io_uring support

Language: C | MIT | 271500

pt-three-ways

Path tracing, done three ways

Language: C++ | MIT | 19100

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Language: SystemVerilog | 673500

attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Language: Python | MIT | 42500

gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.

Language: C++ | Apache-2.0 | 581600

whispercpp

Pybind11 bindings for Whisper.cpp

Language: C++ | Apache-2.0 | 3100

bark.cpp

Suno AI's Bark model in C/C++ for fast text-to-speech

Language: C++ | MIT | 64200

shouldersOfGiants.rs

I have no idea what I'm doing, but llm.c in Rust

Language: Python | Apache-2.0 | 1000

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | MIT | 2224800

ShallowSpeed

Small scale distributed training of sequential deep learning models, built on Numpy and MPI.

Language: Python | 7100

baidu-allreduce

Language: Cuda | Apache-2.0 | 55700

PytorchBridge

Designing bridge trusses with Pytorch autograd

Language: Jupyter Notebook | 6100

simdjson

Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

Language: C++ | Apache-2.0 | 1886300

awesome-distributed-system-projects

🚀 List of distributed system projects for inspiration and learning to build distributed services from real world examples