Amr Kayid's repositories
awesome-grad-school
🎓 Advice and resources for thriving and surviving graduate school
av-benchmark
Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs, LAION-CLAP, MS-CLAP, DeSync
awesome-digital-twins
Curated repository of awesome Digital Twin resources
cactus
Cross-platform framework for deploying LLM/VLM/TTS models locally on smartphones.
chatterbox
SoTA open-source TTS
checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
CLAP
Contrastive Language-Audio Pretraining
CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
distributed-training-guide
Best practices & guides on how to write distributed PyTorch training code
embedding-atlas
Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.
FastVideo
FastVideo is an open-source framework for accelerating large video diffusion models.
Genesis
A generative world for general-purpose robotics & embodied AI learning.
LLaDA
Official PyTorch implementation for "Large Language Diffusion Models"
MMAudio
[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
MS-CLAP
Learning audio concepts from natural language supervision
pusa-vidgen
Pusa: Thousands Timesteps Video Diffusion Model
Qwen-Image
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
robotic-artist
Neural Style Transfer Research
synchformer
synchformer as a package
TouchNet
A native-PyTorch library for large-scale M-LLM (text/audio) training with tp/cp/dp/pp.
VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Wan2.1
Wan: Open and Advanced Large-Scale Video Generative Models
WhisperLiveKit
Real-time & local speech-to-text, translation, and speaker diarization. With server & web UI.
WonderWorld
Code release for https://kovenyu.com/WonderWorld/