AmrMKayid

Amr Kayid's repositories

awesome-grad-school

🎓 Advice and resources for thriving and surviving graduate school

License: MIT

av-benchmark

Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs, LAION-CLAP, MS-CLAP, DeSync

License: MIT

awesome-digital-twins

Curated repository of awesome Digital Twin resources

License: BSD-3-Clause

cactus

Cross-platform framework for deploying LLM/VLM/TTS models locally on smartphones.

License: Apache-2.0

chatterbox

SoTA open-source TTS

Language: Python · License: MIT

checkpoint-engine

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

License: MIT

CLAP

Contrastive Language-Audio Pretraining

Language: Python · License: CC0-1.0

CosyVoice

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

License: Apache-2.0

distributed-training-guide

Best practices and guides on how to write distributed PyTorch training code

Language: Python · License: MIT

embedding-atlas

Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.

License: MIT

FastVideo

FastVideo is an open-source framework for accelerating large video diffusion models.

License: Apache-2.0

Genesis

A generative world for general-purpose robotics & embodied AI learning.

License: Apache-2.0

LLaDA

Official PyTorch implementation for "Large Language Diffusion Models"

License: MIT

MMAudio

[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

License: MIT

MS-CLAP

Learning audio concepts from natural language supervision

Language: Python · License: MIT

pusa-vidgen

Pusa: Thousands Timesteps Video Diffusion Model

License: Apache-2.0

Qwen-Image

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

License: Apache-2.0

robotic-artist

Neural Style Transfer Research

Language: Jupyter Notebook · License: MIT

synchformer

synchformer as a package

Language: Python · License: MIT

TouchNet

A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp/pp.

License: Apache-2.0

VAR

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python · License: MIT

awesome-grad-school

audio_flamingo

av-benchmark

awesome-digital-twins

cactus

chatterbox

checkpoint-engine

CLAP

CosyVoice

distributed-training-guide

dotdot

embedding-atlas

FastVideo

Genesis

LLaDA

marvis-tts

MMAudio

MS-CLAP

pusa-vidgen

pyocr

Qwen-Image

robotic-artist

sota-music-tagging-models

synchformer

TouchNet

VAR

vllm

Wan2.1

WhisperLiveKit

WonderWorld