Sakib Ahamed's repositories
playground-v2-1024px-aesthetic
Playground v2 is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at Playground.
AICoverGen
A WebUI to create song covers with any RVC v2 trained AI voice from YouTube videos or audio files.
voice-cloning-create-dataset
Create your own RVC v2 dataset from a YouTube video
voice-cloning-training
Voice data of 10 minutes or less can also be used to train a good voice conversion (VC) model!
cog-parakeet-rnnt-1.1b
nvidia/parakeet-rnnt-1.1b running in Replicate Cog container ⚙️
cog-uform-gen
Cog wrapper for unum-cloud/uform-gen (Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️)
TTDS-G35-CW3
TTDS Group Project: Video Games Search Engine. Sakib Ahamed, Dan Buxton, Kenza Amira, Wini Lau, Mansoor Ahmad
cog-aya-101
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
frame-interpolation
FILM: Frame Interpolation for Large Motion, in ECCV 2022.
PatchFusion
An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
conda-envs-in-cog
How to use Conda with Replicate Cog to easily manage packages in your projects. Step-by-step examples included!
GeneFacePlusPlus
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
TalkNet-ASD
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
trocr-base-handwritten
🖋️➡️📱Converts handwritten text images into digital text
voice-cloning
Voice-to-voice generation (change your voice)
AnimateDiff
Official implementation of AnimateDiff.
animatediff-cli-prompt-travel
AnimateDiff prompt travel
Kandinsky-2
Kandinsky 2 — multilingual text2image latent diffusion model
Moore-AnimateAnyone
Unofficial Re-Trained AnimateAnyone (Image + DWPose Video → Animated Video of Image)
Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
uform
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
YOLO-World
Real-Time Open-Vocabulary Object Detection