C00reNUT's repositories
ai-audio-datasets
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
AI-Song-Cover-RVC
All-in-one version: YouTube WAV download, vocal separation, audio splitting, training, and inference using Google Colab
AICoverGen
A WebUI to create song covers with any RVC v2 trained AI voice from YouTube videos or audio files.
aria-amt
Efficient and robust implementation of seq-to-seq automatic piano transcription.
auto_dataset_tts
A simple script to prepare a dataset for training the Tortoise TTS model via https://git.ecker.tech/mrq/ai-voice-cloning
clapper
Clapper.app, the video editor designed for the age of AI cinema
ComfyUI-SaveAsScript
A powerful tool that translates ComfyUI workflows into executable Python code - now as a UI button.
courses
Anthropic's educational courses
ctc-forced-aligner
Text to speech alignment using CTC forced alignment
finetune-musicgen
A notebook containing scripts, documentation, and examples for fine-tuning MusicGen
InfiniteMusicGen
Seamless infinite music generation using the MusicGen model
Mistral-7B-south-park-fanatic
Training and data-generation scripts needed to train a South Park fanatic AI
narrator
David Attenborough narrates your life
Pandrator
Pandrator aspires to be a user-friendly app with a graphical interface and a one-click installer that creates high-quality speech from text in multiple languages (audiobooks, speech synchronised with subtitles and more) using local models (XTTS, Silero or VoiceCraft), plus voice cloning, LLM pre-processing, RVC enhancement, and automatic evaluation
resemble-enhance
AI powered speech denoising and enhancement
speech-dataset-generator
Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing.
SpeechMOS
Easy-to-Use Speech MOS predictors
stable-audio-controlnet
Fine-tune Stable Audio Open with DiT ControlNet.
StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Tile-Upscaler
Image upscaler with Tile ControlNet, fully integrated in Hugging Face Diffusers
Train_Hifigan_XTTS
An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS.
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
xtts-finetune-tests
In this repository I will be running various experiments on fine-tuning different parts of XTTS.
youtube-transcript-api
This is a Python API that retrieves the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and requires neither an API key nor a headless browser, unlike other Selenium-based solutions!
ziplora-pytorch
Implementation of "ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs"