There are 30 repositories under the audio-generation topic.
:robot: The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. A drop-in replacement for OpenAI that runs on consumer-grade hardware; no GPU required. Runs gguf, transformers, diffusers, and many more model architectures. It can generate text, audio, video, and images, and includes voice cloning capabilities.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
A timeline of the latest AI models for audio generation, starting in 2023!
Audio generation using diffusion models, in PyTorch.
TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
A family of diffusion models for text-to-audio generation.
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
AI Audio Datasets 🎵. A list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis, music generation, etc.
Python library for designing and training your own Diffusion Models with PyTorch.
This is a list of sound, audio and music development tools which contains machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis and more.
Official PyTorch implementation of the paper "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)
Reading list for research topics in Sound AI
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
Trainer for audio-diffusion-pytorch
A collection of useful audio datasets and transforms for PyTorch.
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.
Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
Official implementation of the pipeline presented in I hear your true colors: Image Guided Audio Generation
Site for sharing Bark voices
Text prompt steered synthetic audio generators
Site for sharing MusicGen + AudioGen Prompts and Creations
Tracking the state of the art and recent results (bibliography) on sound tasks.
Unofficial implementation of JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation (https://arxiv.org/abs/2310.19180)
Code that fetches news via an API and converts it into a news video
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Implementation of the AudioLM model by Google, in PyTorch
Genius-SaaS: An AI-powered SaaS application built with Next.js and React for personalized recommendations, dynamic content generation, and user behavior prediction. 🚀
[AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models