amanteur

Kits.AI

Bishkek, Kyrgyzstan

Amantur Amatov's starred repositories

muchomusic

MuChoMusic is a benchmark for evaluating music understanding in multimodal audio-language models.

Language: Jupyter Notebook · License: MIT · 1600

Speech-Editing-Toolkit

It's a repository for implementations of neural speech editing algorithms.

Language: Python · 18300

FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

Language: Python · License: NOASSERTION · 132400

SEMamba

This is the official implementation of the SEMamba paper. (Accepted to IEEE SLT 2024)

Language: Python · 11600

matchering

🎚️ Open Source Audio Matching and Mastering

Language: Python · License: GPL-3.0 · 130200

audio-flamingo

PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.

Language: Python · License: MIT · 16700

ruff

An extremely fast Python linter and code formatter, written in Rust.

Language: Rust · License: MIT · 3085500

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · 1148200

VISinger2

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

Language: Python · 30900

seed-vc

Zero-shot voice conversion with in-context learning

Language: Python · License: MIT · 8000

Fast-GeCo

Source code and demo for INTERSPEECH 2024 paper: Noise-robust Speech Separation with Fast Generative Correction

Language: Python · 2500

mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Language: Python · License: MIT · 205800

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Language: Python · License: MIT · 470000

MeloTTS

High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.

Language: Python · License: MIT · 437300

HiFTNet-sr

HiFTNet wav/audio super-resolution 16/24 kHz to 48 kHz

Language: Python · License: MIT · 2100

Stable-Hybrid-Auditory-Filterbanks

Official Implementation of Interspeech 2024 Paper "Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement"

Language: Python · License: BSD-3-Clause-Clear · 2000

YOLOPitch

Language: Python · 500

stable-audio-controlnet

Fine-tune Stable Audio Open with DiT ControlNet.

Language: Python · License: NOASSERTION · 14900

voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Language: Python · License: MIT · 57800

BSSE-SE

Boosting Self-Supervised Embeddings for Speech Enhancement

Language: Python · License: MIT · 4200

AFX-Research

Scientific literature about Audio Effects

Language: HTML · 11100

HierSpeechpp

The official implementation of HierSpeech++

Language: Python · License: MIT · 116400

Respiro-en

Official implementation of the paper "Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech"

Language: Python · License: MIT · 1700

LAFMA

Language: Python · 2600

PeriodWave

The official implementation of PeriodWave and PeriodWave-Turbo

License: MIT · 10500

nnsvs

Neural network-based singing voice synthesis library for research

Language: Python · License: MIT · 68000

promonet

Prosody and Pronunciation Modification Network

Language: Python · License: MIT · 3500

edm

Elucidating the Design Space of Diffusion-Based Generative Models (EDM)

Language: Python · License: NOASSERTION · 128100

music2latent

Encode and decode audio samples to/from compressed latent representations!

Language: Python · License: NOASSERTION · 11300