Amantur Amatov (amanteur)

Company: Kits.AI

Location: Bishkek, Kyrgyzstan

Amantur Amatov's starred repositories

muchomusic

MuChoMusic is a benchmark for evaluating music understanding in multimodal audio-language models.

Language: Jupyter Notebook | License: MIT | Stargazers: 16 | Issues: 0

Speech-Editing-Toolkit

A repository of implementations of neural speech editing algorithms.

Language: Python | Stargazers: 183 | Issues: 0

FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

Language: Python | License: NOASSERTION | Stargazers: 1324 | Issues: 0

SEMamba

The official implementation of the SEMamba paper (accepted to IEEE SLT 2024).

Language: Python | Stargazers: 116 | Issues: 0

matchering

🎚️ Open Source Audio Matching and Mastering

Language: Python | License: GPL-3.0 | Stargazers: 1302 | Issues: 0

audio-flamingo

PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.

Language: Python | License: MIT | Stargazers: 167 | Issues: 0

ruff

An extremely fast Python linter and code formatter, written in Rust.

Language: Rust | License: MIT | Stargazers: 30855 | Issues: 0

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).

Language: Python | License: Apache-2.0 | Stargazers: 11482 | Issues: 0

VISinger2

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

Language: Python | Stargazers: 309 | Issues: 0

seed-vc

Zero-shot voice conversion with in-context learning.

Language: Python | License: MIT | Stargazers: 80 | Issues: 0

Fast-GeCo

Source code and demo for the INTERSPEECH 2024 paper "Noise-robust Speech Separation with Fast Generative Correction".

Language: Python | Stargazers: 25 | Issues: 0

mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Language: Python | License: MIT | Stargazers: 2058 | Issues: 0

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Language: Python | License: MIT | Stargazers: 4700 | Issues: 0

MeloTTS

High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.

Language: Python | License: MIT | Stargazers: 4373 | Issues: 0

HiFTNet-sr

HiFTNet audio (WAV) super-resolution from 16/24 kHz to 48 kHz.

Language: Python | License: MIT | Stargazers: 21 | Issues: 0

Stable-Hybrid-Auditory-Filterbanks

Official implementation of the Interspeech 2024 paper "Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement".

Language: Python | License: BSD-3-Clause-Clear | Stargazers: 20 | Issues: 0

Language: Python | Stargazers: 5 | Issues: 0

stable-audio-controlnet

Fine-tune Stable Audio Open with DiT ControlNet.

Language: Python | License: NOASSERTION | Stargazers: 149 | Issues: 0

voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

Stargazers: 1666 | Issues: 0

WavTokenizer

State-of-the-art discrete acoustic codec models with 40 tokens per second for audio language modeling.

Language: Python | License: MIT | Stargazers: 578 | Issues: 0

BSSE-SE

Boosting Self-Supervised Embeddings for Speech Enhancement

Language: Python | License: MIT | Stargazers: 42 | Issues: 0

AFX-Research

Scientific literature about Audio Effects

Language: HTML | Stargazers: 111 | Issues: 0

HierSpeechpp

The official implementation of HierSpeech++

Language: Python | License: MIT | Stargazers: 1164 | Issues: 0

Respiro-en

Official implementation of the paper "Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech".

Language: Python | License: MIT | Stargazers: 17 | Issues: 0

Language: Python | Stargazers: 26 | Issues: 0

PeriodWave

The official implementation of PeriodWave and PeriodWave-Turbo.

License: MIT | Stargazers: 105 | Issues: 0

nnsvs

Neural network-based singing voice synthesis library for research

Language: Python | License: MIT | Stargazers: 680 | Issues: 0

promonet

Prosody and Pronunciation Modification Network

Language: Python | License: MIT | Stargazers: 35 | Issues: 0

edm

Elucidating the Design Space of Diffusion-Based Generative Models (EDM)

Language: Python | License: NOASSERTION | Stargazers: 1281 | Issues: 0

music2latent

Encode and decode audio samples to/from compressed latent representations!

Language: Python | License: NOASSERTION | Stargazers: 113 | Issues: 0