i-MaTh's repositories
Algorithm
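Implementations of some commonly used algorithms (covering common data structures, machine learning, and algorithms frequently used in speech recognition)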
Applio
A simple, high-quality voice conversion tool focused on ease of use and performance.
audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
city_json
** city JSON, plus JSON for Hong Kong, Macao, Taiwan, and world cities
cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
cs-self-learning
A self-learning guide to computer science
CVQ-VAE
[ICCV 2023] Online Clustered Codebook
dclm
DataComp for Language Models
flux
Official inference repo for FLUX.1 models
friendly-stable-audio-tools
Refactored / updated version of `stable-audio-tools`, the open-source code for audio/music generative models originally by Stability AI.
HiFTNet
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
lingua
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
mean-opinion-score
Python library for calculating the mean opinion score and 95% confidence interval of the standard deviation of text-to-speech ratings according to Ribeiro et al. (2011).
mini-omni
Open-source multimodal large language model that can hear and talk while thinking. Features real-time end-to-end speech input and streaming audio output for conversation.
Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
NCE
Yingshi New Concept English
openai-python
The official Python library for the OpenAI API
OpenDiloco
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
Parrot-TTS
Official Code for ParrotTTS
PerceptiveAgent
Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24)
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
RAVE
Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder
speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
spiritlm
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
voice-chat-pdf
Use OpenAI's realtime API for chatting with your documents
Wav2Lip
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020.
WavChat
A Survey of Spoken Dialogue Models (60 pages)