i-MaTh

i-MaTh's repositories

Algorithm

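Implementations of commonly used algorithms (covering common data structures, machine learning, and algorithms frequently used in speech recognition)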

Language: Jupyter Notebook | 0 0 0

Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

Language: Python | License: MIT | 0 0 0

audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch

Language: Python | License: MIT | 0 0 0

city_json

** city JSON, plus JSON for Hong Kong, Macau, Taiwan, and world cities

License: MIT | 0 0 0

cleanrl

High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Language: Python | License: NOASSERTION | 0 0 0

cs-self-learning

A self-study guide to computer science

License: NOASSERTION | 0 0 0

CVQ-VAE

[ICCV 2023] Online Clustered Codebook

Language: Python | License: MIT | 0 0 0

dclm

DataComp for Language Models

Language: HTML | License: MIT | 0 0 0

flux

Official inference repo for FLUX.1 models

Language: Python | License: Apache-2.0 | 0 0 0

friendly-stable-audio-tools

Refactored / updated version of `stable-audio-tools`, the open-source code for audio/music generative models originally by Stability AI.

License: MIT | 0 0 0

HiFTNet

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

License: MIT | 0 0 0

lingua

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

License: BSD-3-Clause | 0 0 0

MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

License: AGPL-3.0 | 0 0 0

mean-opinion-score

Python library for calculating the mean opinion score and 95% confidence interval of the standard deviation of text-to-speech ratings according to Ribeiro et al. (2011).

Language: Python | License: MIT | 0 0 0

mini-omni

Open-source multimodal large language model that can hear and talk while thinking. It features real-time, end-to-end speech input and streaming audio output for conversation.

License: MIT | 0 0 0

Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi

Language: Python | License: MIT | 0 0 0

NCE

Yingshi New Concept English

License: MIT | 0 0 0

openai-python

The official Python library for the OpenAI API

License: Apache-2.0 | 0 0 0

OpenDiloco

OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training

Language: Python | License: Apache-2.0 | 0 0 0

Parrot-TTS

Official Code for ParrotTTS

0 0 0

PerceptiveAgent

Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24)

License: Apache-2.0 | 0 0 0

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

License: MIT | 0 0 0

RAVE

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

License: NOASSERTION | 0 0 0

shell_tools

0 2 0

speech-resynthesis

An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

License: NOASSERTION | 0 0 0

spiritlm

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

License: NOASSERTION | 0 0 0

vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Language: Python | License: MIT | 0 0 0

voice-chat-pdf

Use OpenAI's Realtime API for chatting with your documents

Language: JavaScript | License: MIT | 0 0 0

Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020.

Language: Python | 0 0 0

WavChat

A Survey of Spoken Dialogue Models (60 pages)

0 0 0