Takaaki-Saeki

Google

Tokyo, Japan

https://takaaki-saeki.github.io/

Takaaki Saeki's starred repositories

pflowtts_pytorch

Unofficial implementation of the NVIDIA P-Flow TTS paper

Language: Python · License: MIT · 20000

TTS-arxiv-daily

Automatically Update Text-to-Speech (TTS) Papers Daily using GitHub Actions (Updated Every 12 Hours)

Language: Python · License: Apache-2.0 · 16200

PAM

PAM is a no-reference audio quality metric for audio generation tasks

Language: Python · License: MIT · 3600

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Language: Python · 973100

pflow-encodec

Implementation of a TTS model based on the NVIDIA P-Flow TTS paper

Language: Python · 6400

DDDM-VC

Official PyTorch implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)

Language: Python · 15700

Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

Language: Python · 19000

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · 1115400

SpeechGPT

SpeechGPT Series: Speech Large Language Models

Language: Python · License: Apache-2.0 · 113500

mamba

Mamba SSM architecture

Language: Python · License: Apache-2.0 · 1201600

DiscreteSpeechMetrics

Reference-aware automatic speech evaluation toolkit

Language: Python · License: MIT · 8200

self-rewarding-lm-pytorch

Implementation of the training framework proposed in Self-Rewarding Language Models, from Meta AI

Language: Python · License: MIT · 127600

ssl_speech_restoration_v2

Language: Python · License: MIT · 1000

AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Language: Python · 54500

ai-audio-startups

Community list of startups working with AI in audio and music technology

License: Apache-2.0 · 150400

PLMpapers

Must-read Papers on pre-trained language models.

License: MIT · 331200

Awesome-LLM

Awesome-LLM: a curated list of Large Language Model resources

License: CC0-1.0 · 1653900

voicebox-pytorch

Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch

Language: Python · License: MIT · 56900

contentvec

Self-supervised speech representations

Language: Python · License: MIT · 43900

CML-TTS-Dataset

CML-TTS: A Multilingual Dataset for Speech Synthesis

Language: HTML · 2800

uroman-python

Python wrapper around the uroman tokenizer

Language: Nix · 1200

miipher

Unofficial implementation of Miipher

Language: Python · License: MIT · 9700

vits2_pytorch

Unofficial VITS2 TTS implementation in PyTorch

Language: Python · License: MIT · 47000

SpeechMOS

Easy-to-Use Speech MOS predictors

Language: Python · License: MIT · 19000

codellama

Inference code for CodeLlama models

Language: Python · License: NOASSERTION · 1572600

randomized_positional_encodings

Randomized Positional Encodings Boost Length Generalization of Transformers

Language: Python · License: Apache-2.0 · 7500
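The idea in the randomized_positional_encodings title can be sketched as follows, assuming the formulation in which a length-n sequence is assigned a sorted random subset of positions drawn from a larger range [0, L), after which an ordinary positional encoding is applied, so that large position indices are already seen during training. The function names `randomized_positions` and `sinusoidal_encoding` and the value of `max_len` are hypothetical and are not that repository's API.

```python
import math
import random


def randomized_positions(seq_len: int, max_len: int, rng: random.Random) -> list[int]:
    """Sample seq_len distinct positions from [0, max_len) and return them sorted.

    max_len is a hypothetical hyperparameter, assumed to be larger than any
    sequence length expected at test time.
    """
    if seq_len > max_len:
        raise ValueError("seq_len must not exceed max_len")
    # Sorting keeps the relative order of tokens while the absolute gaps are random.
    return sorted(rng.sample(range(max_len), seq_len))


def sinusoidal_encoding(position: int, dim: int) -> list[float]:
    """Sinusoidal encoding for one (possibly large) position; dim is assumed even."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.append(math.sin(position * freq))
        enc.append(math.cos(position * freq))
    return enc


rng = random.Random(0)
positions = randomized_positions(seq_len=5, max_len=64, rng=rng)
encodings = [sinusoidal_encoding(p, dim=8) for p in positions]
print(positions)
```

Because only the sampled indices change, the same encoding function serves both short training sequences and longer evaluation sequences, as long as both stay below `max_len`.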

descript-audio-codec

State-of-the-art audio codec with a 90x compression factor. Supports 44.1 kHz, 24 kHz, and 16 kHz mono/stereo audio.

Language: Python · License: MIT · 105700
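The 90x figure in the descript-audio-codec description can be sanity-checked with a short bitrate calculation. This sketch assumes 16-bit mono PCM input at 44.1 kHz and a residual vector quantizer with a 512-sample hop, 9 codebooks, and 1024 entries per codebook; these parameters are assumptions for illustration and are not stated in the listing.

```python
import math

# Assumed parameters (not stated in the listing): 16-bit mono PCM input,
# a hop of 512 samples per codec frame, 9 residual codebooks of 1024 entries.
SAMPLE_RATE_HZ = 44_100
PCM_BITS_PER_SAMPLE = 16
HOP_SAMPLES = 512
NUM_CODEBOOKS = 9
CODEBOOK_SIZE = 1024


def pcm_bitrate(sample_rate: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate * bits_per_sample * channels


def rvq_bitrate(sample_rate: int, hop: int, num_codebooks: int, codebook_size: int) -> float:
    """Bitrate of a residual-VQ token stream: frames per second times bits per frame."""
    frame_rate = sample_rate / hop
    bits_per_frame = num_codebooks * math.log2(codebook_size)
    return frame_rate * bits_per_frame


raw = pcm_bitrate(SAMPLE_RATE_HZ, PCM_BITS_PER_SAMPLE)
coded = rvq_bitrate(SAMPLE_RATE_HZ, HOP_SAMPLES, NUM_CODEBOOKS, CODEBOOK_SIZE)
print(f"raw {raw/1000:.1f} kbps, coded {coded/1000:.2f} kbps, ratio {raw/coded:.0f}x")
```

Under these assumptions the script prints `raw 705.6 kbps, coded 7.75 kbps, ratio 91x`, consistent with the roughly 90x quoted. The sample rate cancels out, so the ratio reduces to 16 × 512 / 90 ≈ 91.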

Speech-Prompts-Adapters

This repository surveys papers focusing on prompting and adapters for speech processing.

vector-quantize-pytorch

Vector (and Scalar) Quantization, in PyTorch

Language: Python · License: MIT · 226000
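A minimal pure-Python sketch of the two operations named in the vector-quantize-pytorch description: vector quantization maps an input vector to its nearest codebook entry, and a simplified scalar quantizer (in the spirit of finite scalar quantization) bounds each dimension with tanh and rounds it onto a fixed grid. The function names are hypothetical and this is not that library's API; training-time details such as codebook updates and straight-through gradients are omitted.

```python
import math


def squared_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(x: list[float], codebook: list[list[float]]) -> tuple[int, list[float]]:
    """Return (index, codeword) of the codebook entry nearest to x."""
    index = min(range(len(codebook)), key=lambda k: squared_distance(x, codebook[k]))
    return index, codebook[index]


def scalar_quantize(x: list[float], levels: int) -> list[float]:
    """Bound each dimension with tanh, then round onto `levels` (>= 2) evenly
    spaced values in [-1, 1]. No learned codebook is needed."""
    step = 2.0 / (levels - 1)
    return [round((math.tanh(v) + 1.0) / step) * step - 1.0 for v in x]


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
idx, code = vector_quantize([0.9, 0.8], codebook)
print(idx, code)  # 1 [1.0, 1.0]
```

The vector quantizer's cost grows with codebook size because every entry is compared, whereas the scalar variant quantizes each dimension independently with a fixed grid.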

zm-text-tts

[IJCAI'23] Learning to Speak from Text for Low-Resource TTS

Language: Python · License: Apache-2.0 · 6300