LuckerYi's starred repositories

Interpret_Instruction_Tuning_LLMs

Understanding Why and How Instruction Tuning Changes Pre-trained Models

Language: Python | License: GPL-3.0 | Stargazers: 13 | Issues: 0

videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Language: Python | License: Apache-2.0 | Stargazers: 153 | Issues: 0

LibriTTS-P

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

Stargazers: 104 | Issues: 0

emphassess

This repository presents an evaluation framework for speech-to-speech (S2S) models, following the methodology described in the EmphAssess paper (de Seyssel et al., 2023).

Language: Python | License: NOASSERTION | Stargazers: 11 | Issues: 0

Token-level-Direct-Preference-Optimization

Reference implementation for Token-level Direct Preference Optimization (TDPO)

Language: Python | License: Apache-2.0 | Stargazers: 83 | Issues: 0

Awesome-Diffusion-Models

A collection of resources and papers on Diffusion Models

Language: HTML | License: MIT | Stargazers: 10645 | Issues: 0

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | Stargazers: 24762 | Issues: 0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5889 | Issues: 0

GPT-SoVITS

One minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)

Language: Python | License: MIT | Stargazers: 31368 | Issues: 0

best-rq-pytorch

Implementation of BEST-RQ, a model for self-supervised learning of speech signals using a random projection quantizer, in PyTorch.

Language: Python | License: MIT | Stargazers: 73 | Issues: 0

Language: Python | Stargazers: 123 | Issues: 0

CLAP

Learning audio concepts from natural language supervision

Language: Python | License: MIT | Stargazers: 448 | Issues: 0

emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Language: Python | Stargazers: 556 | Issues: 0

PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Language: Python | License: MIT | Stargazers: 206 | Issues: 0

tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 12733 | Issues: 0

torchscale

Foundation Architecture for (M)LLMs

Language: Python | License: MIT | Stargazers: 2990 | Issues: 0

normalizing-flows

PyTorch implementation of normalizing flow models

Language: Python | License: MIT | Stargazers: 658 | Issues: 0

MM-Diffusion

[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Language: Python | License: MIT | Stargazers: 372 | Issues: 0

normalizing_flows

PyTorch implementations of density estimation algorithms: BNAF, Glow, MAF, RealNVP, planar flows

Language: Python | Stargazers: 598 | Issues: 0

glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Language: Python | License: MIT | Stargazers: 655 | Issues: 0

glow

Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"

Language: Python | License: MIT | Stargazers: 3104 | Issues: 0

Language: Python | License: MIT | Stargazers: 391 | Issues: 0

speechbrain

A PyTorch-based Speech Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 8438 | Issues: 0

so-vits-svc

SoftVC VITS Singing Voice Conversion

Language: Python | License: AGPL-3.0 | Stargazers: 25150 | Issues: 0

torchdiffeq

Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

Language: Python | License: MIT | Stargazers: 5431 | Issues: 0

Meta-voicebox

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

License: MIT | Stargazers: 548 | Issues: 0

VISinger2

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

Language: Python | Stargazers: 306 | Issues: 0

ControlNet

Let us control diffusion models!

Language: Python | License: Apache-2.0 | Stargazers: 29493 | Issues: 0

stable-diffusion-webui

Stable Diffusion web UI

Language: Python | License: AGPL-3.0 | Stargazers: 138316 | Issues: 0

HTS-Audio-Transformer

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Language: Python | License: MIT | Stargazers: 340 | Issues: 0