Beast code in Giters

cc-cherie's starred repositories

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python · Apache-2.0 · 3224 · 0 · 0

axolotl

Go ahead and axolotl questions

Language: Python · Apache-2.0 · 6879 · 0 · 0

tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Language: Jupyter Notebook · Apache-2.0 · 12415 · 0 · 0

GPT-SoVITS

One minute of voice data is enough to train a good TTS model (few-shot voice cloning).

Language: Python · MIT · 28891 · 0 · 0

OpenVoice

Instant voice cloning by MyShell.

Language: Python · MIT · 27184 · 0 · 0

insanely-fast-whisper

Language: Jupyter Notebook · Apache-2.0 · 6963 · 0 · 0

denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Language: Python · MIT · 7509 · 0 · 0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python · MIT · 19130 · 0 · 0

generative-models

Generative Models by Stability AI

Language: Python · MIT · 23271 · 0 · 0

vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Language: Python · Apache-2.0 · 1930 · 0 · 0

emotionally_consistent_speech

Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition

Language: Jupyter Notebook · MIT · 1 · 0 · 0

benchmarks

This repository contains the SpeechBrain Benchmarks

Language: Python · Apache-2.0 · 70 · 0 · 0

audiotext-transformer

Multimodal Transformer for Korean Sentiment Analysis with Audio and Text Features

Language: Python · 24 · 0 · 0

MMSA

MMSA is a unified framework for Multimodal Sentiment Analysis.

Language: Python · MIT · 611 · 0 · 0

BERT-like-is-All-You-Need

The code for our INTERSPEECH 2020 paper - Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition

Language: Python · MIT · 109 · 0 · 0

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python · Apache-2.0 · 6637 · 0 · 0

Speech-emotion-recognition-MCFN

This is a repository for our work: A Dual Attention-Based Modality-Collaborative Fusion Network for Emotion Recognition

Language: Python · 4 · 0 · 0

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python · BSD-3-Clause · 3061 · 0 · 0

data2vec-pytorch

PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI

Language: Python · MIT · 165 · 0 · 0

CLAP

Contrastive Language-Audio Pretraining

Language: Python · CC0-1.0 · 1243 · 0 · 0

s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Language: Python · Apache-2.0 · 2155 · 0 · 0

PraatScripts

These are Praat scripts I use in my research, implemented with Parselmouth for Python, for use in Binder

Language: Jupyter Notebook · MIT · 119 · 0 · 0

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook · MIT · 33757 · 0 · 0

stock

stock: a stock market system, developed in Python.

Language: Python · Apache-2.0 · 6539 · 0 · 0

leedl-tutorial

"Hung-yi Lee Deep Learning Tutorial" (recommended by Prof. Hung-yi Lee 👍). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases

Language: Jupyter Notebook · NOASSERTION · 11090 · 0 · 0

book-text-to-speech

A book about Text-to-Speech (TTS) in Chinese.

Language: TeX · Apache-2.0 · 560 · 0 · 0

ATST-SED

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

Language: Jupyter Notebook · MIT · 62 · 0 · 0

audioset-downloader

CLI to download examples of a specific class from Google's AudioSet

Language: Python · MIT · 5 · 0 · 0

audioset-processing

Toolkit for downloading and processing Google's AudioSet dataset.

Language: Jupyter Notebook · MIT · 152 · 0 · 0

sound-event-detection

Language: Python · 3 · 0 · 0