shiyuzh2007's repositories
GPT-SoVITS
As little as 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning).
3D-Speaker
A repository for single- and multi-modal speaker verification, speaker recognition and speaker diarization.
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
audioldm_eval
This toolbox aims to unify audio generation model evaluation for easier comparison.
AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
Bert-VITS2
VITS2 backbone with BERT
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models.
langchain
⚡ Building applications with LLMs through composability ⚡
langflow
⛓️ Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows.
llama
Inference code for LLaMA models
magvit
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
OOTDiffusion
Official implementation of OOTDiffusion
Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
PALM-E
Implementation of "PaLM-E: An Embodied Multimodal Language Model"
ParroT
The ParroT framework enhances and regulates translation abilities during chat, based on open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.
Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
safe-rlhf
Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
stable-diffusion-webui
Stable Diffusion web UI
unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning.
vall-e
PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs