shiyuzh2007's repositories

GPT-SoVITS

Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).

Language: Python | License: MIT

self-llm

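"A Hands-On Guide to Open-Source Large Models": quickly deploy open-source large language models on AutoDL; a deployment tutorial better suited to ** beginners.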

Language: Jupyter Notebook | License: Apache-2.0

3D-Speaker

A repository for single- and multi-modal speaker verification, speaker recognition and speaker diarization.

License: Apache-2.0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for audio, music, and speech generation. Its purpose is to support reproducible research and to help junior researchers and engineers get started in audio, music, and speech generation research and development.


audioldm_eval

This toolbox aims to unify audio generation model evaluation for easier comparison.

License: MIT

AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

License: MIT

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

License: MIT

Awesome-Multimodal-Large-Language-Models

Latest papers and datasets on multimodal large language models, and their evaluation.


Bert-VITS2

VITS2 backbone with BERT

License: AGPL-3.0

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models.

License: NOASSERTION

i-Code

License: MIT

langchain

⚡ Building applications with LLMs through composability ⚡

License: MIT

langflow

⛓️ Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows.

License: MIT

llama

Inference code for LLaMA models

License: NOASSERTION

magvit

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

License: Apache-2.0

Make-An-Audio


OOTDiffusion

Official implementation of OOTDiffusion

License: NOASSERTION

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

License: Apache-2.0

PALM-E

Implementation of "PaLM-E: An Embodied Multimodal Language Model"

License: Apache-2.0

ParroT

The ParroT framework enhances and regulates translation abilities during chat, based on open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.


Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

License: Apache-2.0

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

License: NOASSERTION

safe-rlhf

Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

License: Apache-2.0

SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

License: Apache-2.0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

License: NOASSERTION

stable-diffusion-webui

Stable Diffusion web UI

License: AGPL-3.0

unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

License: Apache-2.0

vall-e

PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

License: Apache-2.0

Video-LLaVA

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

License: Apache-2.0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

License: Apache-2.0