macroustc

macroustc's repositories

OpenVoice

Instant voice cloning

Language: Python | License: NOASSERTION | Stargazers: 1 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

audino

Open source audio annotation tool for humans

License: MIT | Stargazers: 0 | Issues: 0

AudioLDM2

Text-to-Audio/Music Generation

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

License: MIT | Stargazers: 0 | Issues: 0

Awesome-Video-Diffusion-Models

[Arxiv] A Survey on Video Diffusion Models

Stargazers: 0 | Issues: 0

Bert-VITS2

VITS2 backbone with BERT

License: AGPL-3.0 | Stargazers: 0 | Issues: 0

DeepLearningSystem

An introduction to the core principles of deep learning systems.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

License: Apache-2.0 | Stargazers: 0 | Issues: 0

fish-speech

Brand new TTS solution

License: BSD-3-Clause | Stargazers: 0 | Issues: 0

GPT-SoVITS

Even 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)

License: MIT | Stargazers: 0 | Issues: 0

jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

License: NOASSERTION | Stargazers: 0 | Issues: 0

LLaSM

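The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.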

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

llm-paper-daily

Daily updated LLM papers. Subscriptions welcome 👏 If you like it, give it a 🌟

Stargazers: 0 | Issues: 0

minisora

The Mini Sora project aims to explore the implementation path and future development direction of Sora.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

NISQA

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

Open-Sora

Building your own video generation model like OpenAI's Sora

License: Apache-2.0 | Stargazers: 0 | Issues: 0

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open source community can contribute to this project.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 0

piper

A fast, local neural text to speech system

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

License: NOASSERTION | Stargazers: 0 | Issues: 0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

License: NOASSERTION | Stargazers: 0 | Issues: 0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

License: NOASSERTION | Stargazers: 0 | Issues: 0

Speech-Resources

Labs, companies, resources, internships, and more in the speech field; recommendations and self-nominations are welcome.

Stargazers: 0 | Issues: 0

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

License: MIT | Stargazers: 0 | Issues: 0

torchcrepe

PyTorch implementation of the CREPE pitch tracker

License: MIT | Stargazers: 0 | Issues: 0

ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.

License: MIT | Stargazers: 0 | Issues: 0

UniAudio

The Open Source Code of UniAudio

Stargazers: 0 | Issues: 0

VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io

License: MIT | Stargazers: 0 | Issues: 0

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

License: NOASSERTION | Stargazers: 0 | Issues: 0