zwglory

University of Chinese Academy of Sciences

Beijing, China

Zhouwei's repositories

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python; License: NOASSERTION; 300

chatgpt_academic

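A ChatGPT extension for research work, specially optimized for polishing academic papers. Supports custom shortcut buttons, Markdown table rendering, dual display of TeX formulas, and full-featured code display, with newly added local Python project analysis and self-analysis features.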

Language: Python; License: GPL-3.0; 100

WechatBot

Language: TypeScript; 1 10

wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: C++; License: Apache-2.0; 000

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

License: MIT; 000

bark

🔊 Text-Prompted Generative Audio Model

Language: Python; License: NOASSERTION; 000

EmoGator

License: Apache-2.0; 000

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python; License: Apache-2.0; 000

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Language: Python; License: MIT; 000

gpt4free

Decentralising the AI industry, just some language model APIs...

License: GPL-3.0; 000

InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

License: Apache-2.0; 000

MediaCrawler

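Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, and Weibo posts and comments.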

Language: Python; License: NOASSERTION; 000

MOSS

An open-source tool-augmented conversational language model from Fudan University

Language: Python; License: Apache-2.0; 000

MuseV

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising

License: MIT; 000

NISQA

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

Language: Python; License: MIT; 000

OOTDiffusion

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

License: NOASSERTION; 000

OpenVoice

Instant voice cloning by MyShell.

License: NOASSERTION; 000

ProDiff

PyTorch Implementation of ProDiff (ACM-MM'22) with an Extremely-Fast diffusion speech synthesis pipeline

License: MIT; 000

prompt-to-prompt

Language: Jupyter Notebook; License: Apache-2.0; 000

QAnything

Question and Answer based on Anything.

License: Apache-2.0; 000

RealtimeTTS

Converts text to speech in realtime

000

roomGPT

Upload a photo of your room to generate your dream room with AI.

Language: TypeScript; 000

sd-webui-EasyPhoto

📷 EasyPhoto | Your Smart AI Photo Generator.

License: Apache-2.0; 000

so-vits-svc

SoftVC VITS Singing Voice Conversion

Language: Python; License: BSD-3-Clause; 000

so-vits-svc-5.0

Core Engine of Singing Voice Conversion & Singing Voice Clone

Language: Python; License: MIT; 000

ssr_eval

Evaluation and Benchmarking of Speech Super-resolution Methods

Language: Python; 000

stable-diffusion-webui

Stable Diffusion web UI

Language: Python; License: AGPL-3.0; 000

Telechat

000

video-subtitle-extractor

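Extracts hard-coded subtitles from video with no third-party API required; text recognition runs locally. A deep-learning (CTPN+CRNN) video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.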

Language: Python; License: Apache-2.0; 000

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

License: NOASSERTION; 000