Alexanda's repositories
Voice-Recognition-to-Text-Tool-
Voice Recognition to Text Tool: an offline, locally run speech-to-text service that outputs JSON, SRT subtitles with timestamps, and plain text
auto_labeling_for_BERT_VITS2
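This project handles data preprocessing. The first step processes the collected audio, using FunASR timestamps to remove empty background audio. It also includes the labels prepared before the data is fed to BERT.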
Automatic_Speech_Annotator
Automatic speech annotator that processes speech with voice activity detection, overlapping speech detection, speaker diarization and automatic speech recognition
bulk_transcribe_youtube_videos_from_playlist
Easily take an entire YouTube playlist and turn it into high quality transcripts using Whisper.
CapsWriter-Offline
The offline version of CapsWriter, a handy voice input tool for PC
ChatPaper
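Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow: uses ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses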
Chenyme-AAVT-
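A fully automatic (audio) video translation project. It uses Whisper to recognize speech and a large AI model to translate the subtitles, then merges the subtitles with the video to produce the translated video.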
ctc-forced-aligner
Text to speech alignment using CTC forced alignment
Dataset_Generator_For_VITS
A VITS dataset generation tool for converting video to short audio, based on Damo Academy video cutting technology
ears_dataset
Expressive Anechoic Recordings of Speech (EARS)
emotion2vec
Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
faster-whisper-GUI
faster_whisper GUI with PySide6
Galgame-Engine-Collect
Everything about visual novels, aiming to build the most complete resource collection on the web
GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
leedl-tutorial
Hung-yi Lee's Deep Learning Tutorial (recommended by Prof. Hung-yi Lee 👍). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
MakeDiffSinger
Pipelines and tools to build your own DiffSinger dataset.
MediaCrawler
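Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, and Weibo posts and comments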
ParaClipper
An automated video clipping tool based on FunASR, a high-accuracy open-source speech recognition model
Pink-Trombone
A programmable version of Neil Thapen's Pink Trombone
PyQt-Fluent-Widgets
A fluent design widgets library based on C++ Qt/PyQt/PySide. Make Qt Great Again.
pyvideotrans
Translate a video from one language to another and add dubbing
SpeechTasks
This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent speech tool development, and speech applications.
TTS-for-GPT-soVITS
This is a simple TTS backend project based on https://github.com/RVC-Boss/GPT-SoVITS that provides some inference optimization features.
whisper-web
ML-powered speech recognition directly in your browser
Whisper-WebUI
A Web UI for easy subtitle generation using the Whisper model.
X-AnyLabeling
Effortless data labeling with AI support from Segment Anything and other awesome models.