Nickolay V. Shmyrev's starred repositories
Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
TalkingHead
Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
audioset-processing
Toolkit for downloading and processing Google's AudioSet dataset.
TextyMcSpeechy
Easily create text-to-speech models in any voice for rhasspy/piper. Make a text-to-speech model with your own voice recordings, or use thousands of RVC voices. Works offline on a Raspberry Pi.
audio-preprocessing-scripts
Scripts for automated dataset creation
UnitySpeechToText
A native Unity plugin to convert speech to text on Android & iOS
FlashSpeech
FlashSpeech: Efficient Zero-Shot Speech Synthesis
audio_diarization_annotation
Audio Diarization Annotation tool
gazelle-train
Joint speech-language model - respond directly to audio!
nlp-rus-zaliz
Processing the grammar dictionary of A. A. Zaliznyak for morphological inflection
MINETrans-IWSLT23
Official implementation of our IWSLT 2023 paper "The MineTrans Systems for IWSLT 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks"
supervoice-enhance
Supervoice diffusion enhance
minTorToiSe
A minimal PyTorch re-implementation of TorToiSe-tts inference
speech_evaluation
A toolkit dedicated to speech evaluation.
tortoise-tts
A multi-voice TTS system trained with an emphasis on quality