rohithkodali's repositories
transformer-cnn-emotion-recognition
Speech Emotion Classification with novel Parallel CNN-Transformer model built with PyTorch, plus thorough explanations of CNNs, Transformers, and everything in between
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
ConsistencyVC-voive-conversion
Using joint training speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion
conv-emotion
This repo contains implementations of different architectures for emotion recognition in conversations
Deep-Learning-in-Production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
FastSAM
Fast Segment Anything
langdetect
Language detection algorithm that can be extended to support any number of languages
LookOnceToHear
A novel human-interaction method for real-time speech extraction on headphones.
melgan-neurips
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
MLnotebook
Understanding Deep Learning - Simon J.D. Prince
Nepali-Ai-Anchor
Nepali AI Anchor using LSTM & Pix2Pix [Itonics Hackathon 2019]
open-speech-corpora
A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Real-time-wake-word-detection
Spoken wake-word detection for conversational avatar
recurrent-interface-network-pytorch
Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in PyTorch
Resemblyzer
A Python package to analyze and compare voices with deep learning
self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
supervoice-dataset
60k hours of phoneme-aligned audio from audio books
voice-activity-detection
Voice Activity Detection (VAD) using deep learning.
VoskIdentification
A test example of using a model for voice identification with the "Vosk" speech recognition library: https://alphacephei.com/vosk/
Whisper-Hindi-ASR-model-IIT-Bombay-Intership
The Whisper Hindi ASR (Automatic Speech Recognition) model is trained on the KathBath dataset, a comprehensive collection of Hindi speech samples, and uses deep learning to accurately transcribe spoken Hindi into text.
whisper-to-normal-speech-conversion
Whisper-to-Normal Speech Conversion Using Generative Adversarial Networks