rohithkodali's repositories

transformer-cnn-emotion-recognition

Speech Emotion Classification with a novel Parallel CNN-Transformer model built with PyTorch, plus thorough explanations of CNNs, Transformers, and everything in between

Language: Jupyter Notebook · License: MIT · Stargazers: 1
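
The parallel design runs a CNN branch and a Transformer-encoder branch over the same mel-spectrogram and concatenates their outputs before a shared classifier head. A minimal PyTorch sketch of that idea (layer sizes and names are illustrative, not the repo's actual code):

```python
import torch
import torch.nn as nn

class ParallelCNNTransformer(nn.Module):
    """Illustrative parallel CNN + Transformer emotion classifier (not the repo's exact code)."""
    def __init__(self, n_mels=128, n_classes=8, d_model=128):
        super().__init__()
        # CNN branch: local time-frequency patterns from the mel-spectrogram
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                      # -> (B, 32, 1, 1)
        )
        # Transformer branch: global temporal context over mel frames
        self.proj = nn.Linear(n_mels, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Classifier head over the concatenated branch outputs
        self.head = nn.Linear(32 + d_model, n_classes)

    def forward(self, mel):                               # mel: (B, n_mels, T)
        cnn_feat = self.cnn(mel.unsqueeze(1)).flatten(1)  # (B, 32)
        seq = self.proj(mel.transpose(1, 2))              # (B, T, d_model)
        trans_feat = self.transformer(seq).mean(dim=1)    # (B, d_model)
        return self.head(torch.cat([cnn_feat, trans_feat], dim=1))

logits = ParallelCNNTransformer()(torch.randn(4, 128, 256))  # 4 clips, 256 mel frames
```

The CNN branch captures local time-frequency patterns while the Transformer branch models long-range temporal context, which is why the two run in parallel and are fused rather than stacked.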

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python · License: MIT · Stargazers: 0

ConsistencyVC-voive-conversion

Using a jointly trained speaker encoder with a consistency loss to achieve cross-lingual and expressive voice conversion

Language: Python · License: MIT · Stargazers: 0

conv-emotion

This repo contains implementations of different architectures for emotion recognition in conversations

License: MIT · Stargazers: 0

ddsp

DDSP: Differentiable Digital Signal Processing

Language: Python · License: Apache-2.0 · Stargazers: 0
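
DDSP's central idea is that classic synthesizer components (harmonic oscillators, filtered noise, reverb) can be written as differentiable operations and driven by a neural network. A toy differentiable additive synthesizer in PyTorch, just to illustrate the concept; the real library is TensorFlow-based and far more complete:

```python
import torch

def harmonic_synth(f0, amplitudes, sample_rate=16000):
    """Toy differentiable additive synthesizer: a sum of harmonics of f0.

    f0:         (T,) fundamental frequency per sample, in Hz
    amplitudes: (T, H) per-harmonic amplitude envelopes
    """
    n_harmonics = amplitudes.shape[-1]
    harmonic_numbers = torch.arange(1, n_harmonics + 1, dtype=f0.dtype)              # (H,)
    # Instantaneous phase: cumulative sum of per-sample angular frequency per harmonic
    omega = 2 * torch.pi * f0[:, None] * harmonic_numbers[None, :] / sample_rate     # (T, H)
    phase = torch.cumsum(omega, dim=0)
    return (amplitudes * torch.sin(phase)).sum(dim=-1)                               # (T,) waveform

# Half a second of a 220 Hz tone with 8 decaying harmonics; gradients flow through everything,
# so a network could predict f0 and amplitudes and be trained on an audio reconstruction loss.
T = 8000
f0 = torch.full((T,), 220.0)
amps = torch.linspace(1.0, 0.1, 8).repeat(T, 1) / 8
audio = harmonic_synth(f0, amps)
```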

Deep-Learning-in-Production

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

Stargazers: 0

espresso

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Language: Python · License: MIT · Stargazers: 0

FastSAM

Fast Segment Anything

Language: Python · License: Apache-2.0 · Stargazers: 0

langchain

⚡ Building applications with LLMs through composability ⚡

Language: Python · License: MIT · Stargazers: 0

langdetect

Language detection algorithm that can be extended to support any number of languages

Language: Python · License: Apache-2.0 · Stargazers: 0
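
The classic way to build a detector that scales to new languages by simply adding one profile per language is character n-gram frequency matching (Cavnar-Trenkle); a minimal sketch of that general approach, not necessarily this repo's implementation:

```python
from collections import Counter

def ngram_profile(text, n=3, top_k=300):
    """Top-k most frequent character n-grams, as a ranked profile."""
    text = " " + " ".join(text.lower().split()) + " "
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return [g for g, _ in grams.most_common(top_k)]

def out_of_place_distance(doc_profile, lang_profile):
    """Cavnar-Trenkle out-of-place measure: lower means a better match."""
    rank = {g: i for i, g in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(abs(i - rank.get(g, max_penalty)) for i, g in enumerate(doc_profile))

def detect(text, language_profiles):
    doc = ngram_profile(text)
    return min(language_profiles,
               key=lambda lang: out_of_place_distance(doc, language_profiles[lang]))

# Adding a language is just adding another profile built from sample text in it.
profiles = {
    "en": ngram_profile("the quick brown fox jumps over the lazy dog " * 20),
    "es": ngram_profile("el rápido zorro marrón salta sobre el perro perezoso " * 20),
}
print(detect("the dog jumps over the fox", profiles))  # -> 'en'
```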

LookOnceToHear

A novel human-interaction method for real-time speech extraction on headphones.

License: NOASSERTION · Stargazers: 0

melgan-neurips

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

Language: Python · License: MIT · Stargazers: 0

MLnotebook

Understanding Deep Learning - Simon J.D. Prince

Language: Jupyter Notebook · License: NOASSERTION · Stargazers: 0

Nepali-Ai-Anchor

Nepali AI Anchor using LSTM & Pix2Pix. [Itonics Hackathon 2019]

Language: Python · Stargazers: 0

open-speech-corpora

A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Stargazers: 0

pifuhd

High-Resolution 3D Human Digitization from A Single Image.

Language: Python · License: NOASSERTION · Stargazers: 0

Real-time-wake-word-detection

Spoken wake-word detection for conversational avatar

Stargazers: 0

recurrent-interface-network-pytorch

Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in PyTorch

License: MIT · Stargazers: 0

Resemblyzer

A python package to analyze and compare voices with deep learning

Language: Python · License: Apache-2.0 · Stargazers: 0
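
Resemblyzer's documented API maps an utterance to a fixed-size speaker embedding, so comparing two voices reduces to a cosine similarity between embeddings. A short usage sketch (the file paths are placeholders):

```python
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()

# Load and preprocess two utterances (paths are placeholders)
wav_a = preprocess_wav("speaker_a.wav")
wav_b = preprocess_wav("speaker_b.wav")

# 256-dimensional, L2-normalized speaker embeddings
embed_a = encoder.embed_utterance(wav_a)
embed_b = encoder.embed_utterance(wav_b)

# Embeddings are unit-norm, so the dot product is the cosine similarity
similarity = np.dot(embed_a, embed_b)
print(f"voice similarity: {similarity:.3f}")  # closer to 1.0 = more likely the same speaker
```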

self-supervised-phone-segmentation

Phoneme segmentation using pre-trained speech models

Language: Python · License: GPL-3.0 · Stargazers: 0

StyleTTS

Official Implementation of StyleTTS

License: MIT · Stargazers: 0

supervoice-dataset

60k hours of phoneme-aligned audio from audio books

Language: Python · Stargazers: 0

TOI

TOI news

Language: Python · Stargazers: 0

vall-e

An unofficial PyTorch implementation of the audio LM VALL-E, WIP

Language: Python · License: MIT · Stargazers: 0

voice-activity-detection

Voice Activity Detection (VAD) using deep learning.

License: GPL-3.0 · Stargazers: 0
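
For contrast with the deep-learning model, the classic non-learned baseline for this task is frame-level energy thresholding; a minimal sketch of that baseline (this is not the repo's approach, just the reference point it improves on):

```python
import numpy as np

def energy_vad(signal, sample_rate=16000, frame_ms=30, threshold_db=-20.0):
    """Mark each frame as speech/non-speech by log-energy relative to the loudest frame.

    Returns one boolean per frame. A learned VAD replaces this hand-set
    threshold with a classifier over acoustic features.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1) + 1e-10
    energy_db = 10 * np.log10(energy / energy.max())
    return energy_db > threshold_db

# Example: 1 s of low-level noise with a louder "speech" burst in the middle
sr = 16000
x = 0.01 * np.random.randn(sr)
x[6000:10000] += 0.5 * np.sin(2 * np.pi * 200 * np.arange(4000) / sr)
print(energy_vad(x, sr))
```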

VoskIdentification

A test example of using a model for voice identification with the "Vosk" speech recognition library: https://alphacephei.com/vosk/

Language: Java · Stargazers: 0

Whisper-Hindi-ASR-model-IIT-Bombay-Intership

The Whisper Hindi ASR (Automatic Speech Recognition) model is trained on the KathBath dataset, a comprehensive collection of Hindi speech samples, and uses deep learning to accurately transcribe spoken Hindi into text.

License: EPL-2.0 · Stargazers: 0
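
With the standard openai-whisper package, transcribing Hindi audio looks like the sketch below; swapping the stock checkpoint for weights fine-tuned on KathBath is what the repo's training would change (the model size and file path here are assumptions, not taken from the repo):

```python
import whisper

# Stock multilingual checkpoint; a checkpoint fine-tuned on KathBath Hindi data
# would be loaded here instead.
model = whisper.load_model("small")

# Force Hindi decoding rather than relying on language auto-detection
result = model.transcribe("hindi_sample.wav", language="hi", task="transcribe")
print(result["text"])
```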

whisper-to-normal-speech-conversion

Whisper-to-Normal Speech Conversion Using Generative Adversarial Networks

Language: Python · License: MIT · Stargazers: 0