rohithkodali's repositories
transformer-cnn-emotion-recognition
Speech Emotion Classification with novel Parallel CNN-Transformer model built with PyTorch, plus thorough explanations of CNNs, Transformers, and everything in between
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
ConsistencyVC-voive-conversion
Using joint training speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion
conv-emotion
This repo contains implementations of different architectures for emotion recognition in conversations
Deep-Learning-in-Production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
FastSAM
Fast Segment Anything
langdetect
Language detection algorithm that can be extended to support any number of languages
LookOnceToHear
A novel human-interaction method for real-time speech extraction on headphones.
melgan-neurips
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
MLnotebook
Understanding Deep Learning - Simon J.D. Prince
Nepali-Ai-Anchor
Nepali AI Anchor using LSTM & Pix2Pix [Itonics Hackathon 2019]
open-speech-corpora
A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Real-time-wake-word-detection
Spoken wake-word detection for conversational avatar
recurrent-interface-network-pytorch
Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in PyTorch
Resemblyzer
A Python package to analyze and compare voices with deep learning
self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
supervoice-dataset
60k hours of phoneme-aligned audio from audio books
voice-activity-detection
Voice Activity Detection (VAD) using deep learning.
VoskIdentification
A test example of using a model for voice identification with the "Vosk" speech recognition library: https://alphacephei.com/vosk/
Whisper-Hindi-ASR-model-IIT-Bombay-Intership
The Whisper Hindi ASR (Automatic Speech Recognition) model is trained on the KathBath dataset, a comprehensive collection of Hindi speech samples, and uses deep learning to accurately transcribe spoken Hindi into text.
whisper-to-normal-speech-conversion
Whisper-to-Normal Speech Conversion Using Generative Adversarial Networks