Karthik Ganesan's repositories
AudioDec
An Open-source Streaming High-fidelity Neural Audio Codec
codingInterview
coding interview brushup
denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly to the raw waveform, that further improve model performance and its generalization abilities.
espnet_onnx
ONNX wrapper for espnet inference models
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
hexa
Discovering and Achieving Goals via World Models, NeurIPS 2021
LongLoRA
Code and documents of LongLoRA and LongAlpaca
MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
NexusRaven
NexusRaven-13B, a new SOTA Open-Source LLM for function calling. This repo contains everything for reproducing our evaluation on NexusRaven-13B and baselines.
Retrieval-based-Voice-Conversion-WebUI
Voice data <= 10 mins can also be used to train a good VC model!
s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit.
sharedtask-dialdoc2021
The doc2dial data includes a set of documents from multiple domains, together with conversations between an assisting agent and an end user that are grounded in the associated documents.
soundstorm-speechtokenizer
Implementation of SoundStorm built upon SpeechTokenizer.
SpeechTokenizer
This is the code for SpeechTokenizer, presented in the paper SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
vocode-python
🤖 Build voice-based LLM agents. Modular + open source.
WCN-BERT
Jointly encoding word confusion networks (WCNs) and dialogue context with BERT for spoken language understanding (SLU).
zeno-build
Build, evaluate, analyze, and understand LLM-based apps