Haider Asad's repositories
triton_trtllm_guide
Installation and usage guide for Triton TRT-LLM
tabular_data_extraction
A repo that uses document table extraction models and serves them as a standalone API
deepdoctection
A Repo For Document AI
doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Lip_Wise
LipWise is a video dubbing tool that uses optimized inference for Wav2Lip, together with models such as GFPGAN and CodeFormer. These models integrate the new audio with the lip movements of the reference video to produce a natural, realistic final output.
CodeFormer
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
whisper.cpp
Port of OpenAI's Whisper model in C/C++
image-matching-webui
🤗 image matching toolbox webui
VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
bark
🔊 Text-Prompted Generative Audio Model
faster-whisper
Faster Whisper transcription with CTranslate2
whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
modal-examples
Examples of programs built using Modal
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5.
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
quillman
A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.
Turkish-Text-to-Speech
Speech synthesis (TTS) in low-resource languages by training from scratch with Fastpitch and fine-tuning with HifiGan
speaker-transcription
Transcription with speaker diarization pipeline
text2speech
Towards Building Text-To-Speech Systems for the Next Billion Users - Microsoft Research Intern Work - Accepted at ICASSP 2023
google-research
Google Research
unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
multilingual_kws
Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus
UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
camelot
Camelot: PDF Table Extraction for Humans
mycroft-precise
A lightweight, simple-to-use, RNN wake word listener
PronouncUR
PronouncUR: An Urdu Pronunciation Lexicon Generator