Haider Asad's repositories
bark
🔊 Text-Prompted Generative Audio Model
camelot
Camelot: PDF Table Extraction for Humans
CodeFormer
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
deepdoctection
A Repo For Document AI
doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5.
faster-whisper
Faster Whisper transcription with CTranslate2
google-research
Google Research
image-matching-webui
🤗 image matching toolbox webui
Lip_Wise
LipWise is a video dubbing tool that uses optimized inference for Wav2Lip together with models such as GFPGAN and CodeFormer. These models integrate the new audio with the lip movements of the reference video to produce a natural, realistic final output.
modal-examples
Examples of programs built using Modal
multilingual_kws
Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus
mycroft-precise
A lightweight, simple-to-use, RNN wake word listener
PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
PronouncUR
PronouncUR: An Urdu Pronunciation Lexicon Generator
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
quillman
A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
speaker-transcription
Transcription with speaker diarization pipeline
tabular_data_extraction
A repo that uses document table extraction models and serves them as a standalone API
text2speech
Towards Building Text-To-Speech Systems for the Next Billion Users - Microsoft Research Intern Work - Accepted at ICASSP 2023
Turkish-Text-to-Speech
Speech synthesis (TTS) in low-resource languages by training from scratch with FastPitch and fine-tuning with HiFi-GAN
unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io
whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
whisper.cpp
Port of OpenAI's Whisper model in C/C++
whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)