haiderasad

Haider Asad's repositories

triton_trtllm_guide

Installation and usage guide for Triton TRT-LLM

tabular_data_extraction

A repo that uses document table extraction models and serves them as a standalone API

Language: Python

doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Apache-2.0

LipWise is a video dubbing tool that leverages optimized inference for Wav2Lip and also uses models such as GFPGAN and CodeFormer. These models integrate the new audio with the lip movements of the reference video, producing a natural and realistic final output.

Apache-2.0

CodeFormer

[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer

NOASSERTION

GPEN

whisper.cpp

Port of OpenAI's Whisper model in C/C++

MIT

image-matching-webui

🤗 image matching toolbox webui

VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io

MIT

bark

🔊 Text-Prompted Generative Audio Model

MIT

faster-whisper

Faster Whisper transcription with CTranslate2

MIT

whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

BSD-2-Clause

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

MIT

whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

BSD-4-Clause

modal-examples

Examples of programs built using Modal

MIT

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5.

Apache-2.0

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

MIT

vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

MIT

quillman

A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

MIT