haiderasad

Pakistan

Haider Asad's repositories

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT

camelot

Camelot: PDF Table Extraction for Humans

Language: Python | License: NOASSERTION

CodeFormer

[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer

Language: Python | License: NOASSERTION

deepdoctection

A Repo For Document AI

License: Apache-2.0

doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

License: Apache-2.0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5.

License: Apache-2.0

faster-whisper

Faster Whisper transcription with CTranslate2

License: MIT

google-research

Google Research

License: Apache-2.0

GPEN

image-matching-webui

🤗 image matching toolbox webui

Lip_Wise

LipWise is a video dubbing tool that uses optimized inference for Wav2Lip together with models like GFPGAN and CodeFormer. These models integrate the new audio with the lip movements of the reference video to produce natural, realistic output.

License: Apache-2.0

modal-examples

Examples of programs built using Modal

License: MIT

multilingual_kws

Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus

mycroft-precise

A lightweight, simple-to-use, RNN wake word listener

License: Apache-2.0

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)

License: Apache-2.0

PronouncUR

PronouncUR: An Urdu Pronunciation Lexicon Generator

License: MIT

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

License: MIT

quillman

A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

License: MIT

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

License: MIT

speaker-transcription

Transcription with speaker diarization pipeline

License: MIT

tabular_data_extraction

A repo that uses document table extraction models and serves them as a standalone API

Language: Python

text2speech

Towards Building Text-To-Speech Systems for the Next Billion Users - Microsoft Research Intern Work - Accepted at ICASSP 2023

Turkish-Text-to-Speech

Speech synthesis (TTS) in low-resource languages by training from scratch with Fastpitch and fine-tuning with HifiGan

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

License: MIT

UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

License: NOASSERTION

vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

License: MIT

VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io

License: MIT

whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

License: BSD-2-Clause

whisper.cpp

Port of OpenAI's Whisper model in C/C++

License: MIT

whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

License: BSD-4-Clause