haiderasad

Haider Asad's starred repositories

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python · License: NOASSERTION · 393400

Audio-and-text-based-emotion-recognition

A multimodal approach to emotion recognition using audio and text.

Language: Jupyter Notebook · License: Apache-2.0 · 14200

faiss

A library for efficient similarity search and clustering of dense vectors.

Language: C++ · License: MIT · 2869300

mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook · License: Apache-2.0 · 885300

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python · License: Apache-2.0 · 53000

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python · License: Apache-2.0 · 1166100

WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Language: Jupyter Notebook · License: MIT · 20000

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · 694500

ctranslate2_triton_backend

Triton backend for https://github.com/OpenNMT/CTranslate2

Language: C++ · License: MIT · 2800

WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Language: Jupyter Notebook · License: MIT · 346500

camelot

Camelot: PDF Table Extraction for Humans

Language: Python · License: NOASSERTION · 358500

dedoc

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

Language: Python · License: Apache-2.0 · 8900

Table-Detection-Extraction

Detect the tables in a form and extract the tables as well as the cells of the tables.

Language: Python · License: MIT · 5500

doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Language: Python · License: Apache-2.0 · 317700

deepdoctection

A Repo For Document AI

Language: Python · License: Apache-2.0 · 226900

CascadeTabNet

This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"

Language: Python · License: MIT · 144800

Multi-Type-TD-TSR

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition

Language: Jupyter Notebook · License: MIT · 24200

OCR_tablenet

TableNet Implementation in PyTorch

Language: Python · 14500

server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Language: Python · License: BSD-3-Clause · 751600

awesome-faceReenactment

Papers about Face Reenactment/Talking Face Generation

42700

Wav2Lip-GFPGAN

High-quality lip sync

Language: Python · 93400

CodeFormer

[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer

Language: Python · License: NOASSERTION · 1369400

Lip_Wise

Orchestrating AI for stunning lip-synced videos. Effortless workflow, exceptional results, all in one place.

Language: Python · License: Apache-2.0 · 4200

DINet

The source code of "DINet: deformation inpainting network for realistic face visually dubbing on high resolution video."

Language: Python · 86500

Auto-Synced-Translated-Dubs

Automatically translates the text of a video based on a subtitle file, dubs the video with an AI voice, and syncs the dub using the subtitle's timings

Language: Python · License: GPL-3.0 · 151000

pydub

Manipulate audio with a simple and easy high level interface

Language: Python · License: MIT · 844300

video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Language: Python · License: Apache-2.0 · 4700

DPE

[CVPR 2023] DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

Language: Python · License: MIT · 40500

T2M-GPT

(CVPR 2023) PyTorch implementation of “T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations”

Language: Python · License: Apache-2.0 · 53600