Dalu Feng's repositories
FacePose_pytorch
🔥🔥 A PyTorch implementation of head pose estimation (yaw, roll, pitch) and emotion detection with SOTA performance in real time. Easy to deploy, easy to use, and highly accurate. Solves all face detection problems at once. (Our goal: minimal, extremely fast, and efficient.)
Lipreading_using_Temporal_Convolutional_Networks
ICASSP'20 Lipreading using Temporal Convolutional Networks
auto_avsr
Auto-AVSR: Lip-Reading Sentences Project
av_hubert
A self-supervised learning framework for audio-visual speech
awesome-audio-visualization
A curated list of audio visualization resources.
Awesome-Video-Datasets
Video datasets
bark
🔊 Text-Prompted Generative Audio Model
chinese_text_normalization
Chinese text normalization for speech processing
DeepFaceLab
DeepFaceLab is the leading software for creating deepfakes.
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
lightning-bolts
Toolbox of models, callbacks, and datasets for AI/ML researchers.
LRW_ID
Speaker label information for the LRW dataset, an outcome of the paper "Speaker-adaptive Lip Reading with User-dependent Padding" (ECCV 2022).
mdistiller
The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679
nvjpeg-python
nvJPEG for Python
RGB_HSV_HSL
A pure PyTorch implementation of color space conversion, including rgb2hsl, rgb2hsv, hsv2rgb, and hsl2rgb.
Speech-Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
stanfordacm
Stanford ACM-ICPC related materials
torchnvjpeg
Decode JPEG images on the GPU using PyTorch
Wave-U-Net-for-Speech-Enhancement
A PyTorch implementation of Wave-U-Net, adapted to speech enhancement.
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
whisper
Robust Speech Recognition via Large-Scale Weak Supervision
whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)