Beast code in Giters

Pingchuan Ma's starred repositories

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | 10510 139 328

faster-whisper

Faster Whisper transcription with CTranslate2

Language: Python | License: MIT | 10014 121 594

AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Language: Python | License: NOASSERTION | 9885 132 48

ImageBind

ImageBind: One Embedding Space to Bind Them All

Language: Python | License: NOASSERTION | 8050 100 83

FunASR

A fundamental end-to-end speech recognition toolkit and open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, etc.

Language: Python | License: NOASSERTION | 4463 50 894

AgentVerse

🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications. It primarily provides two frameworks: task-solving and simulation.

Language: JavaScript | License: Apache-2.0 | 3854 58 76

zero123

Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)

Language: Python | License: MIT | 2571 43 120

GeneFace

GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis (ICLR 2023). Official code.

Language: Python | License: MIT | 2429 50 277

INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!

License: MIT | 607 87 4

MultiMAE

MultiMAE: Multi-modal Multi-task Masked Autoencoders, ECCV 2022

Language: Python | License: NOASSERTION | 533 13 31

muavic

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Language: Python | License: NOASSERTION | 341 14 20

CVPR-2023-24-Papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ Star the repository to support the development of visual intelligence!

Language: Python | License: MIT | 332 80

ICASSP-2023-24-Papers

ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

Language: Python | License: MIT | 273 27 3

MegaPortraits

Supplementary materials for the MegaPortraits paper (ACM MM 2022)

263 53 6

DSFD-Pytorch-Inference

A high-performance PyTorch implementation of face detection models, including RetinaFace and DSFD

Language: Python | License: Apache-2.0 | 213 4 28

cav-mae

Code and pretrained models for the ICLR 2023 paper "Contrastive Audio-Visual Masked Autoencoder".

Language: Python | License: BSD-2-Clause | 212 5 28

Depth-Enhancement-and-Super-Resolution

Code for the paper "Towards Unpaired Depth Enhancement and Super-Resolution in the Wild"

Language: Jupyter Notebook | 56 10

Leaf-diseases-segmentation

Final project of a Deep Learning course

Language: Jupyter Notebook | 53 10

LipLearner

Research repository for LipLearner: Customizable Silent Speech Interactions on Mobile Devices (CHI 2023).

Language: Swift | License: MIT | 53 60

raven

Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)

Language: Python | License: MIT | 48 8 7

Lenta-Hackathon

Code and files for the Skoltech/Lenta hackathon, Sept. 2020

Language: Jupyter Notebook | 38 10

AV-RelScore

Audio-visual corruption modeling from our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023)

Language: Python | 26 2 1