Pingchuan Ma (mpc001)



Location: Imperial College London

Home Page: mpc001.github.io


Pingchuan Ma's starred repositories

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 10510 | Issues: 139 | Issues: 328

faster-whisper

Faster Whisper transcription with CTranslate2

Language: Python | License: MIT | Stargazers: 10014 | Issues: 121 | Issues: 594

AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Language: Python | License: NOASSERTION | Stargazers: 9885 | Issues: 132 | Issues: 48

ImageBind

ImageBind: One Embedding Space to Bind Them All

Language: Python | License: NOASSERTION | Stargazers: 8050 | Issues: 100 | Issues: 83

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open-Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.

Language: Python | License: NOASSERTION | Stargazers: 4463 | Issues: 50 | Issues: 894

AgentVerse

🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications. It primarily provides two frameworks: task-solving and simulation.

Language: JavaScript | License: Apache-2.0 | Stargazers: 3854 | Issues: 58 | Issues: 76

zero123

Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)

Language: Python | License: MIT | Stargazers: 2571 | Issues: 43 | Issues: 120

GeneFace

GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis (ICLR 2023). Official code.

Language: Python | License: MIT | Stargazers: 2429 | Issues: 50 | Issues: 277

INTERSPEECH-2023-Papers

INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!

MultiMAE

MultiMAE: Multi-modal Multi-task Masked Autoencoders (ECCV 2022)

Language: Python | License: NOASSERTION | Stargazers: 533 | Issues: 13 | Issues: 31

muavic

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Language: Python | License: NOASSERTION | Stargazers: 341 | Issues: 14 | Issues: 20

CVPR-2023-24-Papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ Star the repository to support the development of visual intelligence!

Language: Python | License: MIT | Stargazers: 332 | Issues: 8 | Issues: 0

ICASSP-2023-24-Papers

ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

Language: Python | License: MIT | Stargazers: 273 | Issues: 27 | Issues: 3

MegaPortraits

Supplementary materials for the paper MegaPortraits (ACM MM 2022)

DSFD-Pytorch-Inference

A high-performance PyTorch implementation of face detection models, including RetinaFace and DSFD

Language: Python | License: Apache-2.0 | Stargazers: 213 | Issues: 4 | Issues: 28

cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

Language: Python | License: BSD-2-Clause | Stargazers: 212 | Issues: 5 | Issues: 28

Depth-Enhancement-and-Super-Resolution

Code for the paper "Towards Unpaired Depth Enhancement and Super-Resolution in the Wild"

Language: Jupyter Notebook | Stargazers: 56 | Issues: 1 | Issues: 0

Leaf-diseases-segmentation

Final project of a Deep Learning course

Language: Jupyter Notebook | Stargazers: 53 | Issues: 1 | Issues: 0

LipLearner

Research repository for LipLearner: Customizable Silent Speech Interactions on Mobile Devices (CHI 2023).

Language: Swift | License: MIT | Stargazers: 53 | Issues: 6 | Issues: 0

raven

Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)

Language: Python | License: MIT | Stargazers: 48 | Issues: 8 | Issues: 7

Lenta-Hackathon

Code and files for the Skoltech/Lenta hackathon, September 2020

Language: Jupyter Notebook | Stargazers: 38 | Issues: 1 | Issues: 0

AV-RelScore

Audio-visual corruption modeling from our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023)

Multi-head-Visual-Audio-Memory

PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI 2022)

Language: Python | License: NOASSERTION | Stargazers: 21 | Issues: 1 | Issues: 5

CNVSRC2023Baseline

Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)

Language: Python | License: NOASSERTION | Stargazers: 20 | Issues: 1 | Issues: 2

Language: Python | Stargazers: 14 | Issues: 0 | Issues: 0

papers-to-read

Main articles I read or plan to read, as well as useful links.

License: MIT | Stargazers: 8 | Issues: 0 | Issues: 0

Language: Jupyter Notebook | Stargazers: 8 | Issues: 0 | Issues: 0

skoltech_NLA

Numerical Linear Algebra course at Skoltech, 2020

Language: Jupyter Notebook | Stargazers: 8 | Issues: 0 | Issues: 0

Language: Swift | License: MIT | Stargazers: 5 | Issues: 0 | Issues: 0