Yiyun Chen (yiyunchen)

Location: Shenzhen, China

Yiyun Chen's starred repositories

LivePortrait

Bring portraits to life!

Language: Python | License: NOASSERTION | Stargazers: 11974 | Issues: 0

video-diffusion-pytorch

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch

Language: Python | License: MIT | Stargazers: 1225 | Issues: 0

NC-SDEdit

[ECCV 2024] Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

Stargazers: 76 | Issues: 0

awesome-diffusion-v2v

Awesome diffusion Video-to-Video (V2V). A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, plus video editing benchmark code.

Language: Python | License: MIT | Stargazers: 116 | Issues: 0

GPV

Repository for our Interspeech 2020 general-purpose voice activity detection (GPVAD) paper

Language: Python | License: GPL-3.0 | Stargazers: 140 | Issues: 0

VADER

Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models, such as VideoCrafter, OpenSora, ModelScope, and StableVideoDiffusion, by fine-tuning them with reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, and Aesthetics.

Language: Python | Stargazers: 195 | Issues: 0

w2v2_audioFrameClassification

wav2vec2 audio classification for prosodic boundary detection and other tasks

Language: Jupyter Notebook | License: MIT | Stargazers: 32 | Issues: 0

speechbrain-docs-zh-cn

SpeechBrain Chinese documentation

Stargazers: 12 | Issues: 0

speechbrain

A PyTorch-based Speech Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 8609 | Issues: 0

ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

Language: Jupyter Notebook | License: BSD-3-Clause | Stargazers: 1118 | Issues: 0

CLAP

Contrastive Language-Audio Pretraining

Language: Python | License: CC0-1.0 | Stargazers: 1347 | Issues: 0

madmom

Python audio and music signal processing library

Language: Python | License: NOASSERTION | Stargazers: 1306 | Issues: 0

MidiTok

MIDI / symbolic music tokenizers for Deep Learning models 🎶

Language: Python | License: MIT | Stargazers: 662 | Issues: 0

Auto_Cut_Audio

We often have many WAV audio files to cut, and we want to cut them without splitting a word or a complete sentence.

Language: Python | License: GPL-3.0 | Stargazers: 11 | Issues: 0

Add_noise_and_rir_to_speech

This code base adds noise from the MUSAN dataset to a clean speech signal at a specified signal-to-noise ratio, and generates far-field speech data using room impulse responses from the BUT Speech@FIT Reverb Database.

Language: Python | License: MIT | Stargazers: 28 | Issues: 0

spleeter

Deezer source separation library including pretrained models.

Language: Python | License: MIT | Stargazers: 25680 | Issues: 0

Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Stargazers: 370 | Issues: 0

ATST-SED

This repo includes the official implementation of "Fine-tune the pretrained ATST model for sound event detection".

Language: Jupyter Notebook | License: MIT | Stargazers: 81 | Issues: 0

speech-vad-demo

Integrates WebRTC's VAD for splitting audio files

Language: C | Stargazers: 336 | Issues: 0

IncrementalVHD_GPE

Official code for the paper "Exploring Domain Incremental Video Highlights Detection with the LiveFood Benchmark"

Language: Python | Stargazers: 26 | Issues: 0

chorus-detection

A deep learning project for automated chorus detection in songs, featuring a command-line interface (CLI) tool that takes a YouTube link and uses a pre-trained CRNN model to detect the chorus sections of that song.

Language: Jupyter Notebook | Stargazers: 11 | Issues: 0

HTS-Audio-Transformer

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Language: Python | License: MIT | Stargazers: 347 | Issues: 0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | License: MIT | Stargazers: 19598 | Issues: 0

musiclm-pytorch

Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch

Language: Python | License: MIT | Stargazers: 3126 | Issues: 0

deep-audio-fingerprinting

A repository for my MSc thesis in Data Science & Machine Learning @ NTUA. A deep learning approach to audio fingerprinting for recognizing songs in real time through the microphone.

Language: Jupyter Notebook | License: MIT | Stargazers: 12 | Issues: 0

VTG-LLM

[Preprint] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

Language: Python | License: Apache-2.0 | Stargazers: 49 | Issues: 0

VFIformer

Video Frame Interpolation with Transformer (CVPR 2022)

Language: Python | License: MIT | Stargazers: 112 | Issues: 0

Video-Frame-Interpolation-Rankings-and-Video-Deblurring-Rankings

Rankings include: ABME AdaFNIO ALANET AMT BiT BVFI CDFI CtxSyn DBVI DeMFI DQBC DRVI EAFI EBME EDC EDENVFI EDSC EMA-VFI FGDCN FILM FLAVR H-VFI IFRNet IQ-VFI JNMR LADDER M2M MA-GCSPA NCM PerVFI PRF ProBoost-Net RIFE RN-VFI SoftSplat SSR ST-MFNet Swin-VFI TDPNet TTVFI UGFI UPR-Net UTI-VFI VFIformer VFIFT VFIMamba VFIT VIDUE VRT

Stargazers: 99 | Issues: 0

ComfyUI_omost

ComfyUI implementation of Omost

Language: Python | License: Apache-2.0 | Stargazers: 407 | Issues: 0

OMG-Seg

OMG-LLaVA and OMG-Seg codebase

Language: Python | License: NOASSERTION | Stargazers: 1236 | Issues: 0