yiyunchen

Yiyun Chen's starred repositories

LivePortrait

Bring portraits to life!

Language: Python · License: NOASSERTION · 1197400

video-diffusion-pytorch

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to video generation, in PyTorch

Language: Python · License: MIT · 122500

NC-SDEdit

[ECCV 2024] Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

7600

awesome-diffusion-v2v

Awesome diffusion Video-to-Video (V2V): a collection of papers on diffusion model-based video editing (also known as video-to-video (V2V) translation), plus video editing benchmark code.

Language: Python · License: MIT · 11600

GPV

Repository for our Interspeech2020 general-purpose voice activity detection (GPVAD) paper

Language: Python · License: GPL-3.0 · 14000

Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.

Language: Python · 19500

w2v2_audioFrameClassification

wav2vec2 audio classification for prosodic boundary detection and other tasks

Language: Jupyter Notebook · License: MIT · 3200

speechbrain-docs-zh-cn

SpeechBrain Chinese documentation

1200

speechbrain

A PyTorch-based Speech Toolkit

Language: Python · License: Apache-2.0 · 860900

ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

Language: Jupyter Notebook · License: BSD-3-Clause · 111800

CLAP

Contrastive Language-Audio Pretraining

Language: Python · License: CC0-1.0 · 134700

madmom

Python audio and music signal processing library

Language: Python · License: NOASSERTION · 130600

MidiTok

MIDI / symbolic music tokenizers for Deep Learning models 🎶

Language: Python · License: MIT · 66200

Auto_Cut_Audio

We always have a lot of WAV audio to cut, and when we cut it we don't want to split a word or a complete sentence.

Language: Python · License: GPL-3.0 · 1100
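The description does not say how the repository chooses its cut points. A common way to avoid splitting words is to cut only inside silent stretches that are long enough. Below is a minimal sketch that assumes energy-based (RMS) silence detection on mono float samples. `split_on_silence` and its `frame_len`, `threshold` and `min_silence_frames` parameters are hypothetical illustrations, not the repository's API.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of each non-overlapping frame (the last frame may be shorter)."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def split_on_silence(samples: Sequence[float], frame_len: int = 160,
                     threshold: float = 0.01,
                     min_silence_frames: int = 3) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (start, end) sample ranges of voiced segments.

    A cut is made only after at least `min_silence_frames` consecutive
    low-energy frames, so short pauses inside a word or sentence are kept.
    """
    energies = frame_rms(samples, frame_len)
    segments: List[Tuple[int, int]] = []
    seg_start = None   # frame index where the current voiced segment began
    silent_run = 0     # consecutive silent frames seen inside the segment
    for i, e in enumerate(energies):
        if e >= threshold:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end_frame = i - silent_run + 1  # one past the last voiced frame
                segments.append((seg_start * frame_len, end_frame * frame_len))
                seg_start, silent_run = None, 0
    if seg_start is not None:  # close a segment still open at the end
        end_frame = len(energies) - silent_run
        segments.append((seg_start * frame_len,
                         min(end_frame * frame_len, len(samples))))
    return segments
```

With 160-sample frames, a one-frame pause is kept inside a segment, while a four-frame pause produces a cut. Real recordings would need a threshold tuned to their noise floor.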

Add_noise_and_rir_to_speech

The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generate far-field speech data using room impulse response data from BUT Speech@FIT Reverb Database.

Language: Python · License: MIT · 2800
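The two steps described (mixing in noise at a target signal-to-noise ratio, and simulating far-field speech with a room impulse response) reduce to a gain rule and a convolution. The noise gain g follows from SNR_dB = 10·log10(P_speech / (g²·P_noise)). Below is a minimal sketch that assumes mono float sample lists and whole-signal average power; the repository may define power differently. The function names are hypothetical, and loading MUSAN noise or BUT Speech@FIT Reverb RIRs is omitted.

```python
import math
from typing import List, Sequence


def convolve(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Full linear convolution; output length is len(signal) + len(rir) - 1."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def power(x: Sequence[float]) -> float:
    """Average power over the whole signal (an assumption, not the repo's definition)."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech: Sequence[float], noise: Sequence[float],
                     snr_db: float) -> List[float]:
    """Hypothetical helper: scale noise so the mixture has the requested SNR.

    The noise is tiled or truncated to the speech length, then multiplied by
    g = sqrt(P_speech / (P_noise * 10**(snr_db / 10))).
    """
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def simulate_far_field(speech: Sequence[float], rir: Sequence[float],
                       noise: Sequence[float], snr_db: float) -> List[float]:
    """Hypothetical helper: reverberate speech with an RIR, then add noise."""
    reverberant = convolve(speech, rir)[:len(speech)]  # keep original length
    return add_noise_at_snr(reverberant, noise, snr_db)
```

Here the SNR is measured against the reverberant signal, not the dry speech. That is one of several reasonable conventions.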

spleeter

Deezer source separation library including pretrained models.

Language: Python · License: MIT · 2568000

Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

37000

ATST-SED

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

Language: Jupyter Notebook · License: MIT · 8100

speech-vad-demo

Integrates WebRTC's VAD for splitting audio files

Language: C · 33600

IncrementalVHD_GPE

Official code for the paper: Exploring Domain Incremental Video Highlights Detection with the LiveFood Benchmark

Language: Python · 2600

chorus-detection

A deep learning project for automated chorus detection in songs, featuring a command-line interface (CLI) tool that allows users to input a YouTube link and utilize a pre-trained CRNN model to detect chorus sections from a song on YouTube

Language: Jupyter Notebook · 1100

HTS-Audio-Transformer

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Language: Python · License: MIT · 34700

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python · License: MIT · 1959800

musiclm-pytorch

Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in PyTorch

Language: Python · License: MIT · 312600

deep-audio-fingerprinting

A repository for my MSc thesis in Data Science & Machine Learning @ NTUA: a deep learning approach to audio fingerprinting for recognizing songs in real time through the microphone.

Language: Jupyter Notebook · License: MIT · 1200

VTG-LLM

[Preprint] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

Language: Python · License: Apache-2.0 · 4900

VFIformer

Video Frame Interpolation with Transformer (CVPR2022)

Language: Python · License: MIT · 11200

Video-Frame-Interpolation-Rankings-and-Video-Deblurring-Rankings

Rankings include: ABME AdaFNIO ALANET AMT BiT BVFI CDFI CtxSyn DBVI DeMFI DQBC DRVI EAFI EBME EDC EDENVFI EDSC EMA-VFI FGDCN FILM FLAVR H-VFI IFRNet IQ-VFI JNMR LADDER M2M MA-GCSPA NCM PerVFI PRF ProBoost-Net RIFE RN-VFI SoftSplat SSR ST-MFNet Swin-VFI TDPNet TTVFI UGFI UPR-Net UTI-VFI VFIformer VFIFT VFIMamba VFIT VIDUE VRT

9900

ComfyUI_omost

ComfyUI implementation of Omost

Language: Python · License: Apache-2.0 · 40700

OMG-Seg

OMG-LLaVA and OMG-Seg codebase

Language: Python · License: NOASSERTION · 123600