Beast code in Giters

Ningwei's starred repositories

MARLIN

[CVPR] MARLIN: Masked Autoencoder for facial video Representation LearnINg

Language: Python, License: NOASSERTION, 21000

2D3MF

Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"

Language: Python, License: NOASSERTION, 2800

InstantMesh

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

Language: Python, License: Apache-2.0, 276000

AVT2-DWF

AVT2-DWF: Improving Deepfake Detection with Audio-Visual Fusion and Dynamic Weighting Strategies

Language: Python, 500

vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Language: Python, License: MIT, 1896900

grad-cam

[ICCV 2017] Torch code for Grad-CAM

Language: Lua, 146200

VAED_HeterGraph

The implementation for Interspeech22 "Visually-aware Acoustic Event Detection using Heterogeneous Graphs" paper

Language: Python, 1000

LipFD

This repository contains the codes of "Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-syncing DeepFakes".

Language: Python, 6200

Fast-Poisson-Image-Editing

A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.

Language: Python, License: MIT, 24200

fast-poisson-image-editing

Fast, scalable, and extensive implementations of Poisson image editing algorithms.

Language: Python, 3900

CFLD

[CVPR 2024 Highlight] Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Language: Jupyter Notebook, License: MIT, 15300

SAiD

SAiD: Blendshape-based Audio-Driven Speech Animation with Diffusion

Language: Python, License: Apache-2.0, 6200

faster-SadTalker-API

The API server version of the SadTalker project. Runs in Docker, 10 times faster than the original!

Language: Python, License: MIT, 11000

VOODOO3D-official

Official implementation for the paper "VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment"

Language: Python, License: MIT, 12600

SadTalker

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Language: Python, License: NOASSERTION, 1127800

GP-VTON

Official Implementation for CVPR2023 paper "GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning"

Language: Python, 45200

SadTalker-Video-Lip-Sync

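This project builds on SadTalkers to implement Wav2lip-style lip synthesis for video. It generates lip shapes driven by speech from a video file, and applies a configurable enhancement to the synthesized lip (face) region to improve the clarity of the generated lips. It uses the DAIN deep-learning frame-interpolation algorithm to add frames to the generated video, filling in the motion transitions between synthesized lip frames so that the result looks smoother, more realistic, and more natural.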

Language: Python, 174000

dreamtalk

Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Language: Python, License: MIT, 149400

AnimateAnyone

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

License: Apache-2.0, 1415800

GaussianAvatar

[CVPR 2024] The official repo for "GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians"

Language: Python, License: MIT, 36100

Gaussian-Head-Avatar

[CVPR 2024] Official repository for "Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians"

Language: Python, License: NOASSERTION, 71500

gaussian-head

Official repository for 'GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation'

Language: Python, License: MIT, 24200

AnimatableGaussians

Code of [CVPR 2024] "Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling"

Language: Python, License: NOASSERTION, 82700

VividTalk

VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior

License: Apache-2.0, 74900

minddiffusion

A collection of diffusion models based on MindSpore

Language: Python, License: Apache-2.0, 15800

GFPGAN

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

Language: Python, License: NOASSERTION, 3516000

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, both stationary and non-stationary, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.

Language: Python, License: NOASSERTION, 161600

ER-NeRF

[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

Language: Python, License: MIT, 91400
