Dalu Feng (Fengdalu)

Organizations
VIPL-Audio-Visual-Speech-Understanding

Dalu Feng's starred repositories

ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

Language: Python | License: GPL-3.0 | Stargazers: 51152 | Issues: 381 | Issues: 3199

whisper.cpp

Port of OpenAI's Whisper model in C/C++

so-vits-svc

SoftVC VITS Singing Voice Conversion

Language: Python | License: AGPL-3.0 | Stargazers: 25389 | Issues: 177 | Issues: 130

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | Stargazers: 25222 | Issues: 193 | Issues: 4038

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 21669 | Issues: 182 | Issues: 478

DeOldify

A Deep Learning based project for colorizing and restoring old images (and video!)

Language: Python | License: MIT | Stargazers: 17949 | Issues: 440 | Issues: 382

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | Stargazers: 15897 | Issues: 106 | Issues: 1028

BlackHole

BlackHole is a modern macOS audio loopback driver that allows applications to pass audio to other applications with zero additional latency.

Language: C | License: GPL-3.0 | Stargazers: 14943 | Issues: 125 | Issues: 398

gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

Language: Python | License: NOASSERTION | Stargazers: 13713 | Issues: 116 | Issues: 927

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Language: Python | License: BSD-3-Clause | Stargazers: 8317 | Issues: 100 | Issues: 1179

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

dreamgaussian

[ICLR 2024 Oral] Generative Gaussian Splatting for Efficient 3D Content Creation

Language: Python | License: MIT | Stargazers: 3871 | Issues: 46 | Issues: 149

pytorch-fid

Compute FID scores with PyTorch.

Language: Python | License: Apache-2.0 | Stargazers: 3320 | Issues: 15 | Issues: 86

Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

Language: Python | License: MIT | Stargazers: 2030 | Issues: 31 | Issues: 84

rhubarb-lip-sync

Rhubarb Lip Sync is a command-line tool that automatically creates 2D mouth animation from voice recordings. You can use it for characters in computer games, in animated cartoons, or in any other project that requires animating mouths based on existing recordings.

Language: C++ | License: NOASSERTION | Stargazers: 1796 | Issues: 54 | Issues: 123

unidiffuser

Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"

Language: Python | License: AGPL-3.0 | Stargazers: 1355 | Issues: 17 | Issues: 32

U-ViT

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Language: Jupyter Notebook | License: MIT | Stargazers: 891 | Issues: 12 | Issues: 28

LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Language: Python | License: Apache-2.0 | Stargazers: 691 | Issues: 14 | Issues: 103

fairseq2

FAIR Sequence Modeling Toolkit 2

Language: Python | License: MIT | Stargazers: 676 | Issues: 17 | Issues: 100

seqGAN

A simplified PyTorch implementation of "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." (Yu, Lantao, et al.)

2048-ai

A simple AI for the 2048 game.

Language: Go | License: MIT | Stargazers: 318 | Issues: 13 | Issues: 5

2048-python

🐍 2048

Language: Python | License: MIT | Stargazers: 318 | Issues: 12 | Issues: 4

CharsiuG2P

Multilingual G2P in 100 languages

Language: Jupyter Notebook | License: MIT | Stargazers: 276 | Issues: 10 | Issues: 12

charsiu

Charsiu: A neural phonetic aligner.

Language: Jupyter Notebook | License: MIT | Stargazers: 267 | Issues: 8 | Issues: 17

auto_avsr

Auto-AVSR: Lip-Reading Sentences Project

Language: Python | License: Apache-2.0 | Stargazers: 164 | Issues: 5 | Issues: 35

PromptingWhisper

Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation

LoadLoraWithTags

Saves and loads trigger words for LoRAs from a JSON file and fetches them automatically from Civitai if they are missing. An optional prompt input appends them automatically (toggleable). Uses actual alphabetical order and prints trigger words to the terminal. A bypass toggle disables it without setting the sliders to 0.

Visual-Audio-Memory

PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)

Language: Python | License: NOASSERTION | Stargazers: 19 | Issues: 1 | Issues: 5

AV4SER

PyTorch implementation for Audio-Visual Domain Adaptation Feature Fusion for Speech Emotion Recognition

Visual-Audio-Memory

PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)

Language: Python | License: NOASSERTION | Stargazers: 2 | Issues: 0 | Issues: 0