Fengdalu

Dalu Feng's starred repositories

ComfyUI

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.

Language: Python | GPL-3.0 | 51152 381 3199

whisper.cpp

Port of OpenAI's Whisper model in C/C++

Language: C | MIT | 34533 315 1298

so-vits-svc

SoftVC VITS Singing Voice Conversion

Language: Python | AGPL-3.0 | 25389 177 130

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | Apache-2.0 | 25222 193 4038

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | Apache-2.0 | 21669 182 478

DeOldify

A Deep Learning based project for colorizing and restoring old images (and video!)

Language: Python | MIT | 17949 440 382

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | Apache-2.0 | 15897 106 1028

BlackHole

BlackHole is a modern macOS audio loopback driver that allows applications to pass audio to other applications with zero additional latency.

Language: C | GPL-3.0 | 14943 125 398

gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

Language: Python | NOASSERTION | 13713 116 927

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Language: Python | BSD-3-Clause | 8317 100 1179

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

7416 326 263

dreamgaussian

[ICLR 2024 Oral] Generative Gaussian Splatting for Efficient 3D Content Creation

Language: Python | MIT | 3871 46 149

pytorch-fid

Compute FID scores with PyTorch.

Language: Python | Apache-2.0 | 3320 15 86

Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

Language: Python | MIT | 2030 31 84

Rhubarb Lip Sync is a command-line tool that automatically creates 2D mouth animation from voice recordings. You can use it for characters in computer games, in animated cartoons, or in any other project that requires animating mouths based on existing recordings.

Language: C++ | NOASSERTION | 1796 54 123

unidiffuser

Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"

Language: Python | AGPL-3.0 | 1355 17 32

U-ViT

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Language: Jupyter Notebook | MIT | 891 12 28

LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Language: Python | Apache-2.0 | 691 14 103

fairseq2

FAIR Sequence Modeling Toolkit 2

Language: Python | MIT | 676 17 100

seqGAN

A simplified PyTorch implementation of "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." (Yu, Lantao, et al.)

Language: Python | 638 14 24

2048-ai

A simple AI for the 2048 game.

Language: Go | MIT | 318 13 5

2048-python

🐍 2048

Language: Python | MIT | 318 12 4

CharsiuG2P

Multilingual G2P in 100 languages

Language: Jupyter Notebook | MIT | 276 10 12

charsiu

Charsiu: A neural phonetic aligner.

Language: Jupyter Notebook | MIT | 267 8 17

auto_avsr

Auto-AVSR: Lip-Reading Sentences Project

Language: Python | Apache-2.0 | 164 5 35

PromptingWhisper

Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation

Language: Python | 132 4 8

LoadLoraWithTags

Save/load trigger words for LoRAs from a JSON file and automatically fetch them from Civitai if they are missing. Optional prompt input to automatically append them (toggleable). Sorts in true alphabetical order and prints trigger words to the terminal. Also has a bypass toggle to disable it without setting the sliders to 0.

Language: Python | 51 2 5