WhiteFu

WhiteFu's repositories

ai-audio-startups

Community list of startups working with AI in audio and music technology

License: Apache-2.0

audio-pipeline

Language: Python, License: Apache-2.0

AudioEditingCode

Language: Python, License: CC-BY-SA-4.0

awesome-audio-plaza

Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation.

License: MIT

Awesome-LLMs-Datasets

A summary of existing representative text datasets for LLMs.

License: Apache-2.0

Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

Bunny

A family of lightweight multimodal models.

Language: Python, License: Apache-2.0

ConsistI2V

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

Language: Python, License: MIT

diarizers

Language: Python

EVA

EVA Series: Visual Representation Fantasies from BAAI

Language: Python, License: MIT

FRESCO

[CVPR 2024] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

Language: Jupyter Notebook, License: NOASSERTION

i-Code

Language: Jupyter Notebook, License: MIT

languagecodec

Official code repository of Language-Codec

Language: Python, License: MIT

lina-speech

lina-speech: linear-attention-based text-to-speech

Language: Jupyter Notebook, License: NOASSERTION

llava-phi

Language: Python

M2UGen

This is the official repository for M2UGen

Language: Jupyter Notebook, License: MIT

Mantis

Official code for Paper "Mantis: Multi-Image Instruction Tuning"

Language: Python, License: Apache-2.0

metavoice-src

Foundational model for human-like, expressive TTS

Language: Python, License: Apache-2.0

MoneyPrinterTurbo

Generate high-definition short videos with one click using large AI models.

Language: Python, License: MIT

Open-Sora

Building your own video generation model like OpenAI's Sora

Language: Python, License: Apache-2.0

OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)

License: Apache-2.0

pyannote-whisper

Language: Python

pytorch-speech-features

License: NOASSERTION

pyvideotrans

Translate a video from one language to another and add dubbing.

Language: Python, License: GPL-3.0

snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Language: Python, License: MIT

SoraReview

The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".

transcribe-anything

Input a local file or URL and this service will transcribe it using Whisper AI. Completely private and free 🤯🤯🤯

Language: Python, License: MIT

tts-qa

Language: Python

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Python, License: NOASSERTION

youtube-transcript-api

This is a Python API that lets you get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike other Selenium-based solutions!

Language: Python, License: MIT