WhiteFu's repositories
Mantis
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
MoneyPrinterTurbo
Generate high-definition short videos with one click using large AI models.
Bunny
A family of lightweight multimodal models.
lina-speech
lina-speech: linear-attention-based text-to-speech
pyvideotrans
Translate a video from one language to another and add dubbing.
Awesome-LLMs-Datasets
A summary of existing representative text datasets for LLMs.
FRESCO
[CVPR 2024] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
awesome-audio-plaza
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
ConsistI2V
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
Open-Sora
Building your own video generation model like OpenAI's Sora
EVA
EVA Series: Visual Representation Fantasies from BAAI
M2UGen
This is the official repository for M2UGen
ai-audio-startups
Community list of startups working with AI in audio and music technology
snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
metavoice-src
Foundational model for human-like, expressive TTS
languagecodec
Official code repository of Language-Codec
youtube-transcript-api
This is a Python API that lets you get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and unlike Selenium-based solutions it requires neither an API key nor a headless browser.
transcribe-anything
Input a local file or URL and this service will transcribe it using Whisper AI. Completely private and free 🤯🤯🤯
Awesome-Open-AI-Sora
Sora AI Awesome List – Your go-to resource hub for all things Sora AI, OpenAI's groundbreaking model for crafting realistic scenes from text. Explore a curated collection of articles, videos, podcasts, and news about Sora's capabilities, advancements, and more.
LangSegment
A multilingual (97 languages) tool for TTS that automatically recognizes and segments mixed-language text.