yumianhuli's repositories
FollowYourClick
[arXiv 2024] Follow-Your-Click: This repo is the official implementation of "Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts"
FollowYourPose
[AAAI 2024] Follow-Your-Pose: This repo is the official implementation of "Follow-Your-Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos"
Fay
Fay is an open-source digital human framework integrating language models and digital characters. It offers retail, assistant, and agent versions for diverse applications like virtual shopping guides, broadcasters, assistants, waiters, teachers, and voice or text-based mobile assistants.
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open-source community can contribute to this project.
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
UniEdit
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
momask-codes
Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"
RapidVideOCR
Extract video hard subtitles and automatically generate corresponding srt files.
VAST
Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Latte
Latte: Latent Diffusion Transformer for Video Generation.
Awesome-Interaction-Aware-Trajectory-Prediction
A selection of state-of-the-art research materials on trajectory prediction
MoneyPrinterV2
Automate the process of making money online.
adetailer
Automatic detection, masking, and inpainting with a detection model.
crewAI
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
HybrIK
Official code of "HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation", CVPR 2021
Sakura-13B-Galgame
A Japanese-to-Chinese translation large language model adapted for light novels and Galgames.
ChatDev-QingHua
Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit
sd-webui-negpip
Extension for Stable Diffusion web UI that enables negative prompts within the prompt
smanga
A simple manga browser that installs directly via Docker.
MaterialSearch
AI semantic search for local media: search by image, find local assets, match frames to a text description, search video frames, and find videos by scene description. Search local photos and videos through natural language.
Qwen1.5
Qwen1.5 is the improved version of Qwen, the large language model series developed by Qwen team, Alibaba Cloud.
tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
YOLO-World
Real-Time Open-Vocabulary Object Detection
LLMUnity
Integrate LLM models in Unity!
InternVideo
InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)