yumianhuli's repositories
adetailer
Automatic detection, masking, and inpainting with a detection model.
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Awesome-Interaction-Aware-Trajectory-Prediction
A selection of state-of-the-art research materials on trajectory prediction
ChatDev-QingHua
Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
crewAI
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
Fay
Fay is an open-source digital human framework integrating language models and digital characters. It offers retail, assistant, and agent versions for diverse applications like virtual shopping guides, broadcasters, assistants, waiters, teachers, and voice or text-based mobile assistants.
FollowYourClick
[arXiv 2024] Follow-Your-Click: This repo is the official implementation of "Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts"
FollowYourPose
[AAAI 2024] Follow-Your-Pose: This repo is the official implementation of "Follow-Your-Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos"
HybrIK
Official code of "HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation", CVPR 2021
InternVideo
InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)
Latte
Latte: Latent Diffusion Transformer for Video Generation.
LLMUnity
Integrate LLM models in Unity!
MaterialSearch
AI semantic search for local media: search by image, match images to a text description, search video frames, and find videos from a description of the scene. Search local photos and videos through natural language.
momask-codes
Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"
MoneyPrinterV2
Automate the process of making money online.
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open-source community can contribute to this project.
Qwen1.5
Qwen1.5 is the improved version of Qwen, the large language model series developed by Qwen team, Alibaba Cloud.
RapidVideOCR
Extract video hard subtitles and automatically generate corresponding srt files.
Sakura-13B-Galgame
A Japanese-to-Chinese translation large language model adapted for light novels and Galgames.
sd-webui-negpip
Extension for Stable Diffusion web UI that enables negative prompts within the prompt.
smanga
A simple manga browser that installs directly with Docker.
tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
UniEdit
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
VAST
Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit
YOLO-World
Real-Time Open-Vocabulary Object Detection