mayingwuhu

MA YING's starred repositories

LivePortrait

Bring portraits to life!

Language: Python | License: MIT | 700500

ComfyUI

The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface.

Language: Python | License: GPL-3.0 | 4258800

fish-speech

Brand new TTS solution

Language: Python | License: NOASSERTION | 582800

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language: Python | License: GPL-3.0 | 399200

Video-LLaVA

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python | License: Apache-2.0 | 271000

Real3DPortrait

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code

Language: Python | License: MIT | 79700

LMM_caption

An attempt at dataset labeling with Large Multimodal Models

Language: Jupyter Notebook | License: Apache-2.0 | 100

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | 436900

label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

Language: JavaScript | License: Apache-2.0 | 1753000

unmasked_teacher

[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Language: Python | License: MIT | 26800

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python | License: Apache-2.0 | 113900

Modern_GUI_PyDracula_PySide6_or_PyQt6

Language: Python | License: MIT | 221600

PySide6-Code-Tutorial

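Possibly the best Chinese-language PySide6 tutorial! Explains PySide6 through code examples, and shares high-quality demos, icon libraries, QSS skins, related articles, and more!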

Language: Python | License: GPL-3.0 | 89800

CVinW_Readings

A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"

109000

LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

582500

Baichuan2

A series of large language models developed by Baichuan Intelligent Technology

Language: Python | License: Apache-2.0 | 403200

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook | License: BSD-3-Clause | 925000

BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook | License: BSD-3-Clause | 448700

webrtc-stream

Simple python webrtc streaming demo

Language: Python | 5700

ALPRO

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Language: Python | License: BSD-3-Clause | 18500

X-CLIP

An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"

Language: Python | License: MIT | 12300

XPretrain

Multi-modality pre-training

Language: Python | License: NOASSERTION | 45400

towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Language: Python | License: Apache-2.0 | 308700

ChatDev

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

Language: Shell | License: Apache-2.0 | 2453600

yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Language: Python | License: AGPL-3.0 | 4866100

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | 12947100

MOSS

An open-source tool-augmented conversational language model from Fudan University

Language: Python | License: Apache-2.0 | 1188800

awesome-video-text-retrieval

A curated list of deep learning resources for video-text retrieval.

56500

DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Language: Python | License: MIT | 323100

XMem

[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Language: Python | License: MIT | 166700