isrkhou's starred repositories

MAD

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

Language:PythonLicense:MITStargazers:144Issues:0Issues:0

moment_detr

[NeurIPS 2021] Moment-DETR code and QVHighlights dataset

Language:PythonLicense:MITStargazers:254Issues:0Issues:0

insightface

State-of-the-art 2D and 3D Face Analysis Project

Language:PythonStargazers:22476Issues:0Issues:0

reid-strong-baseline

Bag of Tricks and A Strong Baseline for Deep Person Re-identification

Language:PythonLicense:MITStargazers:2236Issues:0Issues:0

Person_reID_baseline_pytorch

:bouncing_ball_person: Pytorch ReID: A tiny, friendly, strong pytorch implement of person re-id / vehicle re-id baseline. Tutorial 👉https://github.com/layumi/Person_reID_baseline_pytorch/tree/master/tutorial

Language:PythonLicense:MITStargazers:4040Issues:0Issues:0

Prompt-Can-Anything

You can do anything by sota AI with prompt ,auto AI tools , VL larger model fine and project

Language:Jupyter NotebookLicense:GPL-3.0Stargazers:177Issues:0Issues:0

All-in-One-Gait

TrackGait is a sub project of OpenGait. Implemented a gait recognition system.

Language:PythonStargazers:70Issues:0Issues:0

PLIP

The official code of "PLIP: Language-Image Pre-training for Person Representation Learning"

Language:PythonLicense:MITStargazers:91Issues:0Issues:0

UniVTG

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding

Language:PythonLicense:MITStargazers:310Issues:0Issues:0

SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

Language:PythonLicense:BSD-3-ClauseStargazers:174Issues:0Issues:0
Language:Jupyter NotebookLicense:BSD-3-ClauseStargazers:750Issues:0Issues:0

ImageBind

ImageBind One Embedding Space to Bind Them All

Language:PythonLicense:NOASSERTIONStargazers:8157Issues:0Issues:0

UniHCP

Official PyTorch implementation of UniHCP

Language:PythonLicense:MITStargazers:146Issues:0Issues:0

modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

Language:PythonLicense:Apache-2.0Stargazers:6673Issues:0Issues:0

SOLIDER

A Semantic Controllable Self-Supervised Learning Framework to learn general human representations from massive unlabeled human images, which can benefit downstream human-centric tasks to the maximum extent

Language:PythonLicense:Apache-2.0Stargazers:1892Issues:0Issues:0

PeekingDuck

A modular framework built to simplify Computer Vision inference workloads.

Language:PythonLicense:Apache-2.0Stargazers:161Issues:0Issues:0

DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Language:PythonLicense:Apache-2.0Stargazers:2133Issues:0Issues:0

playground

A central hub for gathering and showcasing amazing projects that extend OpenMMLab with SAM and other exciting features.

Language:PythonLicense:Apache-2.0Stargazers:1079Issues:0Issues:0

scenic

Scenic: A Jax Library for Computer Vision Research and Beyond

Language:PythonLicense:Apache-2.0Stargazers:3202Issues:0Issues:0

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language:PythonLicense:CC-BY-4.0Stargazers:1120Issues:0Issues:0

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language:PythonLicense:Apache-2.0Stargazers:1241Issues:0Issues:0

towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Language:PythonLicense:Apache-2.0Stargazers:3121Issues:0Issues:0

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language:PythonLicense:BSD-3-ClauseStargazers:2662Issues:0Issues:0

OpenSeeFace

Robust realtime face and facial landmark tracking on CPU with Unity integration

Language:PythonLicense:BSD-2-ClauseStargazers:1396Issues:0Issues:0

human

Human: AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition

Language:HTMLLicense:MITStargazers:2227Issues:0Issues:0

GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language:PythonLicense:Apache-2.0Stargazers:5992Issues:0Issues:0

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect , Segment and Generate Anything

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:14528Issues:0Issues:0

segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:46238Issues:0Issues:0

deepface

A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python

Language:PythonLicense:MITStargazers:11377Issues:0Issues:0

mmdetection

OpenMMLab Detection Toolbox and Benchmark

Language:PythonLicense:Apache-2.0Stargazers:28852Issues:0Issues:0