MinghanLi

Hong Kong Polytechnic University

Hong Kong

https://sites.google.com/view/minghanli-homepage/academic

LI Minghan's starred repositories

make-a-video-pytorch

Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch

Language: Python, License: MIT, 188900

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python, License: NOASSERTION, 590900

TarViS

Language: Python, License: MIT, 4600

vision_transformer

Language: Jupyter Notebook, License: Apache-2.0, 996400

yolact

A simple, fully convolutional model for real-time instance segmentation.

Language: Python, License: MIT, 499700

LaGOT

We enrich the LaSOT validation set with annotations of additional object tracks, up to 10 object tracks per video in total. Tracks consist of precise bounding box annotations of moving objects. Annotations are provided at 10 fps. The original LaSOT validation set annotations and video can be downloaded from: https://vision.cs.stonybrook.edu/~lasot/

Language: Python, License: CC-BY-4.0, 600

pytracking

Visual tracking library based on PyTorch.

Language: Python, License: GPL-3.0, 317800

FDL

[CVPR-2024] Pytorch implementation of "Misalignment-Robust Frequency Distribution Loss for Image Transformation"

Language: Python, 2800

CCSR

Official codes of CCSR: Improving the Stability of Diffusion Models for Content Consistent Super-Resolution

Language: Python, 41700

SeeSR

[CVPR2024] SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

Language: Python, License: Apache-2.0, 37300

UVO_Challenge

Language: Python, 9100

gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Language: Python, License: Apache-2.0, 3176200

VIPOSeg-Benchmark

The benchmark for "Video Object Segmentation in Panoptic Wild Scenes".

Language: Python, 1000

Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Language: Python, License: MIT, 294700

LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Language: Jupyter Notebook, License: NOASSERTION, 48100

POPE

The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"

Language: Python, License: MIT, 16000

LLMTest_NeedleInAHaystack

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Language: Jupyter Notebook, License: NOASSERTION, 141100

LWM

Language: Python, License: Apache-2.0, 706300

dataset

The Open Images dataset

Language: Python, License: Apache-2.0, 424200

magvit

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

Language: Python, License: Apache-2.0, 92700

VGen

Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models

Language: Python, 285500

ControlVideo

[ICLR 2024] Official pytorch implementation of "ControlVideo: Training-free Controllable Text-to-Video Generation"

Language: Python, License: MIT, 75200

CogVideo

Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Language: Python, License: Apache-2.0, 587500

webvid

Large-scale text-video dataset. 10 million captioned short videos.

Language: Python, 56400

LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python, License: MIT, 66400

Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Language: Python, 23300

unified-io-2

Language: Python, License: Apache-2.0, 55200

gigagan-pytorch

Implementation of GigaGAN, new SOTA GAN out of Adobe. Culmination of nearly a decade of research into GANs

Language: Python, License: MIT, 178500

Vary

[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Language: Python, 169100

dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.

Language: Jupyter Notebook, License: Apache-2.0, 865600