Beast code in Giters

No One's starred repositories

segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language:Jupyter NotebookApache-2.0778100

Mask2Former

Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"

Language:PythonMIT239000

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language:PythonApache-2.01849500

LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Language:PythonApache-2.0168300

VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Language:PythonNOASSERTION18400

PSALM

[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"

Language:PythonApache-2.016100

UniMD

UniMD: Towards Unifying Moment retrieval and temporal action Detection

Language:Python2000

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language:PythonGPL-3.0408700

Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

105300

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language:PythonApache-2.0169800

ActivityNet-Entities

A Dataset for Grounded Video Description

Language:PythonNOASSERTION15800

swift

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language:PythonApache-2.0269000

TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Language:PythonBSD-3-Clause25000

DemystifyLocalViT

Official code for paper "On the Connection between Local Attention and Dynamic Depth-wise Convolution" ICLR 2022 Spotlight

Language:Jupyter NotebookMIT18100

llama3

The official Meta Llama 3 GitHub site

Language:PythonNOASSERTION2513000

CnOCR

CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scenarios and can be used directly after installation. 【基于 PyTorch/MXNet 的中文/英文 OCR Python 包。】

Language:PythonApache-2.0309500

Umi-OCR

OCR software, free and offline. 开源、免费的离线OCR软件。支持截屏/批量导入图片，PDF文档识别，排除水印/页眉页脚，扫描/生成二维码。内置多国语言库。

Language:PythonMIT2376500

CoDeF

[CVPR 2024 Highlight] Official PyTorch implementation of CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Language:PythonNOASSERTION480500

VimTS

VimTS: A Unified Video and Image Text Spotter

Language:PythonGPL-3.06700

ControlNet

Let us control diffusion models!

Language:PythonApache-2.02928000

UniMatch

[CVPR 2023] Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

Language:PythonMIT43600

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

Language:PythonGPL-3.0328600

labelImg

LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.

Language:PythonMIT2230200

josephzpng