No One (josephzpng)

josephzpng

Geek Repo

Github PK Tool:Github PK Tool

No One's starred repositories

segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:7781Issues:0Issues:0

Mask2Former

Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"

Language:PythonLicense:MITStargazers:2390Issues:0Issues:0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language:PythonLicense:Apache-2.0Stargazers:18495Issues:0Issues:0

LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Language:PythonLicense:Apache-2.0Stargazers:1683Issues:0Issues:0

VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Language:PythonLicense:NOASSERTIONStargazers:184Issues:0Issues:0

PSALM

[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"

Language:PythonLicense:Apache-2.0Stargazers:161Issues:0Issues:0

UniMD

UniMD: Towards Unifying Moment retrieval and temporal action Detection

Language:PythonStargazers:20Issues:0Issues:0

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language:PythonLicense:GPL-3.0Stargazers:4087Issues:0Issues:0

Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

Stargazers:1053Issues:0Issues:0

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language:PythonLicense:Apache-2.0Stargazers:1698Issues:0Issues:0

ActivityNet-Entities

A Dataset for Grounded Video Description

Language:PythonLicense:NOASSERTIONStargazers:158Issues:0Issues:0

swift

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language:PythonLicense:Apache-2.0Stargazers:2690Issues:0Issues:0

TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Language:PythonLicense:BSD-3-ClauseStargazers:250Issues:0Issues:0

DemystifyLocalViT

Official code for paper "On the Connection between Local Attention and Dynamic Depth-wise Convolution" ICLR 2022 Spotlight

Language:Jupyter NotebookLicense:MITStargazers:181Issues:0Issues:0

llama3

The official Meta Llama 3 GitHub site

Language:PythonLicense:NOASSERTIONStargazers:25130Issues:0Issues:0

CnOCR

CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scenarios and can be used directly after installation. 【基于 PyTorch/MXNet 的中文/英文 OCR Python 包。】

Language:PythonLicense:Apache-2.0Stargazers:3095Issues:0Issues:0

Umi-OCR

OCR software, free and offline. 开源、免费的离线OCR软件。支持截屏/批量导入图片,PDF文档识别,排除水印/页眉页脚,扫描/生成二维码。内置多国语言库。

Language:PythonLicense:MITStargazers:23765Issues:0Issues:0

CoDeF

[CVPR 2024 Highlight] Official PyTorch implementation of CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Language:PythonLicense:NOASSERTIONStargazers:4805Issues:0Issues:0

VimTS

VimTS: A Unified Video and Image Text Spotter

Language:PythonLicense:GPL-3.0Stargazers:67Issues:0Issues:0

ControlNet

Let us control diffusion models!

Language:PythonLicense:Apache-2.0Stargazers:29280Issues:0Issues:0

UniMatch

[CVPR 2023] Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

Language:PythonLicense:MITStargazers:436Issues:0Issues:0

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

Language:PythonLicense:GPL-3.0Stargazers:3286Issues:0Issues:0

labelImg

LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.

Language:PythonLicense:MITStargazers:22302Issues:0Issues:0

OpenSeeD

[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

Language:PythonLicense:Apache-2.0Stargazers:622Issues:0Issues:0

cityscapes-to-coco-conversion

Cityscapes to CoCo Format Conversion Tool for Mask-RCNN and Detectron

Language:PythonStargazers:78Issues:0Issues:0

llm-action

本项目旨在分享大模型相关技术原理以及实战经验。

Language:HTMLLicense:Apache-2.0Stargazers:8282Issues:0Issues:0

GLIP

Grounded Language-Image Pre-training

Language:PythonLicense:MITStargazers:2103Issues:0Issues:0

Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language:PythonLicense:MITStargazers:1605Issues:0Issues:0

OccNet-Course

国内首个占据栅格网络全栈课程《从BEV到Occupancy Network,算法原理与工程实践》,包含端侧部署。Surrounding Semantic Occupancy Perception Course for Autonomous Driving (docs, ppt and source code) 在线课程主页:http://111.229.117.200:8100/ (作者独立搭建)

Language:PythonLicense:BSD-3-ClauseStargazers:316Issues:0Issues:0

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language:PythonLicense:NOASSERTIONStargazers:2055Issues:0Issues:0