caisheng (1079863482)

caisheng's starred repositories

surya

OCR, layout analysis, reading order, line detection in 90+ languages

Language: Python | License: GPL-3.0 | Stargazers: 9551 | Issues: 78 | Issues: 117

nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

Language: Python | License: MIT | Stargazers: 8620 | Issues: 64 | Issues: 203

GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python | License: Apache-2.0 | Stargazers: 6001 | Issues: 37 | Issues: 292

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 4563 | Issues: 50 | Issues: 411

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language: Python | License: GPL-3.0 | Stargazers: 4155 | Issues: 38 | Issues: 412

chinese-chatbot-corpus

A public Chinese chat corpus

Language: Python | License: Apache-2.0 | Stargazers: 3958 | Issues: 75 | Issues: 18

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

Language: Python | License: GPL-3.0 | Stargazers: 3393 | Issues: 29 | Issues: 553

FastDeploy

⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ main stream scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.

Language: C++ | License: Apache-2.0 | Stargazers: 2876 | Issues: 57 | Issues: 1152

baby-llama2-chinese

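A repository for pretraining from scratch and then applying SFT to a small-parameter Chinese LLaMa2; a single 24 GB GPU is enough to run it and obtain a chat-llama2 with basic Chinese question-answering ability.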

Language: Python | License: MIT | Stargazers: 2406 | Issues: 17 | Issues: 72

RT-DETR

[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥

Language: Python | License: Apache-2.0 | Stargazers: 2093 | Issues: 24 | Issues: 391

SOLIDER

A Semantic Controllable Self-Supervised Learning Framework to learn general human representations from massive unlabeled human images, which can benefit downstream human-centric tasks to the maximum extent

Language: Python | License: Apache-2.0 | Stargazers: 1892 | Issues: 130 | Issues: 29

ml-fastvit

This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" ICCV 2023

Language: Python | License: NOASSERTION | Stargazers: 1791 | Issues: 30 | Issues: 0

Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language: Python | License: MIT | Stargazers: 1653 | Issues: 21 | Issues: 111

DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models

Language: C++ | License: MIT | Stargazers: 1414 | Issues: 30 | Issues: 539

Co-DETR

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training

Language: Python | License: MIT | Stargazers: 937 | Issues: 10 | Issues: 156

ovsam

[ECCV 2024] The official code of paper "Open-Vocabulary SAM".

Language: Python | License: NOASSERTION | Stargazers: 878 | Issues: 14 | Issues: 35

rknn-cpp-Multithreading

A simple demo of yolov5s running on rk3588/3588s using C++ (about 142 frames per second).

Language: C | License: Apache-2.0 | Stargazers: 404 | Issues: 5 | Issues: 48

CLIP-ReID

Official implementation for "CLIP-ReID: Exploiting Vision-Language Model for Image Re-identification without Concrete Text Labels" (AAAI 2023)

Language: Python | License: MIT | Stargazers: 243 | Issues: 4 | Issues: 45

PlateRecognition

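License-Plate-Recognition: supports detection and recognition of 12 license plate types, including yolov5, yolov7, and yolov8 plate detection, plate rectification, and plate recognition, with accuracy up to 99.5%. A license plate dataset is also available for download.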

Language: C++ | License: MIT | Stargazers: 206 | Issues: 2 | Issues: 1

mindocr

A toolbox of OCR models and algorithms based on MindSpore

Language: Python | License: Apache-2.0 | Stargazers: 187 | Issues: 13 | Issues: 96

HybridSORT

[AAAI 2024] Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking

Language: Python | License: MIT | Stargazers: 158 | Issues: 5 | Issues: 33

rk3588-yolo-demo

A multi-threaded YOLO inference demo for the RK3588 platform, adapted to read video files and camera feeds. The demo uses the Yolov8n model for file inference and reaches up to 100 frames per second.

Language: C++ | License: MIT | Stargazers: 150 | Issues: 2 | Issues: 17

BXC_VideoAnalyzer_v4

A video behavior analysis system developed in C++, version 4

Language: C++ | License: MIT | Stargazers: 127 | Issues: 0 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 62 | Issues: 4 | Issues: 6

TF-CLIP

TF-CLIP: Learning Text-Free CLIP for Video-Based Person Re-identification (AAAI2024)

Language: Python | License: MIT | Stargazers: 35 | Issues: 2 | Issues: 4

PaddleOCR2RKNN

Convert the official paddleocr model to a deployable model on RK1126

Language: C++ | Stargazers: 31 | Issues: 0 | Issues: 0

EdgeAI-Engine

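🔥🔥 A mature machine-vision edge-computing application, adapted for RK (Rockchip) and Ascend series chips, with source code for model training and model quantization 🔥🔥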

Language: Python | License: Apache-2.0 | Stargazers: 19 | Issues: 0 | Issues: 0

PCL-CLIP

Code for "Prototypical Contrastive Learning-based CLIP Fine-tuning for Object Re-identification".