Wenhai Wang (whai362)

Company: OpenGVLab @ Shanghai AI Laboratory

Location: Shanghai

Home Page: http://whai362.github.io/

Wenhai Wang's starred repositories

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 24659 | Issues: 208 | Issues: 208

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model with performance approaching GPT-4o.

Language: Python | License: MIT | Stargazers: 4421 | Issues: 43 | Issues: 376

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | Stargazers: 3882 | Issues: 114 | Issues: 73

kimi-free-api

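🚀 Reverse-engineered API for the KIMI AI long-context large language model, for free-use testing [specialty: reading and organizing long texts]. Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn dialogue, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.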

Language: TypeScript | License: GPL-3.0 | Stargazers: 3466 | Issues: 30 | Issues: 101

xtuner

An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python | License: Apache-2.0 | Stargazers: 3451 | Issues: 33 | Issues: 457

chatgpt-prompts-for-academic-writing

This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language: Python | License: Apache-2.0 | Stargazers: 2294 | Issues: 41 | Issues: 349

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | Stargazers: 2028 | Issues: 36 | Issues: 76

Monkey

[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language: Python | License: MIT | Stargazers: 1596 | Issues: 22 | Issues: 98

Emu

Emu Series: Generative Multimodal Models from BAAI

Language: Python | License: Apache-2.0 | Stargazers: 1576 | Issues: 21 | Issues: 85

MaskDINO

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"

Language: Python | License: Apache-2.0 | Stargazers: 1108 | Issues: 34 | Issues: 106

Language: TypeScript | License: NOASSERTION | Stargazers: 1009 | Issues: 7 | Issues: 0

UniRepLKNet

[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Language: Python | License: Apache-2.0 | Stargazers: 875 | Issues: 12 | Issues: 18

SAM-Med2D

Official implementation of SAM-Med2D

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 808 | Issues: 13 | Issues: 63

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks

Language: Python | License: Apache-2.0 | Stargazers: 757 | Issues: 11 | Issues: 104

AlphaCLIP

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 598 | Issues: 11 | Issues: 46

APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

Language: Python | License: Apache-2.0 | Stargazers: 464 | Issues: 6 | Issues: 53

DCNv4

[CVPR 2024] Deformable Convolution v4

Language: Python | License: MIT | Stargazers: 429 | Issues: 7 | Issues: 68

MultimodalOCR

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

DCI-VTON-Virtual-Try-On

[ACM Multimedia 2023] Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow.

Language: Python | License: MIT | Stargazers: 374 | Issues: 30 | Issues: 43

Vision-RWKV

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Language: Python | License: Apache-2.0 | Stargazers: 303 | Issues: 4 | Issues: 32

Mini-DALLE3

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

Language: Jupyter Notebook | License: MIT | Stargazers: 293 | Issues: 4 | Issues: 20

UniPose

[ECCV 2024] Official implementation of the paper "UniPose: Detecting Any Keypoints"

Language: Python | License: NOASSERTION | Stargazers: 268 | Issues: 18 | Issues: 17

HIMLoco

Learning-based locomotion control from OpenRobotLab, including Hybrid Internal Model & H-Infinity Locomotion Control

Language: Python | License: NOASSERTION | Stargazers: 221 | Issues: 12 | Issues: 9

ControlLLM

ControlLLM: Augment Language Models with Tools by Searching on Graphs

chug

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

Language: Python | License: Apache-2.0 | Stargazers: 139 | Issues: 11 | Issues: 3

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 136 | Issues: 4 | Issues: 16

StyleRF

[CVPR 2023] StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields