Wenhai Wang (whai362)

Company: OpenGVLab @ Shanghai AI Laboratory

Location: Shanghai

Home Page: http://whai362.github.io/

Wenhai Wang's starred repositories

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 18233 | Issues: 178 | Issues: 2497

Awesome-Multimodal-Large-Language-Models

Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 3898 | Issues: 54 | Issues: 294

chatgpt-prompts-for-academic-writing

This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.

T-Rex

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | Stargazers: 1850 | Issues: 37 | Issues: 49

InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

Emu

Emu Series: Generative Multimodal Models from BAAI

Language: Python | License: Apache-2.0 | Stargazers: 1490 | Issues: 20 | Issues: 83

Monkey

[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language: Python | License: MIT | Stargazers: 1372 | Issues: 20 | Issues: 66

MaskDINO

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"

Language: Python | License: Apache-2.0 | Stargazers: 1013 | Issues: 36 | Issues: 99
Language: Python | License: Apache-2.0 | Stargazers: 987 | Issues: 12 | Issues: 72
Language: TypeScript | License: NOASSERTION | Stargazers: 965 | Issues: 6 | Issues: 0

InternVL

[CVPR 2024 Oral] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks - An Open-Source Alternative to ViT-22B

Language: Python | License: MIT | Stargazers: 805 | Issues: 10 | Issues: 65

UniRepLKNet

[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Language: Python | License: Apache-2.0 | Stargazers: 804 | Issues: 12 | Issues: 15

SAM-Med2D

Official implementation of SAM-Med2D

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 741 | Issues: 12 | Issues: 48

AlphaCLIP

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 486 | Issues: 10 | Issues: 35

InterFuser

[CoRL 2022] InterFuser: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer

Language: Python | License: Apache-2.0 | Stargazers: 458 | Issues: 11 | Issues: 82

APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

Language: Python | License: Apache-2.0 | Stargazers: 414 | Issues: 7 | Issues: 34

all-seeing

[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"

crafter

Benchmarking the Spectrum of Agent Capabilities

Language: Python | License: MIT | Stargazers: 346 | Issues: 8 | Issues: 18

DCNv4

[CVPR 2024] Deformable Convolution v4

Language: Python | License: MIT | Stargazers: 321 | Issues: 3 | Issues: 41

DCI-VTON-Virtual-Try-On

[ACM Multimedia 2023] Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow.

Language: Python | License: MIT | Stargazers: 295 | Issues: 31 | Issues: 41

Mini-DALLE3

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

MultimodalOCR

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

UniPose

Official implementation of the paper "UniPose: Detecting Any Keypoints"

Language: Python | License: NOASSERTION | Stargazers: 229 | Issues: 18 | Issues: 12

Vision-RWKV

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Language: Python | License: Apache-2.0 | Stargazers: 214 | Issues: 3 | Issues: 5

ControlLLM

ControlLLM: Augment Language Models with Tools by Searching on Graphs

HIMLoco

[ICLR 2024] Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response

StyleRF

[CVPR 2023] StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields

AVSegFormer

[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer