swtju14's starred repositories

Vary

[ECCV2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Language: Python | Stargazers: 1651 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 128951 | Issues: 0

DreamLLM

[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation

Language: Python | License: Apache-2.0 | Stargazers: 354 | Issues: 0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 4306 | Issues: 0

webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Language: Python | License: BSD-3-Clause | Stargazers: 2095 | Issues: 0

wit

The WIT (Wikipedia-based Image Text) Dataset is a large multimodal, multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

License: NOASSERTION | Stargazers: 974 | Issues: 0

Awesome-Multimodality

A survey of multimodal learning research.

Stargazers: 289 | Issues: 0

Chinese-CLIP

A Chinese version of CLIP that supports Chinese cross-modal retrieval and representation generation.

Language: Python | License: MIT | Stargazers: 3968 | Issues: 0

GPT4Tools

GPT4Tools is an intelligent system that can automatically decide, control, and utilize different visual foundation models, allowing the user to interact with images during a conversation.

Language: Python | License: NOASSERTION | Stargazers: 743 | Issues: 0

Language: Python | License: MIT | Stargazers: 668 | Issues: 0

MOSS

An open-source, tool-augmented conversational language model from Fudan University.

Language: Python | License: Apache-2.0 | Stargazers: 11881 | Issues: 0

LLMZoo

⚡LLM Zoo is a project that provides data, models, and evaluation benchmarks for large language models.⚡

Language: Python | License: Apache-2.0 | Stargazers: 2905 | Issues: 0

mmc4

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Language: Python | License: MIT | Stargazers: 884 | Issues: 0

mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

Language: Python | License: Apache-2.0 | Stargazers: 3284 | Issues: 0

segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 45504 | Issues: 0

VoxelNeXt

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking (CVPR 2023)

Language: Python | License: Apache-2.0 | Stargazers: 680 | Issues: 0

Language: Python | License: NOASSERTION | Stargazers: 34537 | Issues: 0

Birds-eye-view-Perception

[IEEE T-PAMI] Awesome BEV perception research and cookbook for audiences at all levels in autonomous driving

Language: Python | License: Apache-2.0 | Stargazers: 1124 | Issues: 0

FlexGen

Running large language models on a single GPU for throughput-oriented scenarios.

Language: Python | License: Apache-2.0 | Stargazers: 9074 | Issues: 0

CMT

[ICCV 2023] Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

Language: Python | License: NOASSERTION | Stargazers: 311 | Issues: 0

RevCol

Official code for the papers "Reversible Column Networks" and "RevColv2"

Language: Python | License: Apache-2.0 | Stargazers: 245 | Issues: 0

RAM-multiprocess-dataloader

Demystify RAM Usage in Multi-Process Data Loaders

Language: Python | License: Apache-2.0 | Stargazers: 168 | Issues: 0

BEVStereo

Official code for BEVStereo

Language: Python | License: MIT | Stargazers: 251 | Issues: 0

EVA

EVA Series: Visual Representation Fantasies from BAAI

Language: Python | License: MIT | Stargazers: 2099 | Issues: 0

DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Language: Python | License: Apache-2.0 | Stargazers: 2080 | Issues: 0

detrex

detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.

Language: Python | License: Apache-2.0 | Stargazers: 1909 | Issues: 0

BEVDepth

Official code for BEVDepth.

Language: Python | License: MIT | Stargazers: 684 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 304 | Issues: 0

MIMDet

[ICCV 2023] You Only Look at One Partial Sequence

Language: Python | License: MIT | Stargazers: 331 | Issues: 0

YOLOX

YOLOX is a high-performance anchor-free YOLO that exceeds YOLOv3 through YOLOv5, with support for MegEngine, ONNX, TensorRT, ncnn, and OpenVINO. Documentation: https://yolox.readthedocs.io/

Language: Python | License: Apache-2.0 | Stargazers: 9170 | Issues: 0