swtju14's starred repositories

Vary

[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Language: Python | 165100

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | 12895100

DreamLLM

[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation

Language: Python | License: Apache-2.0 | 35400

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | 430600

webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Language: Python | License: BSD-3-Clause | 209500

wit

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

License: NOASSERTION | 97400

Awesome-Multimodality

A survey of multimodal learning research.

28900

Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Language: Python | License: MIT | 396800

GPT4Tools

GPT4Tools is an intelligent system that can automatically decide, control, and utilize different visual foundation models, allowing the user to interact with images during a conversation.

Language: Python | License: NOASSERTION | 74300

AGIEval

Language: Python | License: MIT | 66800

MOSS

An open-source tool-augmented conversational language model from Fudan University

Language: Python | License: Apache-2.0 | 1188100

LLMZoo

⚡LLM Zoo is a project that provides data, models, and evaluation benchmark for large language models.⚡

Language: Python | License: Apache-2.0 | 290500

mmc4

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Language: Python | License: MIT | 88400

mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

Language: Python | License: Apache-2.0 | 328400

segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | 4550400

VoxelNeXt

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking (CVPR 2023)

Language: Python | License: Apache-2.0 | 68000

TaskMatrix

Language: Python | License: NOASSERTION | 3453700

Birds-eye-view-Perception

[IEEE T-PAMI] Awesome BEV perception research and cookbook for audiences at all levels in autonomous driving

Language: Python | License: Apache-2.0 | 112400

FlexGen

Running large language models on a single GPU for throughput-oriented scenarios.

Language: Python | License: Apache-2.0 | 907400

CMT

[ICCV 2023] Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

Language: Python | License: NOASSERTION | 31100

RevCol

Official code for the papers "Reversible Column Networks" and "RevColv2"

Language: Python | License: Apache-2.0 | 24500

RAM-multiprocess-dataloader

Demystify RAM Usage in Multi-Process Data Loaders

Language: Python | License: Apache-2.0 | 16800

BEVStereo

Official code for BEVStereo

Language: Python | License: MIT | 25100

EVA

EVA Series: Visual Representation Fantasies from BAAI

Language: Python | License: MIT | 209900

DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Language: Python | License: Apache-2.0 | 208000

detrex

detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.

Language: Python | License: Apache-2.0 | 190900

BEVDepth

Official code for BEVDepth.

Language: Python | License: MIT | 68400

vidt

Language: Python | License: Apache-2.0 | 30400

MIMDet

[ICCV 2023] You Only Look at One Partial Sequence

Language: Python | License: MIT | 33100

YOLOX

YOLOX is a high-performance anchor-free YOLO, exceeding YOLOv3 through YOLOv5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/

Language: Python | License: Apache-2.0 | 917000