tuofeilun's starred repositories
llama-cpp-python
Python bindings for llama.cpp
efficient-kan
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
SupContrast
PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)
PyContrast
PyTorch implementation of Contrastive Learning methods
DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
pytorch-randaugment
Unofficial PyTorch Reimplementation of RandAugment.
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
Grounding-DINO-1.5-API
API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
scaling_on_scales
When do we not need larger vision models?
OmniFusion
OmniFusion — a multimodal model to communicate using text and images
Retrieval-Augmented-Visual-Question-Answering
This is the official repository for Retrieval Augmented Visual Question Answering
Awesome-Vision-Mamba
✨✨ Latest Papers on Vision Mamba and Related Areas