tuofeilunhifi

tuofeilun's starred repositories

ollama

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

Language: Go | MIT | 73643 441 3217

pykan

Kolmogorov Arnold Networks

Language: Jupyter Notebook | MIT | 13270 115 210

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python | Apache-2.0 | 7257 74 227

llama-cpp-python

Python bindings for llama.cpp

Language: Python | MIT | 6977 67 944

IC-Light

More relighting!

Language: Python | Apache-2.0 | 3780 40 58

efficient-kan

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Language: Python | MIT | 3344 28 33

SupContrast

PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)

Language: Python | BSD-2-Clause | 2954 18 132

ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)

Language: Python | MIT | 2607 42 251

PyContrast

PyTorch implementation of Contrastive Learning methods

Language: Python | 1919 42 28

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python | MIT | 1806 17 41

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python | Apache-2.0 | 1333 23 90

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Language: Python | Apache-2.0 | 1065 27 81

VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python | Apache-2.0 | 824 19 62

pytorch-randaugment

Unofficial PyTorch Reimplementation of RandAugment.

Language: Python | MIT | 621 15 30

VLMEvalKit

Open-source evaluation toolkit of large vision-language models (LVLMs), support GPT-4v, Gemini, QwenVLPlus, 50+ HF models, 20+ benchmarks

Language: Python | Apache-2.0 | 557 8 72

Grounding-DINO-1.5-API

API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Language: Python | Apache-2.0 | 519 10 23

Groma

Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python | Apache-2.0 | 455 36 14

TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

Language: Python | Apache-2.0 | 420 11 75

MultimodalOCR

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Language: Python | 344 12 21

prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Language: Python | MIT | 273 10 31

scaling_on_scales

When do we not need larger vision models?

Language: Python | MIT | 243 4 11

ScreenAI

Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"

Language: Python | MIT | 220 8 3

OmniFusion

OmniFusion — a multimodal model to communicate using text and images

Language: Python | Apache-2.0 | 211 5 3

Retrieval-Augmented-Visual-Question-Answering

This is the official repository for Retrieval Augmented Visual Question Answering

Language: Python | GPL-3.0 | 117 4 38

MMBench

Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"

Apache-2.0 | 108 4 26

DeLVM

Language: Python | 101 1 9

MoVA

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

87 7 2

Awesome-Vision-Mamba

✨✨Latest Papers on Vision Mamba and Related Areas

67 40

fld

PyTorch code for FLD (Feature Likelihood Divergence), FID, KID, Precision, Recall, etc. using DINOv2, InceptionV3, CLIP, etc.

Language: Python | 34 1 3

CLIP-KD

[CVPR-2024] Official implementations of CLIP-KD: An Empirical Study of CLIP Model Distillation

Language: Python | 22 5 7