dltlqqns's starred repositories

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python · License: Apache-2.0 · Stars: 6146

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python · License: Apache-2.0 · Stars: 7271

datacomp

DataComp: In search of the next generation of multimodal datasets

Language: Python · License: NOASSERTION · Stars: 572

Vista

A Generalizable World Model for Autonomous Driving

Language: Python · License: Apache-2.0 · Stars: 328

VLMEvalKit

An open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ Hugging Face models, and 20+ benchmarks

Language: Python · License: Apache-2.0 · Stars: 560

SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward

Language: Python · Stars: 434

BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook · License: BSD-3-Clause · Stars: 4397

HA-DPO

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Language: Python · License: Apache-2.0 · Stars: 40

llm.c

LLM training in simple, raw C/CUDA

Language: CUDA · License: MIT · Stars: 20830

HPSv2

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 303

VILA

VILA: a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python · License: Apache-2.0 · Stars: 164

GPT4RoI

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Language: Python · License: NOASSERTION · Stars: 472

r2c

Recognition to Cognition Networks (code for the model in "From Recognition to Cognition: Visual Commonsense Reasoning", CVPR 2019)

Language: Python · License: MIT · Stars: 464

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python · License: MIT · Stars: 33881

awesome-action-recognition

A curated list of resources for action recognition and related areas

Stars: 3731

GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing

Language: Python · Stars: 326

InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

Language: Python · Stars: 1861

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable, open-source multimodal chat model approaching GPT-4V's performance

Language: Python · License: MIT · Stars: 3548

j2735decoder

A Python library to decode J2735 UPER-encoded hex (currently supporting BSM, MAP, and SPaT)

Language: Python · License: Apache-2.0 · Stars: 2

Anima

A 33B Chinese LLM with DPO and QLoRA training, 100K context, and AirLLM 70B inference on a single 4 GB GPU

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 3420

Awesome-LLMs-for-Video-Understanding

🔥🔥🔥 The latest papers, code, and datasets on video LLMs (Vid-LLMs)

Stars: 834

mistral

Mistral: a strong, northwesterly wind. A framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers

Language: Python · License: Apache-2.0 · Stars: 545

EvalAI-Starters

How to create a challenge on EvalAI?

Language: Python · Stars: 71

Fooocus

Focus on prompting and generating

Language: Python · License: GPL-3.0 · Stars: 37375

mmc4

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Language: Python · License: MIT · Stars: 877

TensorRT-LLM

TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for creating Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stars: 7179