JingfanChen's starred repositories


RAT

Implementation of "RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation".

Language: Python · Stars: 140 · Issues: 0

search-agents

Code for the paper 🌳 Tree Search for Language Model Agents

Language: Python · License: MIT · Stars: 108 · Issues: 0

MCTS-DPO

Source code for Self-Evaluation Guided MCTS for online DPO.

Language: Python · License: Apache-2.0 · Stars: 40 · Issues: 0

digirl

Official repo for the paper "DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning".

Language: Python · License: Apache-2.0 · Stars: 172 · Issues: 0

IPR

Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement

Language: Python · Stars: 11 · Issues: 0

Agent-Eval-Refine

Code for Paper: Autonomous Evaluation and Refinement of Digital Agents

Language: Python · License: BSD-3-Clause · Stars: 76 · Issues: 0

ETO

Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)

Language: Python · Stars: 76 · Issues: 0

TravelPlanner

[ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"

Language: Python · License: MIT · Stars: 191 · Issues: 0

skyvern

Automate browser-based workflows with LLMs and Computer Vision

Language: Python · License: AGPL-3.0 · Stars: 5530 · Issues: 0

nxtp

Object Recognition as Next Token Prediction (CVPR 2024)

Language: Python · License: NOASSERTION · Stars: 145 · Issues: 0

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python · Stars: 700 · Issues: 0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python · License: NOASSERTION · Stars: 4455 · Issues: 0

honeybee

Official implementation of project Honeybee (CVPR 2024)

Language: Python · License: NOASSERTION · Stars: 400 · Issues: 0

SoM-LLaVA

[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Language: Python · Stars: 105 · Issues: 0

vlm-evaluation

VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning

Language: Python · License: NOASSERTION · Stars: 68 · Issues: 0

cider

Pythonic wrappers for the CIDEr/CIDEr-D evaluation metrics. Provides CIDEr as well as CIDEr-D ("CIDEr Defended"), which is more robust to gaming effects. Also adds the option to replace the original PTBTokenizer with the spaCy tokenizer (no Java dependency, but slower).

Language: Python · License: NOASSERTION · Stars: 7 · Issues: 0
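The idea behind CIDEr (a TF-IDF-weighted n-gram cosine between a candidate caption and its references, averaged over references) can be sketched in plain Python. This is a simplified toy illustration of the metric's core computation, not this repo's API; the function names and the single-n-gram simplification are mine.

```python
from collections import Counter
from math import log, sqrt

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cider_n(candidate, references, corpus_refs, n=1):
    """Toy CIDEr_n: TF-IDF-weighted n-gram cosine similarity between a
    candidate caption and each reference, averaged over the references.
    corpus_refs is a list of reference sets, one per image, used for IDF."""
    num_images = len(corpus_refs)

    # Document frequency: in how many images' reference sets each n-gram occurs.
    df = Counter()
    for refs in corpus_refs:
        seen = set()
        for r in refs:
            seen.update(ngrams(r.split(), n))
        df.update(seen)

    def tfidf(sentence):
        counts = Counter(ngrams(sentence.split(), n))
        total = sum(counts.values())
        # Common n-grams (high df) get down-weighted by the IDF term.
        return {g: (c / total) * log(num_images / max(df[g], 1))
                for g, c in counts.items()}

    def cosine(u, v):
        dot = sum(u[g] * v.get(g, 0.0) for g in u)
        nu = sqrt(sum(x * x for x in u.values()))
        nv = sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    c_vec = tfidf(candidate)
    return sum(cosine(c_vec, tfidf(r)) for r in references) / len(references)
```

The full metric averages `cider_n` over n = 1..4; CIDEr-D additionally clips n-gram counts and adds a sentence-length penalty, which is what makes it harder to game with repeated high-scoring n-grams.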

VILA

VILA - a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python · License: Apache-2.0 · Stars: 1072 · Issues: 0

Osprey

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Language: Python · License: Apache-2.0 · Stars: 732 · Issues: 0

NExT-Chat

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Language: Python · License: Apache-2.0 · Stars: 191 · Issues: 0

bubogpt

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Language: Python · License: BSD-3-Clause · Stars: 490 · Issues: 0

prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Language: Python · License: MIT · Stars: 367 · Issues: 0

SeeClick

The model, data and code for the visual GUI Agent SeeClick

Language: HTML · License: Apache-2.0 · Stars: 150 · Issues: 0

VisualWebBench

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

Language: Python · Stars: 38 · Issues: 0

Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Language: Python · License: Apache-2.0 · Stars: 496 · Issues: 0

screen_annotation

The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format and describe the UI elements present on the screen: their type, location, OCR text, and a short description. It was introduced in the paper "ScreenAI: A Vision-Language Model for UI and Infographics Understanding".

Stars: 38 · Issues: 0