JingfanChen's starred repositories


RAT

Implementation of "RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation".

Language: Python · Stars: 140 · Issues: 0

search-agents

Code for the paper 🌳 Tree Search for Language Model Agents

Language: Python · License: MIT · Stars: 108 · Issues: 0

MCTS-DPO

Source code for Self-Evaluation Guided MCTS for online DPO.

Language: Python · License: Apache-2.0 · Stars: 40 · Issues: 0

digirl

Official repo for the paper "DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning".

Language: Python · License: Apache-2.0 · Stars: 172 · Issues: 0

IPR

Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement

Language: Python · Stars: 11 · Issues: 0

Agent-Eval-Refine

Code for Paper: Autonomous Evaluation and Refinement of Digital Agents

Language: Python · License: BSD-3-Clause · Stars: 76 · Issues: 0

ETO

Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)

Language: Python · Stars: 76 · Issues: 0

TravelPlanner

[ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"

Language: Python · License: MIT · Stars: 191 · Issues: 0

skyvern

Automate browser-based workflows with LLMs and Computer Vision

Language: Python · License: AGPL-3.0 · Stars: 5530 · Issues: 0

nxtp

Object Recognition as Next Token Prediction (CVPR 2024)

Language: Python · License: NOASSERTION · Stars: 145 · Issues: 0

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python · Stars: 700 · Issues: 0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python · License: NOASSERTION · Stars: 4455 · Issues: 0

honeybee

Official implementation of project Honeybee (CVPR 2024)

Language: Python · License: NOASSERTION · Stars: 400 · Issues: 0

SoM-LLaVA

[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Language: Python · Stars: 105 · Issues: 0

vlm-evaluation

VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning

Language: Python · License: NOASSERTION · Stars: 68 · Issues: 0

cider

Pythonic wrappers for the CIDEr/CIDEr-D evaluation metrics. Provides CIDEr as well as CIDEr-D ("CIDEr Defended"), which is more robust to gaming effects. Also adds the option to replace the original PTBTokenizer with the spaCy tokenizer (no Java dependency, but slower).

Language: Python · License: NOASSERTION · Stars: 7 · Issues: 0
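The idea behind CIDEr (a TF-IDF-weighted n-gram cosine between a candidate caption and its references, averaged over references) can be sketched in plain Python. This is a simplified toy illustration of the metric's core computation, not this repo's API; the function names and the single-n-gram simplification are mine.

```python
from collections import Counter
from math import log, sqrt

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cider_n(candidate, references, corpus_refs, n=1):
    """Toy CIDEr_n: TF-IDF-weighted n-gram cosine similarity between a
    candidate caption and each reference, averaged over the references.
    corpus_refs is a list of reference sets, one per image, used for IDF."""
    num_images = len(corpus_refs)

    # Document frequency: in how many images' reference sets each n-gram occurs.
    df = Counter()
    for refs in corpus_refs:
        seen = set()
        for r in refs:
            seen.update(ngrams(r.split(), n))
        df.update(seen)

    def tfidf(sentence):
        counts = Counter(ngrams(sentence.split(), n))
        total = sum(counts.values())
        # Common n-grams (high df) get down-weighted by the IDF term.
        return {g: (c / total) * log(num_images / max(df[g], 1))
                for g, c in counts.items()}

    def cosine(u, v):
        dot = sum(u[g] * v.get(g, 0.0) for g in u)
        nu = sqrt(sum(x * x for x in u.values()))
        nv = sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    c_vec = tfidf(candidate)
    return sum(cosine(c_vec, tfidf(r)) for r in references) / len(references)
```

The full metric averages `cider_n` over n = 1..4; CIDEr-D additionally clips n-gram counts and adds a sentence-length penalty, which is what makes it harder to game with repeated high-scoring n-grams.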

VILA

VILA - a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python · License: Apache-2.0 · Stars: 1072 · Issues: 0

Osprey

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Language: Python · License: Apache-2.0 · Stars: 732 · Issues: 0

NExT-Chat

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Language: Python · License: Apache-2.0 · Stars: 191 · Issues: 0

bubogpt

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Language: Python · License: BSD-3-Clause · Stars: 490 · Issues: 0

prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Language: Python · License: MIT · Stars: 367 · Issues: 0

SeeClick

The model, data and code for the visual GUI Agent SeeClick

Language: HTML · License: Apache-2.0 · Stars: 150 · Issues: 0

VisualWebBench

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

Language: Python · Stars: 38 · Issues: 0

Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Language: Python · License: Apache-2.0 · Stars: 496 · Issues: 0

screen_annotation

The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format and describe the UI elements present on the screen: their type, location, OCR text, and a short description. It was introduced in the paper "ScreenAI: A Vision-Language Model for UI and Infographics Understanding".

Stars: 38 · Issues: 0