George's repositories

Auto-GUI

Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

cardie

An open source business card designer and sharing platform

License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

conditional-flow-matching

TorchCFM: a Conditional Flow Matching library

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

digirl

Official repo for the paper "DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning".

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

Discffusion

Official repo for the paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

goodcatch

Open-source attempt to implement a tiny vision-language model that works well with text-rich images

Stargazers: 0 | Issues: 1 | Issues: 0

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Stargazers: 0 | Issues: 0 | Issues: 0

kosmos-2.5-gradio

Script for easy bbox inference and deployment of kosmos-2.5

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

lerobot

🤗 LeRobot: End-to-end Learning for Real-World Robotics in Pytorch

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

llama2d

2D Positional Embeddings for Webpage Structural Understanding 🦙👀

Language: Python | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

mmbench-ru-eval

Repository for simple evaluation of your results on MMBench-DEV-RU

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

MoneyPrinterTurbo

Generate short videos with one click using AI LLM.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

moondream

tiny vision language model

Stargazers: 0 | Issues: 0 | Issues: 0

mpa-archive

Crawls a Multi-Page Application into a zip file and serves the Multi-Page Application from that zip file. An MPA archiver. Could be used as a site generator.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Open-LLaVA-NeXT

An open-source implementation for training LLaVA-NeXT.

Stargazers: 0 | Issues: 0 | Issues: 0

RL4VLM

Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

screenshot-to-code

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

SeeAct

SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

SeeClick

The model, data and code for the visual GUI Agent SeeClick

Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0

self-operating-computer

A framework to enable multimodal models to operate a computer.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

text-generation-webui

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

Language: Python | License: AGPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

trl

Train transformer language models with reinforcement learning.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

vimGPT

Browse the web with GPT-4V and Vimium

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0