George's starred repositories
jsonhero-web
JSON Hero is an open-source, beautiful JSON explorer for the web that lets you browse, search and navigate your JSON files at speed. 🚀 Built with 💜 by the Trigger.dev team.
Windrecorder
Windrecorder is a memory search app that records everything on your screen at a small storage size, letting you rewind what you have seen, query it through OCR text or image descriptions, and get activity statistics.
InternImage
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
pywinassistant
The first open-source Large Action Model generalist Artificial Narrow Intelligence that controls human user interfaces entirely through natural language. PyWinAssistant draws on the paper "Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models".
InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
llm-datasets
High-quality datasets, tools, and concepts for LLM fine-tuning.
databonsai
Clean & curate your data with LLMs.
all-seeing
[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
mlc-MiniCPM
MiniCPM on the Android platform.
RetrivalLMPapers
A collection of papers on retrieval-based (retrieval-augmented) language models.
BrowserGym
BrowserGym, a gym environment for web task automation in the Chromium browser.
visualwebarena
VisualWebArena is a benchmark for multimodal agents.
multi_token
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
screen_qa
The ScreenQA dataset was introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots". It contains ~86K question-answer pairs collected by human annotators for ~35K screenshots from Rico. It is intended for training and evaluating models capable of screen content understanding via question answering.
ESTextSpotter
(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
VisualWebBench
Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"