George's starred repositories

rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 20846 | Issues: 180 | Issues: 403

llamafile

Distribute and run LLMs with a single file.

Language: C++ | License: NOASSERTION | Stargazers: 17643 | Issues: 159 | Issues: 367

jsonhero-web

JSON Hero is an open-source, beautiful JSON explorer for the web that lets you browse, search and navigate your JSON files at speed. 🚀. Built with 💜 by the Trigger.dev team.

Language: TypeScript | License: Apache-2.0 | Stargazers: 9037 | Issues: 47 | Issues: 109

corenet

CoreNet: A library for training deep neural networks

Language: Python | License: NOASSERTION | Stargazers: 6812 | Issues: 63 | Issues: 19

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Language: SystemVerilog | Stargazers: 6730 | Issues: 65 | Issues: 22

libreddit

Private front-end for Reddit

Language: Rust | License: AGPL-3.0 | Stargazers: 5001 | Issues: 41 | Issues: 581

agents

An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents

Language: Python | License: Apache-2.0 | Stargazers: 4955 | Issues: 58 | Issues: 71

01

The open-source language model computer

Language: Python | License: AGPL-3.0 | Stargazers: 4779 | Issues: 83 | Issues: 108

Windrecorder

Windrecorder is a memory search app that records everything on your screen at a small file size, letting you rewind what you have seen, query it through OCR text or image descriptions, and get activity statistics.

Language: Python | License: GPL-2.0 | Stargazers: 2739 | Issues: 18 | Issues: 101

InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Language: Python | License: MIT | Stargazers: 2423 | Issues: 35 | Issues: 259

twinny

The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but completely free and 100% private.

Language: TypeScript | License: MIT | Stargazers: 2279 | Issues: 14 | Issues: 154

webllama

Llama-3 agents that can browse the web by following instructions and talking to you

Language: Python | License: MIT | Stargazers: 1280 | Issues: 21 | Issues: 8

pywinassistant

The first open-source Large Action Model generalist Artificial Narrow Intelligence that fully controls human user interfaces using only natural language. PyWinAssistant builds on the approach from "Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models."

Language: Python | License: MIT | Stargazers: 1248 | Issues: 31 | Issues: 16

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python | License: Apache-2.0 | Stargazers: 1148 | Issues: 29 | Issues: 134

llm-datasets

High-quality datasets, tools, and concepts for LLM fine-tuning.

databonsai

Clean & curate your data with LLMs.

Language: Python | License: MIT | Stargazers: 443 | Issues: 2 | Issues: 2

all-seeing

[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"

mlc-MiniCPM

MiniCPM on the Android platform.

Language: Python | License: Apache-2.0 | Stargazers: 419 | Issues: 6 | Issues: 0

Gentopia

Build Hierarchical Autonomous Agents through Config. Collaborative Growth of Specialized Agents.

Language: Python | License: MIT | Stargazers: 286 | Issues: 2 | Issues: 5

RetrivalLMPapers

Paper collection on retrieval-based (augmented) language models.

BrowserGym

BrowserGym, a gym environment for web task automation in the Chromium browser.

Language: Python | License: NOASSERTION | Stargazers: 220 | Issues: 6 | Issues: 17

ChartVLM

Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Language: Python | License: CC-BY-4.0 | Stargazers: 192 | Issues: 12 | Issues: 13

visualwebarena

VisualWebArena is a benchmark for multimodal agents.

Language: Python | License: MIT | Stargazers: 192 | Issues: 5 | Issues: 39

multi_token

Embed arbitrary modalities (images, audio, documents, etc.) into large language models.

Language: Python | License: Apache-2.0 | Stargazers: 161 | Issues: 3 | Issues: 20

txtdot

An HTTP proxy that parses only text, links, and pictures from pages, reducing internet bandwidth usage and removing ads and heavy scripts.

Language: TypeScript | License: MIT | Stargazers: 149 | Issues: 3 | Issues: 28

SoM-LLaVA

[COLM-2024] List Items One by One: Empowering Multimodal LLMs with Set-of-Mark Prompting and Improved Visual Reasoning Ability.

screen_qa

ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K question-answer pairs collected by human annotators for ~35K screenshots from Rico. It should be used to train and evaluate models capable of screen content understanding via question answering.

ESTextSpotter

(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

VisualWebBench

Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"