Natyren

George's starred repositories

rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files

Language: Go | License: MIT | 45273 580 5378

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | 20846 180 403

llamafile

Distribute and run LLMs with a single file.

Language: C++ | License: NOASSERTION | 17643 159 367

jsonhero-web

JSON Hero is an open-source, beautiful JSON explorer for the web that lets you browse, search and navigate your JSON files at speed. 🚀 Built with 💜 by the Trigger.dev team.

Language: TypeScript | License: Apache-2.0 | 9037 47 109

corenet

CoreNet: A library for training deep neural networks

Language: Python | License: NOASSERTION | 6812 63 19

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Language: SystemVerilog | 6730 65 22

libreddit

Private front-end for Reddit

Language: Rust | License: AGPL-3.0 | 5001 41 581

agents

An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents

Language: Python | License: Apache-2.0 | 4955 58 71

01

The open-source language model computer

Language: Python | License: AGPL-3.0 | 4779 83 108

Windrecorder

Windrecorder is a memory search app that records everything on your screen at a small storage size, so you can rewind what you have seen, search by OCR text or image description, and get activity statistics.

Language: Python | License: GPL-2.0 | 2739 18 101

InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Language: Python | License: MIT | 2423 35 259

twinny

The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but completely free and 100% private.

Language: TypeScript | License: MIT | 2279 14 154

webllama

Llama-3 agents that can browse the web by following instructions and talking to you

Language: Python | License: MIT | 1280 21 8

PyWinAssistant

The first open-source Large Action Model generalist Artificial Narrow Intelligence that fully controls human user interfaces using only natural language. PyWinAssistant utilizes Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models.

Language: Python | License: MIT | 1248 31 16

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python | License: Apache-2.0 | 1148 29 134

llm-datasets

High-quality datasets, tools, and concepts for LLM fine-tuning.

1109 22 1

databonsai

Clean and curate your data with LLMs.

Language: Python | License: MIT | 443 2 2

all-seeing

[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"

Language: Python | 425 23 18

mlc-MiniCPM

MiniCPM on Android platform.

Language: Python | License: Apache-2.0 | 419 60

Gentopia

Build Hierarchical Autonomous Agents through Config. Collaborative Growth of Specialized Agents.

Language: Python | License: MIT | 286 2 5

RetrivalLMPapers

A collection of papers on retrieval-based (augmented) language models.

225 50

BrowserGym

BrowserGym, a gym environment for web task automation in the Chromium browser.

Language: Python | License: NOASSERTION | 220 6 17

ChartVLM

Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Language: Python | License: CC-BY-4.0 | 192 12 13

visualwebarena

VisualWebArena is a benchmark for multimodal agents.

Language: Python | License: MIT | 192 5 39

multi_token

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

Language: Python | License: Apache-2.0 | 161 3 20

txtdot

An HTTP proxy that parses only text, links and pictures from pages, reducing internet bandwidth usage and removing ads and heavy scripts.

Language: TypeScript | License: MIT | 149 3 28

SoM-LLaVA

[COLM-2024] List Items One by One: Empowering Multimodal LLMs with Set-of-Mark Prompting and Improved Visual Reasoning Ability.

Language: Python | 102 3 3

screen_qa

The ScreenQA dataset was introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots". It contains ~86K question-answer pairs collected by human annotators for ~35K screenshots from Rico. It is intended for training and evaluating models capable of screen content understanding via question answering.

License: CC-BY-4.0 | 76 6 1

ESTextSpotter

(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

Language: Python | 71 2 16

VisualWebBench

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

Language: Python | 38 3 3