George's repositories
CogVLM
A state-of-the-art open visual language model | Multimodal pretrained model
conditional-flow-matching
TorchCFM: a Conditional Flow Matching library
datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
DiffiT
Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
Discffusion
Official repo for the paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
kosmos-2.5-gradio
Script for easy inference (from the bbox) and deployment of kosmos-2.5
LFM
Official PyTorch implementation of the paper: Flow Matching in Latent Space
llama2d
2D Positional Embeddings for Webpage Structural Understanding 🦙👀
LocalAI
🤖 The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs ggml, gguf, GPTQ, onnx, and TF-compatible models: llama, llama2, rwkv, whisper, vicuna, koala, cerebras, falcon, dolly, starcoder, and many others
mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
mmbench-ru-eval
Repository for simple evaluation of your results on MMBench-DEV-RU
MoneyPrinterTurbo
Generate short videos with one click using AI large language models.
moondream
tiny vision language model
mpa-archive
Crawls a Multi-Page Application (MPA) into a zip file and serves the MPA from that zip file. An MPA archiver that could also be used as a site generator.
mwmbl
An open-source, non-profit search engine implemented in Python
qiskit
Qiskit is an open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives.
RL4VLM
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
SeeAct
SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
SeeClick
The model, data and code for the visual GUI Agent SeeClick
self-operating-computer
A framework to enable multimodal models to operate a computer.
stable-diffusion
A latent text-to-image diffusion model
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
vimGPT
Browse the web with GPT-4V and Vimium
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks