George's repositories

CogVLM

A state-of-the-art-level open visual language model | Multimodal pre-trained model

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

conditional-flow-matching

TorchCFM: a Conditional Flow Matching library

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

DiffiT

Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation

Stargazers: 0 | Issues: 0 | Issues: 0

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Discffusion

Official repo for the paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

goodcatch

An open-source attempt to implement a tiny vision-language model that works well with text-rich images

Stargazers: 0 | Issues: 1 | Issues: 0

kosmos-2.5-gradio

Script for easy, out-of-the-box inference and deployment of kosmos-2.5

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

LFM

Official PyTorch implementation of the paper: Flow Matching in Latent Space

License: AGPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

llama2d

2D Positional Embeddings for Webpage Structural Understanding 🦙👀

Language: Python | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

LocalAI

🤖 The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs ggml, gguf, GPTQ, onnx, and TF-compatible models: llama, llama2, rwkv, whisper, vicuna, koala, cerebras, falcon, dolly, starcoder, and many others

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

mmbench-ru-eval

Repository for simple evaluation of your results on MMBench-DEV-RU

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

MoneyPrinterTurbo

Generate short videos with one click using AI LLM.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

moondream

tiny vision language model

Stargazers: 0 | Issues: 0 | Issues: 0

mpa-archive

Crawls a Multi-Page Application into a zip file and serves the Multi-Page Application from that zip file. An MPA archiver. Could be used as a site generator.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

mwmbl

An open-source, non-profit search engine implemented in Python

License: AGPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0
Stargazers: 0 | Issues: 0 | Issues: 0

qiskit

Qiskit is an open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

RL4VLM

Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

screenshot-to-code

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

SeeAct

SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

SeeClick

The model, data and code for the visual GUI Agent SeeClick

Stargazers: 0 | Issues: 0 | Issues: 0

self-operating-computer

A framework to enable multimodal models to operate a computer.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

text-generation-webui

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

Language: Python | License: AGPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

vimGPT

Browse the web with GPT-4V and Vimium

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0