Tianyu Zhang's repositories
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4v, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
cloudimage
Personal
UserActivityTracker
A lightweight real-time tracker of user interactions for WPF. Supports both mouse and keyboard actions. Can save the tracked recording to a string value and replay the recorded actions for UI/UX analysis. Supports monitoring the full window or focusing on a particular element, and saving the initial size and other states on start.
DLS
Decentralized learning scheduler
Nordhaus-OPEN
A GitHub copy of https://yale.app.box.com/s/whlqcr7gtzdm4nxnrfhvap2hlzebuvvm from https://williamnordhaus.com/
CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
VCR-wiki-en-easy-test-500
Raw data for VCR-wiki-en-easy-test-500 from https://huggingface.co/datasets/vcr-org/VCR-wiki-en-easy-test-500
VCR-wiki-zh-easy-test-500
Raw data for VCR-wiki-zh-easy-test-100 from https://huggingface.co/datasets/vcr-org/VCR-wiki-zh-easy-test-100
VCR-wiki-zh-hard-test-500
Raw data for VCR-wiki-zh-hard-test-500 from https://huggingface.co/datasets/vcr-org/VCR-wiki-zh-hard-test-500
VCR-wiki-en-hard-test-500
Raw data for VCR-wiki-en-hard-test-500 from https://huggingface.co/datasets/vcr-org/VCR-wiki-en-hard-test-500
lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval
EfficientZeroV2
[ICML 2024, Spotlight] EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
surya
OCR, layout analysis, reading order, line detection in 90+ languages
pykan
Kolmogorov Arnold Networks
VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Grounded-Segment-Anything
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
alpha-zero-general
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
pymdp
A Python implementation of active inference for Markov Decision Processes
AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
dreamerv3
Mastering Diverse Domains through World Models
light_on_chatgpt
Makes ChatGPT easier to use on e-ink monitors: it turns the code blocks white and widens the UI.
MergeLM
Codebase for Merging Language Models
Neural-Network-Architecture-Diagrams
Diagrams for visualizing neural network architecture (Created with diagrams.net)
Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
maze-transformer
This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks.
whisper
Robust Speech Recognition via Large-Scale Weak Supervision