tianyu-z

Tianyu Zhang's repositories

VCR

Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.

Language: Python | License: CC-BY-SA-4.0 | 1600

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4v, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks

Language: Python | License: Apache-2.0 | 000

UserActivityTracker

A lightweight real-time tracker of user interactions for WPF. Supports both mouse and keyboard actions. Can save the tracked recording to a string value and replay the recorded actions for UI/UX analysis. Supports monitoring the full window or focusing on a particular element. Supports saving the initial size and other states on startup.

License: MIT | 000

DLS

Decentralized learning scheduler

Language: Python | 000

Nordhaus-OPEN

A GitHub copy of https://yale.app.box.com/s/whlqcr7gtzdm4nxnrfhvap2hlzebuvvm from https://williamnordhaus.com/

Language: GAMS | 000

decentralized

Language: Python | 000

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python | License: Apache-2.0 | 000

VCR-wiki-en-easy-test-500

Raw data for VCR-wiki-en-easy-test-500 from https://huggingface.co/datasets/vcr-org/VCR-wiki-en-easy-test-500

License: CC-BY-SA-4.0 | 100

VCR-wiki-zh-easy-test-500

Raw data for VCR-wiki-zh-easy-test-100 from https://huggingface.co/datasets/vcr-org/VCR-wiki-zh-easy-test-100

License: CC-BY-SA-4.0 | 100

VCR-wiki-zh-hard-test-500

Raw data for VCR-wiki-zh-hard-test-500 from https://huggingface.co/datasets/vcr-org/VCR-wiki-zh-hard-test-500

License: CC-BY-SA-4.0 | 100

VCR-wiki-en-hard-test-500

Raw data for VCR-wiki-en-hard-test-500 from https://huggingface.co/datasets/vcr-org/VCR-wiki-en-hard-test-500

License: CC-BY-SA-4.0 | 000

lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Language: Python | License: NOASSERTION | 000

EfficientZeroV2

[ICML 2024, Spotlight] EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

License: GPL-3.0 | 000

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

License: Apache-2.0 | 000

surya

OCR, layout analysis, reading order, line detection in 90+ languages

License: GPL-3.0 | 000

pykan

Kolmogorov Arnold Networks

License: MIT | 000

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

License: MIT | 000

VCR

VLMEvalKit

cloudimage

UserActivityTracker

DLS

Nordhaus-OPEN

decentralized

CogVLM2

VCR-wiki-en-easy-test-500

VCR-wiki-zh-easy-test-500

VCR-wiki-zh-hard-test-500

VCR-wiki-en-hard-test-500

lmms-eval

EfficientZeroV2

mPLUG-DocOwl

surya

pykan

VAR

Grounded-Segment-Anything

alpha-zero-general

pymdp

AlphaCLIP

dreamerv3

light_on_chatgpt

MergeLM

Neural-Network-Architecture-Diagrams

Qwen-Audio

gfn-lm-tuning

maze-transformer

whisper