Tianyu Zhang (tianyu-z)

Company: Mila

Location: Montreal

Home Page: ai.t-zhang.com

Tianyu Zhang's repositories

VCR

Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.

Language: Python | License: CC-BY-SA-4.0 | Stargazers: 16 | Issues: 0

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

cloudimage

Personal

Stargazers: 2 | Issues: 0

UserActivityTracker

A lightweight real-time tracker of user interactions for WPF. Supports both mouse and keyboard actions. Can save a tracked recording to a string value and replay the recorded actions for UI/UX analysis. Supports monitoring the full window or focusing on a specified element. Supports saving the initial size and other states at startup.

License: MIT | Stargazers: 0 | Issues: 0

DLS

Decentralized learning scheduler

Language: Python | Stargazers: 0 | Issues: 0

Nordhaus-OPEN

A GitHub copy of https://yale.app.box.com/s/whlqcr7gtzdm4nxnrfhvap2hlzebuvvm from https://williamnordhaus.com/

Language: GAMS | Stargazers: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

VCR-wiki-en-easy-test-500

Raw data for VCR-wiki-en-easy-test-500 from https://huggingface.co/datasets/vcr-org/VCR-wiki-en-easy-test-500

License: CC-BY-SA-4.0 | Stargazers: 1 | Issues: 0

VCR-wiki-zh-easy-test-500

Raw data for VCR-wiki-zh-easy-test-100 from https://huggingface.co/datasets/vcr-org/VCR-wiki-zh-easy-test-100

License: CC-BY-SA-4.0 | Stargazers: 1 | Issues: 0

VCR-wiki-zh-hard-test-500

Raw data for VCR-wiki-zh-hard-test-500 from https://huggingface.co/datasets/vcr-org/VCR-wiki-zh-hard-test-500

License: CC-BY-SA-4.0 | Stargazers: 1 | Issues: 0

VCR-wiki-en-hard-test-500

Raw data for VCR-wiki-en-hard-test-500 from https://huggingface.co/datasets/vcr-org/VCR-wiki-en-hard-test-500

License: CC-BY-SA-4.0 | Stargazers: 0 | Issues: 0

lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

EfficientZeroV2

[ICML 2024, Spotlight] EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

License: GPL-3.0 | Stargazers: 0 | Issues: 0

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

License: Apache-2.0 | Stargazers: 0 | Issues: 0

surya

OCR, layout analysis, reading order, line detection in 90+ languages

License: GPL-3.0 | Stargazers: 0 | Issues: 0

pykan

Kolmogorov Arnold Networks

License: MIT | Stargazers: 0 | Issues: 0

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

License: MIT | Stargazers: 0 | Issues: 0

Grounded-Segment-Anything

Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

License: Apache-2.0 | Stargazers: 0 | Issues: 0

alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more

Language: Jupyter Notebook | License: MIT | Stargazers: 1 | Issues: 0

pymdp

A Python implementation of active inference for Markov Decision Processes

License: MIT | Stargazers: 0 | Issues: 0

AlphaCLIP

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

License: Apache-2.0 | Stargazers: 0 | Issues: 0

dreamerv3

Mastering Diverse Domains through World Models

License: MIT | Stargazers: 0 | Issues: 0

light_on_chatgpt

Helps e-ink monitor users work with ChatGPT: it makes the code blocks white and the UI wider.

Language: CSS | License: MIT | Stargazers: 0 | Issues: 0

MergeLM

Codebase for Merging Language Models

Stargazers: 0 | Issues: 0

Neural-Network-Architecture-Diagrams

Diagrams for visualizing neural network architecture (Created with diagrams.net)

License: MIT | Stargazers: 0 | Issues: 0

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

License: NOASSERTION | Stargazers: 0 | Issues: 0
License: MIT | Stargazers: 0 | Issues: 0

maze-transformer

This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks.

Stargazers: 0 | Issues: 0

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

License: MIT | Stargazers: 0 | Issues: 0