Jihan Yang (jihanyang)



Company: The University of Hong Kong

Location: Hong Kong SAR

Home Page: https://jihanyang.github.io/



Organizations
CVMI-Lab

Jihan Yang's starred repositories

Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Language: Shell | Stargazers: 4499 | Issues: 0

ALLaVA

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

Language: Python | License: Apache-2.0 | Stargazers: 195 | Issues: 0

clip-beyond-tail

Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights

Stargazers: 10 | Issues: 0

VideoTree

Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

Language: Python | License: MIT | Stargazers: 32 | Issues: 0

LLoVi

Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"

Language: Python | License: MIT | Stargazers: 70 | Issues: 0

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python | License: MIT | Stargazers: 1788 | Issues: 0

LanguageBind

[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python | License: MIT | Stargazers: 577 | Issues: 0

OpenGlass

Turn any glasses into AI-powered smart glasses

Language: C | License: MIT | Stargazers: 2405 | Issues: 0

VQASynth

Compose multimodal datasets 🎹

Language: Python | Stargazers: 98 | Issues: 0

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python | License: CC-BY-4.0 | Stargazers: 1012 | Issues: 0

visualwebarena

VisualWebArena is a benchmark for multimodal agents.

Language: Python | License: MIT | Stargazers: 159 | Issues: 0

mmc4

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Language: Python | License: MIT | Stargazers: 876 | Issues: 0

Kosmos-G

Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Language: Python | Stargazers: 21 | Issues: 0

Groma

Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python | License: Apache-2.0 | Stargazers: 446 | Issues: 0

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 21816 | Issues: 0

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python | License: MIT | Stargazers: 32840 | Issues: 0

open-eqa

OpenEQA: Embodied Question Answering in the Era of Foundation Models

Language: Jupyter Notebook | License: MIT | Stargazers: 168 | Issues: 0

eai-vc

The repository for the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).

Language: Python | License: NOASSERTION | Stargazers: 437 | Issues: 0

datacomp

DataComp: In search of the next generation of multimodal datasets

Language: Python | License: NOASSERTION | Stargazers: 567 | Issues: 0

paperbot

PaperBot: Learning to Design Real-World Tools Using Paper

Language: Python | Stargazers: 10 | Issues: 0

VAR

[GPT beats diffusion 🔥] [scaling laws in visual generation 📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | Stargazers: 3636 | Issues: 0

ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)

Language: Python | License: Apache-2.0 | Stargazers: 2775 | Issues: 0

VLN-CE

Vision-and-Language Navigation in Continuous Environments using Habitat

Language: Python | License: MIT | Stargazers: 220 | Issues: 0

GiT

Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"

Language: Python | License: Apache-2.0 | Stargazers: 220 | Issues: 0

3D-LR

Can 3D Vision-Language Models Truly Understand Natural Language?

Stargazers: 18 | Issues: 0

grok-1

Grok open release

Language: Python | License: Apache-2.0 | Stargazers: 48992 | Issues: 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python | License: Apache-2.0 | Stargazers: 5087 | Issues: 0