MikeWangWZHL

Zhenhailong Wang's starred repositories

qwqjsq

The latest address of qwqjsq.com

swift

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language: Python | License: Apache-2.0 | 262200

anole

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Language: Python | 55700

VQGAN-LC

Language: Python | 7500

Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Language: Python | License: Apache-2.0 | 34800

1d-tokenizer

This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation".

Language: Jupyter Notebook | License: Apache-2.0 | 30000

Awesome-Video-Datasets

Video datasets

106200

Multimodal-AND-Large-Language-Models

Paper list on multimodal and large language models, kept for personal use to record papers I read from the daily arXiv listing.

48300

VDLM

Repo for paper: Text-based Reasoning About Vector Graphics

Language: Python | 1800

nerfies.github.io

Language: JavaScript | 217600

alphageometry

Language: Python | License: Apache-2.0 | 392400

CHOCOLATE

Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"

Language: Jupyter Notebook | License: Apache-2.0 | 2300

EEG-To-Text

Language: Python | 8100

maze-dataset

Maze datasets for investigating out-of-distribution (OOD) behavior of ML systems

Language: Jupyter Notebook | 1400

torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data

Language: Python | License: MIT | 236100

Tracking-Anything-with-DEVA

[ICCV 2023] Tracking Anything with Decoupled Video Segmentation

Language: Python | License: NOASSERTION | 117900

ecole-dataset

Language: Python | 500

atp-video-language

Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (ATP).

Language: Python | License: MIT | 4700

Solo-Performance-Prompting

Repo for paper "Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration"

Language: Python | 29800

tree-of-thought-llm

[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Language: Python | License: MIT | 444900

LLM-ToolMaker

Language: Jupyter Notebook | 101200

Paxion

Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" (NeurIPS 2023 Spotlight)

Language: Python | 3200

viper

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

Language: Jupyter Notebook | License: NOASSERTION | 163900

yt-dlp

A feature-rich command-line audio/video downloader

Language: Python | License: Unlicense | 7813000

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python | License: Apache-2.0 | 116200

dalle2-laion

Pretrained DALL-E 2 from LAION

Language: Python | 50000

imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in PyTorch

Language: Python | License: MIT | 791400

DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch

Language: Python | License: MIT | 1098700

LookForTheChange

Code for Look for the Change paper published at CVPR 2022

Language: Python | License: MIT | 3500

procthor-10k

The ProcTHOR-10K Houses Dataset

Language: Python | License: Apache-2.0 | 7000