TianheWu

Tianhe Wu's starred repositories

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

CC-BY-4.0 · 100

VideoWatermarking

Language: Python · 300

LeetcodeTop

A collection of the LeetCode problems most frequently asked in interviews at major internet companies 🔥

1840900

Q-Ground

Official codes for "Q-Ground: Image Quality Grounding with Large Multi-modality Models", ACM MM2024 (Oral)

NOASSERTION · 1700

LLaVA-NeXT

Language: Python · 140000

MEFNet

Official Implementation of MEF-Net

Language: Python · 8300

3dpe

[ECCV 2024] 3DPE: Real-time 3D-aware Portrait Editing from a Single Image

1600

open_clip

An open source implementation of CLIP.

Language: Python · NOASSERTION · 938700

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language: Python · Apache-2.0 · 231200

HiDiffusion

[ECCV 2024] HiDiffusion: Increases the resolution and speed of your diffusion model by only adding a single line of code!

Language: Jupyter Notebook · Apache-2.0 · 71400

CoSeR

An unofficial implementation for "CoSeR: Bridging Image and Language for Cognitive Super-Resolution (CVPR 2024)"

Language: Python · MIT · 2000

Awesome-High-Resolution-Diffusion

🔥🔥🔥A curated list of papers on recent diffusion-based high-resolution image and video synthesis works.

3900

ResMaster

Apache-2.0 · 5900

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python · Apache-2.0 · 161400

CVPR-2024-Papers

59700

StyleCrafter-SDXL

Code of StyleCrafter on SDXL

Language: Python · Apache-2.0 · 1100

LM4LV

🔥Official PyTorch implementation for "LM4LV: A Frozen Large Language Model for Low-level Vision Tasks".

Language: Python · Apache-2.0 · 3000

Diff-Plugin

[CVPR 2024] Official code release of our paper "Diff-Plugin: Revitalizing Details for Diffusion-based Low-level tasks"

Language: Python · 10400

stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Python · MIT · 3786600

CaD-VI

Comparison Visual Instruction Tuning (CaD-VI)

Language: Python · 400

ChartMimic

ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation

Language: Python · 6700

Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Language: Python · Apache-2.0 · 34800

OSEDiff

Language: Python · 10300

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python · MIT · 109500

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python · NOASSERTION · 813100

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python · NOASSERTION · 296000

MMDialog

The official site of the paper "MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation"

Language: Python · 18100

CCSR

Official codes of CCSR: Improving the Stability of Diffusion Models for Content Consistent Super-Resolution

Language: Python · 40000

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model with performance approaching GPT-4o.

Language: Python · MIT · 451200

HDC

The official implementation of Hierarchical Semantic Decoding with Counting Assistance for Generalized Referring Expression Segmentation

MIT · 1400