JonnieWayy

Zijie Wang's starred repositories

ollama

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | Apache-2.0 | 23564 196 3701

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | Apache-2.0 | 13927 114 368

codellama

Inference code for CodeLlama models

Language: Python | NOASSERTION | 13860 159 169

latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Jupyter Notebook | MIT | 10908 97 333

Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Language: Python | MIT | 6215 60 128

GroundingDINO

Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python | Apache-2.0 | 5439 37 279

mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)

Language: Python | Apache-2.0 | 3695 52 49

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python | BSD-3-Clause | 2540 31 149

StableSR

Exploiting Diffusion Prior for Real-World Image Super-Resolution

Language: Python | NOASSERTION | 1923 23 134

VMamba

VMamba: Visual State Space Models, code is based on mamba

Language: Python | MIT | 1760 16 232

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python | CC-BY-4.0 | 1025 14 104

Awesome-diffusion-model-for-image-processing

A summary of diffusion-based image processing, including restoration, enhancement, coding, and quality assessment

Apache-2.0 | 486 14 3

PickScore

Language: Python | MIT | 360 3 27

MOSE-api

[ICCV 2023] MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

Language: Python | 295 6 15

ViT-Slim

Official code for our CVPR'22 paper “Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space”

Language: Python | MIT | 241 7 17

KEPLER

Source code for TACL paper "KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation".

Language: Python | MIT | 189 10 28

BeMapNet

Language: Python | NOASSERTION | 174 14 25

ContextDET

Contextual Object Detection with Multimodal Large Language Models

NOASSERTION | 168 13 5

StreamMapNet

Language: Python | GPL-3.0 | 163 8 27

bassl

Language: Python | Apache-2.0 | 111 5 16

REVERIE

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

Language: C++ | 106 5 19

SUR-adapter

ACM MM'23 (oral), SUR-adapter for pre-trained diffusion models can acquire the powerful semantic understanding and reasoning capabilities from large language models to build a high-quality textual semantic representation for text-to-image generation.

Language: Python | MIT | 105 4 7

ldcast

Latent diffusion for generative precipitation nowcasting

Language: Python | Apache-2.0 | 78 9 20

prompt2walk

Code for "Prompt a Robot to Walk with Large Language Models" https://arxiv.org/abs/2309.09969

Language: Python | 75 6 2

EgoObjects

[ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding

Language: Python | MIT | 74 4 1

VidSTG-Dataset

This repository provides the dataset introduced by the paper "Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences"

52 3 5

HC-STVG

The HC-STVG Dataset

Language: Python | 52 4 22

Precipitation-nowcasting-with-generative-diffusion-models

Code related to the publication "Precipitation nowcasting with generative diffusion models"

Language: Python | 22 20

TREK-150-toolkit

Official code repository to download the TREK-150 benchmark dataset and run experiments on it.

Language: Python | 11 1 6