Zijie Wang (JonnieWayy)

Company: http://wzj.life/

Location: Njtech University

Home Page: https://scholar.google.com/citations?hl=zh-CN&user=T6LJd-8AAAAJ


Organizations
apachecn
NjtechCVLab
njtechgreenstudio
piggywolfstudio
Programming-With-Love

Zijie Wang's starred repositories

ollama

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | Stargazers: 23564 | Issues: 196 | Issues: 3701

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 13927 | Issues: 114 | Issues: 368

codellama

Inference code for CodeLlama models

Language: Python | License: NOASSERTION | Stargazers: 13860 | Issues: 159 | Issues: 169

latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Jupyter Notebook | License: MIT | Stargazers: 10908 | Issues: 97 | Issues: 333

Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Language: Python | License: MIT | Stargazers: 6215 | Issues: 60 | Issues: 128

GroundingDINO

Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python | License: Apache-2.0 | Stargazers: 5439 | Issues: 37 | Issues: 279

mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)

Language: Python | License: Apache-2.0 | Stargazers: 3695 | Issues: 52 | Issues: 49

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python | License: BSD-3-Clause | Stargazers: 2540 | Issues: 31 | Issues: 149

StableSR

Exploiting Diffusion Prior for Real-World Image Super-Resolution

Language: Python | License: NOASSERTION | Stargazers: 1923 | Issues: 23 | Issues: 134

VMamba

VMamba: Visual State Space Models; the code is based on Mamba

Language: Python | License: MIT | Stargazers: 1760 | Issues: 16 | Issues: 232

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python | License: CC-BY-4.0 | Stargazers: 1025 | Issues: 14 | Issues: 104

Awesome-diffusion-model-for-image-processing

A summary of diffusion-based image processing, including restoration, enhancement, coding, and quality assessment

MOSE-api

[ICCV 2023] MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

ViT-Slim

Official code for our CVPR'22 paper “Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space”

Language: Python | License: MIT | Stargazers: 241 | Issues: 7 | Issues: 17

KEPLER

Source code for TACL paper "KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation".

Language: Python | License: MIT | Stargazers: 189 | Issues: 10 | Issues: 28
Language: Python | License: NOASSERTION | Stargazers: 174 | Issues: 14 | Issues: 25

ContextDET

Contextual Object Detection with Multimodal Large Language Models

Language: Python | License: GPL-3.0 | Stargazers: 163 | Issues: 8 | Issues: 27
Language: Python | License: Apache-2.0 | Stargazers: 111 | Issues: 5 | Issues: 16

REVERIE

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

SUR-adapter

ACM MM'23 (oral). SUR-adapter lets pre-trained diffusion models acquire the powerful semantic understanding and reasoning capabilities of large language models, building a high-quality textual semantic representation for text-to-image generation.

Language: Python | License: MIT | Stargazers: 105 | Issues: 4 | Issues: 7

ldcast

Latent diffusion for generative precipitation nowcasting

Language: Python | License: Apache-2.0 | Stargazers: 78 | Issues: 9 | Issues: 20

prompt2walk

Code for "Prompt a Robot to Walk with Large Language Models" (https://arxiv.org/abs/2309.09969)

EgoObjects

[ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding

Language: Python | License: MIT | Stargazers: 74 | Issues: 4 | Issues: 1

VidSTG-Dataset

This repository provides the dataset introduced by the paper "Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences"

HC-STVG

The HC-STVG Dataset

Precipitation-nowcasting-with-generative-diffusion-models

Code related to the publication "Precipitation nowcasting with generative diffusion models"

Language: Python | Stargazers: 22 | Issues: 2 | Issues: 0

TREK-150-toolkit

Official code repository to download the TREK-150 benchmark dataset and run experiments on it.