Artanic30

Company: ShanghaiTech University

Location: Shanghai

Organizations
JeekITClub
ShanghaitechGeekPie

Artanic30's starred repositories

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python | License: Apache-2.0 | Stargazers: 1839 | Issues: 0

anole

Anole: An Open, Autoregressive and Native Multimodal Model for Interleaved Image-Text Generation

Language: Python | Stargazers: 398 | Issues: 0

ALLaVA

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

Language: Python | License: Apache-2.0 | Stargazers: 219 | Issues: 0

DivideMix

Code for paper: DivideMix: Learning with Noisy Labels as Semi-supervised Learning

Language: Python | License: MIT | Stargazers: 523 | Issues: 0

MacCap

[AAAI 2024] Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

Language: Python | Stargazers: 7 | Issues: 0

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python | License: NOASSERTION | Stargazers: 1483 | Issues: 0

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | License: MIT | Stargazers: 1002 | Issues: 0

NaviLLM

[CVPR 2024] Code for the paper "Towards Learning a Generalist Model for Embodied Navigation"

Language: Python | License: MIT | Stargazers: 93 | Issues: 0

Lumina-T2X

Lumina-T2X is a unified framework for text-to-any-modality generation

Language: Python | License: MIT | Stargazers: 1873 | Issues: 0

DiT-3D

🔥🔥🔥Official Codebase of "DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation"

Language: Python | License: Apache-2.0 | Stargazers: 195 | Issues: 0

Retrieval-Augmented-Visual-Question-Answering

This is the official repository for Retrieval Augmented Visual Question Answering

Language: Python | License: GPL-3.0 | Stargazers: 126 | Issues: 0

Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Language: Python | Stargazers: 471 | Issues: 0

ManipVQA

[IROS24 Oral] ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models

Language: Python | Stargazers: 43 | Issues: 0

VL-CheckList

Evaluating vision-and-language pretraining models with objects, attributes, and relations.

Language: Python | Stargazers: 122 | Issues: 0

all-seeing

[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"

Language: Python | Stargazers: 418 | Issues: 0

SiT

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Language: Python | License: MIT | Stargazers: 550 | Issues: 0

mamba

Official implementation of the Mamba state space model (SSM) architecture

Language: Python | License: Apache-2.0 | Stargazers: 11608 | Issues: 0

aliyunpan

Aliyun Drive command-line client, with support for JavaScript plugins and synchronized backup.

Language: Go | License: Apache-2.0 | Stargazers: 3886 | Issues: 0

groundingLMM

[CVPR 2024] Grounding Large Multimodal Model (GLaMM), a model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python | Stargazers: 669 | Issues: 0

COLA

COLA: Evaluate how well your vision-language model can compose objects localized with attributes.

License: MIT | Stargazers: 19 | Issues: 0

BenchLMM

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Language: Python | License: Apache-2.0 | Stargazers: 81 | Issues: 0

MobileVLM

A strong and open vision-language assistant for mobile devices

Language: Python | License: Apache-2.0 | Stargazers: 885 | Issues: 0

vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" (Oral @ ICLR 2023)

Language: Python | License: MIT | Stargazers: 217 | Issues: 0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5659 | Issues: 0

SEED

Official implementation of SEED-LLaMA (ICLR 2024).

Language: Python | License: NOASSERTION | Stargazers: 518 | Issues: 0

CoVLM

Official implementation of CoVLM: Composing Visual Entities and Relationships in Large Language Models via Communicative Decoding

Language: Python | License: MIT | Stargazers: 39 | Issues: 0

taming-transformers

Taming Transformers for High-Resolution Image Synthesis

Language: Jupyter Notebook | License: MIT | Stargazers: 5559 | Issues: 0