zhanghaonan777's starred repositories

Qwen-VL

The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.

Language: Python · License: NOASSERTION · Stargazers: 3909

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python · License: Apache-2.0 · Stargazers: 518

MA-LMM

[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Language: Python · License: MIT · Stargazers: 133

llm-inference-benchmark

LLM Inference benchmark

Language: Python · License: MIT · Stargazers: 246

VideoMamba

VideoMamba: State Space Model for Efficient Video Understanding

Language: Python · License: Apache-2.0 · Stargazers: 615

ABigSurveyOfLLMs

A collection of 150+ surveys on LLMs

License: CC0-1.0 · Stargazers: 138

Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python · License: Apache-2.0 · Stargazers: 649

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python · License: Apache-2.0 · Stargazers: 5067

grok-1

Grok open release

Language: Python · License: Apache-2.0 · Stargazers: 48510

generate

A Python Package to Access World-Class Generative Models

Language: Python · License: MIT · Stargazers: 120

GroundingGPT

[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model

Language: Python · License: Apache-2.0 · Stargazers: 225

SoraReview

The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".

Stargazers: 458

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python · License: BSD-3-Clause · Stargazers: 2920

OLMo

Modeling, training, eval, and inference code for OLMo

Language: Python · License: Apache-2.0 · Stargazers: 4063

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Language: Python · License: MIT · Stargazers: 5252
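
FlagEmbedding's BGE models are typically used to embed queries and passages for dense retrieval. Below is a minimal sketch following the usage pattern from the project's README; the `FlagModel` wrapper, the `BAAI/bge-large-en-v1.5` checkpoint, and the sample texts are assumptions to verify against the installed version.

```python
# Sketch: dense retrieval with FlagEmbedding's BGE embeddings.
# Assumes the FlagModel API described in the project's README (pip install FlagEmbedding);
# the checkpoint name and sample texts are illustrative only.
from FlagEmbedding import FlagModel

model = FlagModel(
    "BAAI/bge-large-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
    use_fp16=True,  # lower memory use at a small accuracy cost
)

passages = ["Qwen-VL is a vision-language model.", "Grok-1 is an open-weights LLM."]
query = "Which model handles images?"

p_emb = model.encode(passages)          # embed the corpus
q_emb = model.encode_queries([query])   # queries get the retrieval instruction prepended
scores = q_emb @ p_emb.T                # inner-product similarity, shape (1, len(passages))
print(scores)
```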

AnyText

Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing".

Language: Python · License: Apache-2.0 · Stargazers: 3838

jsonformer

A Bulletproof Way to Generate Structured JSON from Language Models

Language: Jupyter Notebook · License: MIT · Stargazers: 3885
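
jsonformer constrains decoding so the model only fills in the value slots of a JSON schema, which is why the output always parses. A minimal sketch, following the call pattern shown in the project's README; the model checkpoint and schema below are placeholders.

```python
# Sketch: schema-constrained generation with jsonformer.
# The Jsonformer(model, tokenizer, schema, prompt) pattern follows the project's README;
# the checkpoint and schema are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer

model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
    },
}

jsonformer = Jsonformer(model, tokenizer, schema, "Generate a person's information:")
result = jsonformer()  # returns a dict matching the schema, not free-form text
print(result)
```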

autogen

A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap

Language: Jupyter Notebook · License: CC-BY-4.0 · Stargazers: 26044
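
AutoGen composes agentic workflows out of conversable agents that message each other. A minimal two-agent sketch, assuming the pyautogen `AssistantAgent`/`UserProxyAgent` API and an OpenAI-style config that you supply yourself; model name and API key are placeholders.

```python
# Sketch: two-agent AutoGen loop — an LLM-backed assistant plus a user proxy
# that executes the code the assistant writes. Assumes the pyautogen
# AssistantAgent / UserProxyAgent API; config_list values are placeholders.
from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]  # placeholder credentials

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",  # fully automated loop, no human in the middle
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The proxy sends the task, runs any code blocks the assistant replies with,
# and feeds results back until the assistant signals termination.
user_proxy.initiate_chat(assistant, message="Plot NVDA stock price YTD and save it to nvda.png")
```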

camel

🐫 CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society (NeurIPS 2023) https://www.camel-ai.org

Language: Python · License: Apache-2.0 · Stargazers: 4522

OpenAgents

OpenAgents: An Open Platform for Language Agents in the Wild

Language: Python · License: Apache-2.0 · Stargazers: 3621

TAAC-2021-Task2-Rank6

6th place in the finals of Track 2 of the 2021 Tencent Advertising Algorithm Competition

Language: Python · Stargazers: 35

multimodal-knowledge-graph

A collection of resources on multimodal knowledge graph, including datasets, papers and contests.

Stargazers: 91

pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Language: Python · License: Apache-2.0 · Stargazers: 30067
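
timm exposes all of its backbones through one factory function. A short sketch using `timm.create_model` and `timm.list_models`, which are documented API; the checkpoint name and the random input batch are illustrative only.

```python
# Sketch: browse and load a pretrained timm backbone, then run a forward pass.
# timm.create_model / timm.list_models are documented API; the checkpoint name
# and the random input below are placeholders for a real preprocessed image.
import timm
import torch

print(timm.list_models("convnext*")[:5])  # browse available ConvNeXt variants

model = timm.create_model("resnet50", pretrained=True, num_classes=0)  # num_classes=0 -> pooled features
model.eval()

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image batch
with torch.no_grad():
    features = model(x)
print(features.shape)  # e.g. torch.Size([1, 2048]) for resnet50
```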

Awesome-Multimodality

A survey of multimodal learning research.

Stargazers: 276

MKT

Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".

Language: Python · License: MIT · Stargazers: 112

Multimodal-AND-Large-Language-Models

A paper list about multimodal and large language models, used only to record papers I read on arXiv each day for personal reference.

Stargazers: 395

Visual-Chinese-LLaMA-Alpaca

Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)

Language: Python · License: Apache-2.0 · Stargazers: 372

TagGPT

TagGPT: Large Language Models are Zero-shot Multimodal Taggers

Language: Python · License: Apache-2.0 · Stargazers: 52