zhanghaonan777's starred repositories

Qwen-VL

The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.

Language: Python · License: NOASSERTION · Stargazers: 3909

tensorrtllm_backend

The Triton TensorRT-LLM Backend

Language: Python · License: Apache-2.0 · Stargazers: 518

MA-LMM

[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Language: Python · License: MIT · Stargazers: 133

llm-inference-benchmark

LLM Inference benchmark

Language: Python · License: MIT · Stargazers: 246

VideoMamba

VideoMamba: State Space Model for Efficient Video Understanding

Language: Python · License: Apache-2.0 · Stargazers: 615

ABigSurveyOfLLMs

A collection of 150+ surveys on LLMs

License: CC0-1.0 · Stargazers: 138

Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python · License: Apache-2.0 · Stargazers: 649

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python · License: Apache-2.0 · Stargazers: 5067

grok-1

Grok open release

Language: Python · License: Apache-2.0 · Stargazers: 48510

generate

A Python Package to Access World-Class Generative Models

Language: Python · License: MIT · Stargazers: 120

GroundingGPT

[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model

Language: Python · License: Apache-2.0 · Stargazers: 225

SoraReview

The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".

Stargazers: 458

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python · License: BSD-3-Clause · Stargazers: 2920

OLMo

Modeling, training, eval, and inference code for OLMo

Language: Python · License: Apache-2.0 · Stargazers: 4063

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Language: Python · License: MIT · Stargazers: 5252
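
FlagEmbedding's BGE models are typically used to embed queries and passages for dense retrieval. Below is a minimal sketch following the usage pattern from the project's README; the `FlagModel` wrapper, the `BAAI/bge-large-en-v1.5` checkpoint, and the sample texts are assumptions to verify against the installed version.

```python
# Sketch: dense retrieval with FlagEmbedding's BGE embeddings.
# Assumes the FlagModel API described in the project's README (pip install FlagEmbedding);
# the checkpoint name and sample texts are illustrative only.
from FlagEmbedding import FlagModel

model = FlagModel(
    "BAAI/bge-large-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
    use_fp16=True,  # lower memory use at a small accuracy cost
)

passages = ["Qwen-VL is a vision-language model.", "Grok-1 is an open-weights LLM."]
query = "Which model handles images?"

p_emb = model.encode(passages)          # embed the corpus
q_emb = model.encode_queries([query])   # queries get the retrieval instruction prepended
scores = q_emb @ p_emb.T                # inner-product similarity, shape (1, len(passages))
print(scores)
```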

AnyText

Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing".

Language: Python · License: Apache-2.0 · Stargazers: 3838

jsonformer

A Bulletproof Way to Generate Structured JSON from Language Models

Language: Jupyter Notebook · License: MIT · Stargazers: 3885
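
jsonformer constrains decoding so the model only fills in the value slots of a JSON schema, which is why the output always parses. A minimal sketch, following the call pattern shown in the project's README; the model checkpoint and schema below are placeholders.

```python
# Sketch: schema-constrained generation with jsonformer.
# The Jsonformer(model, tokenizer, schema, prompt) pattern follows the project's README;
# the checkpoint and schema are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer

model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
    },
}

jsonformer = Jsonformer(model, tokenizer, schema, "Generate a person's information:")
result = jsonformer()  # returns a dict matching the schema, not free-form text
print(result)
```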

autogen

A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap

Language: Jupyter Notebook · License: CC-BY-4.0 · Stargazers: 26044
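
AutoGen composes agentic workflows out of conversable agents that message each other. A minimal two-agent sketch, assuming the pyautogen `AssistantAgent`/`UserProxyAgent` API and an OpenAI-style config that you supply yourself; model name and API key are placeholders.

```python
# Sketch: two-agent AutoGen loop — an LLM-backed assistant plus a user proxy
# that executes the code the assistant writes. Assumes the pyautogen
# AssistantAgent / UserProxyAgent API; config_list values are placeholders.
from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]  # placeholder credentials

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",  # fully automated loop, no human in the middle
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The proxy sends the task, runs any code blocks the assistant replies with,
# and feeds results back until the assistant signals termination.
user_proxy.initiate_chat(assistant, message="Plot NVDA stock price YTD and save it to nvda.png")
```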

camel

🐫 CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society (NeurIPS 2023) https://www.camel-ai.org

Language: Python · License: Apache-2.0 · Stargazers: 4522

OpenAgents

OpenAgents: An Open Platform for Language Agents in the Wild

Language: Python · License: Apache-2.0 · Stargazers: 3621

TAAC-2021-Task2-Rank6

6th place in the finals of Track 2 of the 2021 Tencent Advertising Algorithm Competition

Language: Python · Stargazers: 35

multimodal-knowledge-graph

A collection of resources on multimodal knowledge graph, including datasets, papers and contests.

Stargazers: 91

pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Language: Python · License: Apache-2.0 · Stargazers: 30067
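
timm exposes all of its backbones through one factory function. A short sketch using `timm.create_model` and `timm.list_models`, which are documented API; the checkpoint name and the random input batch are illustrative only.

```python
# Sketch: browse and load a pretrained timm backbone, then run a forward pass.
# timm.create_model / timm.list_models are documented API; the checkpoint name
# and the random input below are placeholders for a real preprocessed image.
import timm
import torch

print(timm.list_models("convnext*")[:5])  # browse available ConvNeXt variants

model = timm.create_model("resnet50", pretrained=True, num_classes=0)  # num_classes=0 -> pooled features
model.eval()

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image batch
with torch.no_grad():
    features = model(x)
print(features.shape)  # e.g. torch.Size([1, 2048]) for resnet50
```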

Awesome-Multimodality

A survey of multimodal learning research.

Stargazers: 276

MKT

Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".

Language: Python · License: MIT · Stargazers: 112

Multimodal-AND-Large-Language-Models

A paper list about multimodal and large language models, used only to record papers I read on arXiv each day for personal reference.

Stargazers: 395

Visual-Chinese-LLaMA-Alpaca

Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)

Language: Python · License: Apache-2.0 · Stargazers: 372

TagGPT

TagGPT: Large Language Models are Zero-shot Multimodal Taggers

Language: Python · License: Apache-2.0 · Stargazers: 52