mrswang1's starred repositories

llama-agentic-system

Agentic components of the Llama Stack APIs

Language: Python | License: NOASSERTION | 260400

gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Language: Python | License: Apache-2.0 | 3127800

LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Language: Python | License: Apache-2.0 | 67400

LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Language: Python | 76800

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | 2497300

llama

Inference code for Llama models

Language: Python | License: NOASSERTION | 5476600

Emu

Emu Series: Generative Multimodal Models from BAAI

Language: Python | License: Apache-2.0 | 157700

unicom

[ICLR 2023] Unicom: Universal and Compact Representation Learning for Image Retrieval

Language: Python | 20900

LeetCode-Go

✅ Solutions to LeetCode in Go, 100% test coverage, runtime beats 100% / LeetCode solutions

Language: Go | License: MIT | 3250600

mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Language: Python | License: MIT | 27600

insightface

State-of-the-art 2D and 3D Face Analysis Project

Language: Python | 2219300

Efficient-Multimodal-LLMs-Survey

Efficient Multimodal Large Language Models: A Survey

License: Apache-2.0 | 19500

Awesome_Multimodel_LLM

Awesome_Multimodel is a curated GitHub repository that collects resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.

awesome-pretrained-chinese-nlp-models

Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models

Language: Python | License: MIT | 459900

VLM_survey

Collection of AWESOME vision-language models for vision tasks

CM3Leon

An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images

Language: Python | License: MIT | 35000

Subspace-Tuning

A generalized framework for subspace tuning methods in parameter efficient fine-tuning.

Language: Python | License: Apache-2.0 | 4300

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | 446400

SEED-X

Multimodal Models in Real World

Language: Jupyter Notebook | License: NOASSERTION | 34500

VLE

VLE: Vision-Language Encoder (VLE: a vision-language multimodal pretrained model)

Language: Python | License: Apache-2.0 | 17600

lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Language: Python | License: NOASSERTION | 116600

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 1843700

VQA

Language: Python | License: NOASSERTION | 35000

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python | License: NOASSERTION | 161000

LAION-Face

The human face subset of LAION-400M for large-scale face pretraining.

Language: Python | 26500

facer

Face analysis tools for modern research, equipped with state-of-the-art Face Parsing and Face Alignment

Language: Python | License: MIT | 31800

anole

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Language: Python | 55700

Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Language: Python | License: Apache-2.0 | 34800

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook | License: NOASSERTION | 6691700

stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Python | License: MIT | 3785800