txytju

txytju's starred repositories

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | License: MIT | 59950 5050

MetaGPT

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Language: Python | License: MIT | 39035 854 468

TaskMatrix

Language: Python | License: NOASSERTION | 34456 309 348

llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Language: Jupyter Notebook | License: Apache-2.0 | 28360 306 45

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 15950 152 1230

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | 10737 90 990

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

8795 213 88

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook | License: BSD-3-Clause | 8686 93 599

Llama2-Chinese

Llama Chinese Community: the best Chinese Llama large model, fully open source and available for commercial use

Language: Python | 8008 105 262

CogVLM

A state-of-the-art-level open visual language model | multimodal pretrained model

Language: Python | License: Apache-2.0 | 4940 63 360

BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook | License: BSD-3-Clause | 4237 34 187

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | 3623 45 337

open_flamingo

An open-source framework for training large multimodal models.

Language: Python | License: MIT | 3451 47 167

InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

Language: Python | 1563 28 232

MM-REACT

Official repo for MM-REACT

Language: Python | License: MIT | 905 19 10

magvit

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

Language: Python | License: Apache-2.0 | 843 75 19

FaceFormer

[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers

Language: Python | License: MIT | 720 15 101

Awesome-Reasoning-Foundation-Models

✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models

License: MIT | 336 6 4

RenderIH

Official PyTorch implementation of "RenderIH: A large-scale synthetic dataset for 3D interacting hand pose estimation", ICCV 2023

Language: Python | License: GPL-3.0 | 298 3 4

Instruct2Act

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

Language: Python | 253 3 18

RoboFlamingo

Code for RoboFlamingo

Language: Python | License: MIT | 201 5 32

hamer

HaMeR: Reconstructing Hands in 3D with Transformers

Language: Python | License: MIT | 190 7 38

RT-X

PyTorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"

Language: Python | License: MIT | 104 5 6

OmniScient-Model

This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model

Language: Jupyter Notebook | License: Apache-2.0 | 87 9 4

ReFit

Repository for ICCV23 paper: "ReFit: Recurrent Fitting Network for 3D Human Recovery"

Language: Python | License: MIT | 70 4 5

DIR

[ICCV 2023 Oral] Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image

Language: Python | License: MIT | 63 5 8

POEM

[CVPR 2023] POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo

Language: Python | License: Apache-2.0 | 54 9 2

STCFormer

[CVPR 2023] 3D Human Pose Estimation with Spatio-Temporal Criss-cross Attention

Language: Python | 46 6 5

HaMuCo

[ICCV 2023] HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning

Language: Python | License: MIT | 36 4 4

InterPrior_pytorch

Official code for ICCV 2023 InterPrior

15 4 2