txytju

Company: Ytech Kwai

Location: Beijing, China

txytju's starred repositories

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | License: MIT | Stargazers: 59950 | Issues: 505 | Issues: 0

MetaGPT

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Language: Python | License: MIT | Stargazers: 39035 | Issues: 854 | Issues: 468

Language: Python | License: NOASSERTION | Stargazers: 34456 | Issues: 309 | Issues: 348

llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 28360 | Issues: 306 | Issues: 45

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 15950 | Issues: 152 | Issues: 1230

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 10737 | Issues: 90 | Issues: 990

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook | License: BSD-3-Clause | Stargazers: 8686 | Issues: 93 | Issues: 599

Llama2-Chinese

Llama Chinese community: the best Chinese Llama large model, fully open source and available for commercial use

CogVLM

A state-of-the-art open visual language model | multimodal pretrained model

Language: Python | License: Apache-2.0 | Stargazers: 4940 | Issues: 63 | Issues: 360

BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook | License: BSD-3-Clause | Stargazers: 4237 | Issues: 34 | Issues: 187

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 3623 | Issues: 45 | Issues: 337

open_flamingo

An open-source framework for training large multimodal models.

Language: Python | License: MIT | Stargazers: 3451 | Issues: 47 | Issues: 167

InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

MM-REACT

Official repo for MM-REACT

Language: Python | License: MIT | Stargazers: 905 | Issues: 19 | Issues: 10

magvit

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

Language: Python | License: Apache-2.0 | Stargazers: 843 | Issues: 75 | Issues: 19

FaceFormer

[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers

Language: Python | License: MIT | Stargazers: 720 | Issues: 15 | Issues: 101

Awesome-Reasoning-Foundation-Models

✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models

RenderIH

Official PyTorch implementation of "RenderIH: A large-scale synthetic dataset for 3D interacting hand pose estimation", ICCV 2023

Language: Python | License: GPL-3.0 | Stargazers: 298 | Issues: 3 | Issues: 4

Instruct2Act

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

RoboFlamingo

Code for RoboFlamingo

Language: Python | License: MIT | Stargazers: 201 | Issues: 5 | Issues: 32

hamer

HaMeR: Reconstructing Hands in 3D with Transformers

Language: Python | License: MIT | Stargazers: 190 | Issues: 7 | Issues: 38

RT-X

PyTorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"

Language: Python | License: MIT | Stargazers: 104 | Issues: 5 | Issues: 6

OmniScient-Model

This repo contains the code for our paper "Towards Open-Ended Visual Recognition with Large Language Model"

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 87 | Issues: 9 | Issues: 4

ReFit

Repository for ICCV23 paper: "ReFit: Recurrent Fitting Network for 3D Human Recovery"

Language: Python | License: MIT | Stargazers: 70 | Issues: 4 | Issues: 5

DIR

[ICCV 2023 Oral] Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image

Language: Python | License: MIT | Stargazers: 63 | Issues: 5 | Issues: 8

POEM

[CVPR 2023] POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo

Language: Python | License: Apache-2.0 | Stargazers: 54 | Issues: 9 | Issues: 2

STCFormer

(CVPR 2023) 3D Human Pose Estimation with Spatio-Temporal Criss-cross Attention

HaMuCo

[ICCV 2023] HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning

Language: Python | License: MIT | Stargazers: 36 | Issues: 4 | Issues: 4

InterPrior_pytorch

Official code for ICCV 2023 InterPrior