felixfuu's starred repositories

stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Python | License: MIT | Stargazers: 36680 | Issues: 433 | Issues: 285

alpaca-lora

Instruct-tune LLaMA on consumer hardware

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 18269 | Issues: 156 | Issues: 467

LLMsPracticalGuide

A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)

imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

Language: Python | License: MIT | Stargazers: 7830 | Issues: 113 | Issues: 299

FasterTransformer

Transformer related optimization, including BERT, GPT

Language: C++ | License: Apache-2.0 | Stargazers: 5532 | Issues: 63 | Issues: 623

Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python | License: MIT | Stargazers: 3473 | Issues: 100 | Issues: 159

T2I-Adapter

T2I-Adapter

Language: Python | License: Apache-2.0 | Stargazers: 3216 | Issues: 41 | Issues: 105

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python | License: BSD-3-Clause | Stargazers: 2930 | Issues: 60 | Issues: 86

co-tracker

CoTracker is a model for tracking any point (pixel) on a video.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 2455 | Issues: 26 | Issues: 66

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python | License: AGPL-3.0 | Stargazers: 2306 | Issues: 39 | Issues: 0

consistencydecoder

Consistency Distilled Diff VAE

Language: Python | License: MIT | Stargazers: 2081 | Issues: 22 | Issues: 19

awesome-diffusion-categorized

Collection of diffusion model papers categorized by their subareas

MiniGPT-5

Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"

Language: Python | License: Apache-2.0 | Stargazers: 823 | Issues: 12 | Issues: 37

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

Entity

EntitySeg Toolbox: Towards Open-World and High-Quality Image Segmentation

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 669 | Issues: 23 | Issues: 43

LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Language: Python | License: Apache-2.0 | Stargazers: 640 | Issues: 10 | Issues: 24

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

InstructDiffusion

PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.

Language: Python | License: NOASSERTION | Stargazers: 346 | Issues: 10 | Issues: 18
Language: Python | License: BSD-3-Clause | Stargazers: 326 | Issues: 17 | Issues: 16

Mini-DALLE3

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

ALIKE

ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction

Language: Python | License: BSD-3-Clause | Stargazers: 281 | Issues: 9 | Issues: 24

prompt-pretraining

Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"

Language: Python | License: Apache-2.0 | Stargazers: 248 | Issues: 5 | Issues: 13

HumanBench

This repo is the official implementation of HumanBench (CVPR 2023)

Language: Python | License: MIT | Stargazers: 209 | Issues: 9 | Issues: 19

COMM

PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"

MP-Former

[CVPR 2023] Official implementation of the paper: MP-Former: Mask-Piloted Transformer for Image Segmentation

Language: Python | License: NOASSERTION | Stargazers: 106 | Issues: 7 | Issues: 12

ALIP

[ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption

Language: Python | License: MIT | Stargazers: 76 | Issues: 3 | Issues: 1

BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Language: Python | License: BSD-3-Clause | Stargazers: 21 | Issues: 3 | Issues: 4