lcy0604

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python · 685 29 58

Text2Tex

[ICCV 2023] Text2Tex: Text-driven Texture Synthesis via Diffusion Models

Language: Python · License: NOASSERTION · 530 40 29

anole

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Language: Python · 52800

synthtiger

Official Implementation of SynthTIGER (Synthetic Text Image Generator), ICDAR 2021

Language: Python · License: MIT · 454 6 41

EEG-Transformer

i. A practical application of the Transformer (ViT) to 2-D physiological signal (EEG) classification tasks; it could also be tried with EMG, EOG, ECG, etc. ii. Includes attention over both the spatial dimension (channel attention) and the *temporal dimension*. iii. Common spatial pattern (CSP), an efficient feature enhancement method, implemented in Python.

Language: Python · License: GPL-3.0 · 234 3 11

Chongyu-Liu's starred repositories

geektime-books

imagen-pytorch

LWM

LLM-Agent-Paper-List

DiT

MGM

LLaMA2-Accessory

T-Rex

OpenDiT

awesome_LLMs_interview_notes

LLM-in-Vision

VisionLLM

Awesome-LLMs-Datasets

groundingLMM

Text2Tex

anole

synthtiger

EEG-Transformer

FontDiffuser

DocRes

Document-AI-Recommendations

DiffMatch

GPT-4V_OCR

UReader

ChartAst

OWTTT

UPOCR

RFUND

One-DM

MegaHan97K