Chongyu-Liu (lcy0604)

Company: SCUT

Location: Guangzhou

Chongyu-Liu's starred repositories

imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

Language: Python, License: MIT, Stargazers: 7898, Issues: 113, Issues: 300
Language: Python, License: Apache-2.0, Stargazers: 7021, Issues: 66, Issues: 67

LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python, License: NOASSERTION, Stargazers: 5723, Issues: 46, Issues: 75

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python, License: Apache-2.0, Stargazers: 3102, Issues: 26, Issues: 128

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python, License: NOASSERTION, Stargazers: 2624, Issues: 36, Issues: 133

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python, License: NOASSERTION, Stargazers: 2013, Issues: 36, Issues: 75

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language: Python, License: Apache-2.0, Stargazers: 1352, Issues: 23, Issues: 54

awesome_LLMs_interview_notes

LLMs interview notes and answers: this repository mainly collects interview questions and reference answers for large language model (LLM) algorithm engineers

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

VisionLLM

VisionLLM Series

Language: Python, License: Apache-2.0, Stargazers: 748, Issues: 38, Issues: 11

Awesome-LLMs-Datasets

Summarize existing representative LLMs text datasets.

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Text2Tex

[ICCV 2023] Text2Tex: Text-driven Texture Synthesis via Diffusion Models

Language: Python, License: NOASSERTION, Stargazers: 530, Issues: 40, Issues: 29

anole

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Language: Python, Stargazers: 528, Issues: 0, Issues: 0

synthtiger

Official Implementation of SynthTIGER (Synthetic Text Image Generator), ICDAR 2021

Language: Python, License: MIT, Stargazers: 454, Issues: 6, Issues: 41

EEG-Transformer

i. A practical application of Transformer (ViT) to 2-D physiological signal (EEG) classification tasks; it could also be tried with EMG, EOG, ECG, etc. ii. Includes attention over the spatial dimension (channel attention) and the temporal dimension. iii. Common spatial pattern (CSP), an efficient feature enhancement method, implemented in Python.

Language: Python, License: GPL-3.0, Stargazers: 234, Issues: 3, Issues: 11

FontDiffuser

[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

DocRes

[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Language: Python, License: MIT, Stargazers: 221, Issues: 6, Issues: 7

Document-AI-Recommendations

Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.

DiffMatch

Official implementation of "Diffusion Model for Dense Matching" (ICLR'24 Oral)

GPT-4V_OCR

Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)

Language: Python, License: Apache-2.0, Stargazers: 102, Issues: 3, Issues: 15

ChartAst

ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.

Language: Python, License: NOASSERTION, Stargazers: 66, Issues: 8, Issues: 20

OWTTT

[ICCV 2023 Oral] Official repository for “On the Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion”

Language: Python, License: MIT, Stargazers: 39, Issues: 3, Issues: 2

UPOCR

Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)

Language: Python, Stargazers: 27, Issues: 0, Issues: 0

RFUND

Official release of RFUND introduced in the paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction" (arXiv:2401.03472).

One-DM

Official implementation of One-DM (ECCV 2024).

License: MIT, Stargazers: 5, Issues: 0, Issues: 0

MegaHan97K

MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories

Language: Python, Stargazers: 4, Issues: 0, Issues: 0