lcy0604

SCUT

Guangzhou

Chongyu-Liu's starred repositories

geektime-books

:books: Geektime e-books

imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

Language: Python | License: MIT | 8024 114 300

LWM

Language: Python | License: Apache-2.0 | 7096 66 71

LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | 6065 45 80

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python | License: Apache-2.0 | 3188 28 131

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | 2694 36 134

InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Language: Python | License: MIT | 2486 34 264

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | 2165 36 84

ALBEF

Code for ALBEF: a new vision-language pre-training method

Language: Python | License: BSD-3-Clause | 1523 12 141

awesome_LLMs_interview_notes

LLMs interview notes and answers: this repository mainly records interview questions and reference answers for large language model (LLM) algorithm engineers.

VisionLLM

VisionLLM Series

Language: Python | License: Apache-2.0 | 857 42 13

Awesome-LLMs-Datasets

Summarize existing representative LLMs text datasets.

License: Apache-2.0 | 838 4 2

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

CLIP_benchmark

CLIP-like model evaluation

Language: Jupyter Notebook | License: MIT | 590 12 64

Text2Tex

[ICCV 2023] Text2Tex: Text-driven Texture Synthesis via Diffusion Models

Language: Python | License: NOASSERTION | 546 39 31

DocRes

[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Language: Python | License: MIT | 295 6 16

FontDiffuser

[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Language: Python | 274 5 29

EEG-Transformer

i. A practical application of Transformer (ViT) to 2-D physiological signal (EEG) classification tasks; it could also be tried with EMG, EOG, ECG, etc. ii. Includes attention over the spatial dimension (channel attention) and the *temporal dimension*. iii. Common spatial pattern (CSP), an efficient feature enhancement method, implemented in Python.

Language: Python | License: GPL-3.0 | 249 3 11

Recommendations-Diffusion-Text-Image

A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwriting generation, scene text recognition, and scene text detection.

DiffMatch

Official implementation of "Diffusion Model for Dense Matching" (ICLR'24 Oral)

Language: Python | 141 11 9

GPT-4V_OCR

Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)

Language: Python | 117 8 5

UReader

Language: Python | License: Apache-2.0 | 111 3 15

ChartAst

ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.

Language: Python | License: NOASSERTION | 105 6 23

ESTextSpotter

(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

Language: Python | 71 3 20

OWTTT

[ICCV 2023 Oral] Official repository for “On the Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion”

Language: Python | License: MIT | 42 3 2

UPOCR

Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)

Language: Python | 3700

M5HisDoc

License: GPL-3.0 | 24 20

RFUND

Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"

SCUT-EnsExam

SCUT-EnsExam is a real-world handwritten text erasure dataset for examination paper scenarios, consisting of 545 examination paper images. The dataset is randomly divided into a training set of 430 images and a test set of 115 images.

7 10