Chongyu-Liu (lcy0604)

Company: SCUT

Location: Guangzhou

Chongyu-Liu's starred repositories

imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

Language: Python | License: MIT | Stargazers: 8024 | Issues: 114 | Issues: 300
Language: Python | License: Apache-2.0 | Stargazers: 7096 | Issues: 66 | Issues: 71

LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 6065 | Issues: 45 | Issues: 80

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python | License: Apache-2.0 | Stargazers: 3188 | Issues: 28 | Issues: 131

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | Stargazers: 2694 | Issues: 36 | Issues: 134

InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Language: Python | License: MIT | Stargazers: 2486 | Issues: 34 | Issues: 264

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | Stargazers: 2165 | Issues: 36 | Issues: 84

ALBEF

Code for ALBEF: a new vision-language pre-training method

Language: Python | License: BSD-3-Clause | Stargazers: 1523 | Issues: 12 | Issues: 141

awesome_LLMs_interview_notes

LLMs interview notes and answers: this repository mainly records interview questions and reference answers for large language model (LLM) algorithm engineers

VisionLLM

VisionLLM Series

Language: Python | License: Apache-2.0 | Stargazers: 857 | Issues: 42 | Issues: 13

Awesome-LLMs-Datasets

Summarize existing representative LLMs text datasets.

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

CLIP_benchmark

CLIP-like model evaluation

Language: Jupyter Notebook | License: MIT | Stargazers: 590 | Issues: 12 | Issues: 64

Text2Tex

[ICCV 2023] Text2Tex: Text-driven Texture Synthesis via Diffusion Models

Language: Python | License: NOASSERTION | Stargazers: 546 | Issues: 39 | Issues: 31

DocRes

[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Language: Python | License: MIT | Stargazers: 295 | Issues: 6 | Issues: 16

FontDiffuser

[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

EEG-Transformer

i. A practical application of Transformer (ViT) to 2-D physiological signal (EEG) classification tasks. It could also be tried with EMG, EOG, ECG, etc. ii. Includes attention over the spatial dimension (channel attention) and the temporal dimension. iii. Common spatial pattern (CSP), an efficient feature enhancement method, implemented in Python.

Language: Python | License: GPL-3.0 | Stargazers: 249 | Issues: 3 | Issues: 11

Recommendations-Diffusion-Text-Image

A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwriting generation, scene text recognition, and scene text detection.

DiffMatch

Official implementation of "Diffusion Model for Dense Matching" (ICLR'24 Oral)

GPT-4V_OCR

Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)

Language: Python | License: Apache-2.0 | Stargazers: 111 | Issues: 3 | Issues: 15

ChartAst

ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.

Language: Python | License: NOASSERTION | Stargazers: 105 | Issues: 6 | Issues: 23

ESTextSpotter

(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

OWTTT

[ICCV 2023 Oral] Official repository for “On the Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion”

Language: Python | License: MIT | Stargazers: 42 | Issues: 3 | Issues: 2

UPOCR

Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)

Language: Python | Stargazers: 37 | Issues: 0 | Issues: 0
License: GPL-3.0 | Stargazers: 24 | Issues: 2 | Issues: 0

RFUND

Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"

SCUT-EnsExam

SCUT-EnsExam is a real-world handwritten text erasure dataset for examination paper scenarios, consisting of 545 examination paper images. The dataset is randomly divided into a training set of 430 images and a test set of 115 images.