shannanyinxiang

Dezhi Peng's starred repositories

DocRes

[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Language: Python | MIT | 21300

RFUND

Official release of RFUND introduced in the paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction" (arXiv:2401.03472).

1500

UPOCR

Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)

Language: Python | 2400

Decoupled-attention-network

PyTorch implementation of "Decoupled attention network for text recognition".

Language: Python | MIT | 31000

Awesome-LLMs-Datasets

A summary of existing representative LLM text datasets.

Apache-2.0 | 72500

FontDiffuser

[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Language: Python | 22500

i. A practical application of Transformer (ViT) to 2-D physiological signal (EEG) classification tasks; it could also be tried with EMG, EOG, ECG, etc. ii. Includes attention over the spatial dimension (channel attention) and the temporal dimension. iii. Common spatial pattern (CSP), an efficient feature enhancement method, implemented in Python.

Language: Python | GPL-3.0 | 23100

Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language: Python | MIT | 155600

HST

This is an official implementation for "Hierarchical Side Tuning for Vision Transformers".

Language: Python | 1800

SCUT-EnsExam

SCUT-EnsExam is a real-world handwritten text erasure dataset for examination paper scenarios, consisting of 545 examination paper images. The dataset is randomly divided into a training set of 430 images and a test set of 115 images.

700

CG-GAN

Official PyTorch implementation of the CVPR 2022 paper: "Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator"

Language: Python | 8400

ESTextSpotter

(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

Language: Python | 7000

GPT-4V_OCR

Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)

Language: Python | 10800

Real-CE

Real-CE: A Benchmark for Chinese-English Scene Text Image Super-resolution (ICCV2023)

Language: Python | 6600

LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

576900

DocTamper

[CVPR2023] Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution

Language: Python | 9600

a2m_chineseNMT

Dataset for TALLIP2019 paper "Ancient-Modern Chinese Translation with a New Large Training Dataset"

2100

Adaptor

Global Adaptive Transformer for Cross-Subject EEG Classification.

Language: Python | GPL-3.0 | 2000

CS-GAN

Common Spatial Generative Adversarial Networks based Data Augmentation for Cross-Subject Brain-Computer Interface

Language: Python | GPL-3.0 | 700

EEG-Toolbox

A toolbox for EEG signal processing. Contributions are welcome!

Language: Python | 1000

EEG-Conformer

EEG Transformer 2.0. i. Convolutional Transformer for EEG Decoding. ii. Novel visualization - Class Activation Topography.

Language: Python | GPL-3.0 | 36400

Recommendations-Diffusion-Text-Image

A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwriting generation, scene text recognition, and scene text detection.

15700