DuanHuiyu / MAEIP_CSformer

Masked Autoencoders as Image Processors

paper

The code and pre-trained models of the paper "Masked Autoencoders as Image Processors" will be released in this repository.


Abstract: Transformers have shown significant effectiveness for various vision tasks including both high-level vision and low-level vision. Recently, masked autoencoders (MAE) for feature pre-training have further unleashed the potential of Transformers, leading to state-of-the-art performance on various high-level vision tasks. However, the significance of MAE pre-training on low-level vision tasks has not been sufficiently explored. In this paper, we show that masked autoencoders are also scalable self-supervised learners for image processing tasks. We first present an efficient Transformer model considering both channel attention and shifted-window-based self-attention, termed CSformer. Then we develop an effective MAE architecture for image processing (MAEIP) tasks. Extensive experimental results show that with the help of MAEIP pre-training, our proposed CSformer achieves state-of-the-art performance on various image processing tasks, including Gaussian denoising, real image denoising, single-image motion deblurring, defocus deblurring, and image deraining.
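
The abstract names two attention mechanisms. For orientation, below is a minimal PyTorch sketch of how channel attention and window-based self-attention can be combined in one block. It is an illustrative reconstruction under assumed module names, shapes, and hyper-parameters, not the released CSformer code; window shifting and the MAEIP pre-training pipeline are omitted.

```python
# Illustrative sketch only: NOT the authors' CSformer implementation.
# All names and hyper-parameters below are assumptions for the example.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed variant)."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global spatial pooling -> (B, C, 1, 1)
            nn.Conv2d(dim, dim // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):                            # x: (B, C, H, W)
        return x * self.gate(x)                      # re-weight channels


class WindowSelfAttention(nn.Module):
    """Multi-head self-attention computed inside non-overlapping windows."""
    def __init__(self, dim, window_size=8, num_heads=4):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                            # x: (B, C, H, W), H and W divisible by window_size
        b, c, h, w = x.shape
        ws = self.window_size
        # partition into (B * num_windows, ws * ws, C) token sequences
        tokens = (
            x.view(b, c, h // ws, ws, w // ws, ws)
             .permute(0, 2, 4, 3, 5, 1)
             .reshape(-1, ws * ws, c)
        )
        out, _ = self.attn(tokens, tokens, tokens)
        # reverse the window partition back to (B, C, H, W)
        return (
            out.view(b, h // ws, w // ws, ws, ws, c)
               .permute(0, 5, 1, 3, 2, 4)
               .reshape(b, c, h, w)
        )


class CSBlock(nn.Module):
    """One hypothetical block combining both attentions with residual connections."""
    def __init__(self, dim, window_size=8, num_heads=4):
        super().__init__()
        self.wsa = WindowSelfAttention(dim, window_size, num_heads)
        self.ca = ChannelAttention(dim)

    def forward(self, x):
        x = x + self.wsa(x)
        x = x + self.ca(x)
        return x


if __name__ == "__main__":
    block = CSBlock(dim=32)
    y = block(torch.randn(1, 32, 64, 64))
    print(y.shape)  # torch.Size([1, 32, 64, 64])
```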


Visual Results

Some visual results are shown below. More visual results will be added soon.

Contact

If you have any questions, please contact huiyuduan@sjtu.edu.cn.

License

MIT License