hustvl/PySA

Pyramid Self-Attention for Semantic Segmentation


Paper: Pyramid Self-Attention for Semantic Segmentation

by Jiyang Qi¹, Xinggang Wang¹,²\*, Yao Hu³, Xu Tang³, Wenyu Liu¹

¹ School of EIC, HUST, ² Hubei Key Laboratory of Smart Internet Technology, ³ Alibaba Group

(*) Corresponding Author

Introduction

Figure: overview of the pyramid self-attention (PySA) mechanism.

Self-attention is vital in computer vision: it is the building block of the Transformer and can model long-range context for visual recognition. However, computing pairwise self-attention between all pixels for dense prediction tasks (e.g., semantic segmentation) incurs a high computational cost. In this work, we propose a novel pyramid self-attention (PySA) mechanism that collects global context information far more efficiently. Concretely, the basic module of PySA first divides the whole image into R x R regions, and then further divides every region into G x G grids. One feature is extracted for each grid, and self-attention is then applied to the grid features within the same region. PySA keeps increasing R (e.g., from 1 to 8) to harvest more local context information and to propagate global context to local regions in a parallel/series manner. Since G can be kept small, the computational complexity stays low: each region attends over only G x G tokens instead of all pixels.
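To make the region-and-grid computation concrete, below is a minimal PyTorch sketch of one basic PySA module. This is not the official implementation: the module name `PySABlock`, the use of `nn.MultiheadAttention`, the adaptive average pooling for grid-feature extraction, and the residual fusion are all our illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PySABlock(nn.Module):
    """Sketch of one PySA stage: split the feature map into R x R regions,
    pool each region to G x G grid features, and run self-attention among
    the G*G grid features of the same region only."""

    def __init__(self, channels: int, R: int, G: int, num_heads: int = 4):
        super().__init__()
        # channels must be divisible by num_heads for MultiheadAttention.
        self.R, self.G = R, G
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W are assumed divisible by R for clarity.
        B, C, H, W = x.shape
        R, G = self.R, self.G
        rh, rw = H // R, W // R

        # Move the R x R regions onto the batch axis: (B*R*R, C, rh, rw).
        regions = (x.view(B, C, R, rh, R, rw)
                     .permute(0, 2, 4, 1, 3, 5)
                     .reshape(B * R * R, C, rh, rw))

        # One feature per grid cell: pool each region down to G x G.
        grid = F.adaptive_avg_pool2d(regions, (G, G))    # (B*R*R, C, G, G)
        tokens = grid.flatten(2).transpose(1, 2)         # (B*R*R, G*G, C)

        # Self-attention restricted to grid features of the same region.
        out, _ = self.attn(tokens, tokens, tokens)       # (B*R*R, G*G, C)

        # Broadcast the attended grid features back to pixel resolution and
        # fuse them with the input (a simple residual choice, assumed here).
        out = out.transpose(1, 2).view(B * R * R, C, G, G)
        out = F.interpolate(out, size=(rh, rw), mode='nearest')
        out = (out.view(B, R, R, C, rh, rw)
                  .permute(0, 3, 1, 4, 2, 5)
                  .reshape(B, C, H, W))
        return x + out
```

With R = 1 the block attends over a global G x G summary of the whole image; with larger R the same G x G budget is spent inside ever smaller regions, which is what keeps the cost low.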

Architecture

PySA-Series

Overview of the proposed PySA-Series.

PySA-Parallel

Overview of the proposed PySA-Parallel.

By progressively increasing R (e.g., from 1 to 8), the model harvests more local context information and propagates global context into local regions, either in series or in parallel; a rough sketch of the two compositions follows.
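Reusing the hypothetical `PySABlock` from the sketch above, the two variants might be composed as follows (the fusion in the parallel variant is a plain average here; the paper's actual fusion may differ):

```python
import torch.nn as nn


class PySASeries(nn.Module):
    """Series variant: blocks with growing R run one after another, so
    global context (R=1) is progressively propagated into local regions."""

    def __init__(self, channels: int, Rs=(1, 2, 4, 8), G: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(PySABlock(channels, R, G) for R in Rs)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


class PySAParallel(nn.Module):
    """Parallel variant: blocks with different R all see the same input,
    and their outputs are fused (averaged here for simplicity)."""

    def __init__(self, channels: int, Rs=(1, 2, 4, 8), G: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(PySABlock(channels, R, G) for R in Rs)

    def forward(self, x):
        return sum(block(x) for block in self.blocks) / len(self.blocks)
```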

License

This project is released under the MIT License.