qcj1206 / unfoldNd

(N=1,2,3)-dimensional unfold (im2col) and fold (col2im) in PyTorch

unfoldNd: N-dimensional unfold in PyTorch

This package uses a numerical trick to perform the operations of torch.nn.functional.unfold and torch.nn.Unfold, also known as im2col. It extends them to higher-dimensional inputs that PyTorch does not currently support.

From the PyTorch docs:

Currently, only 4-D input tensors (batched image-like tensors) are supported.

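unfoldNd also implements the operation for 3d and 5d inputs. In the benchmarks described under Performance, it is faster than torch.nn.Unfold in most settings, at the cost of higher peak memory.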

News:

  • [2021-05-02 Sun]: unfoldNd now also generalizes the fold operation (col2im) to 3d/4d/5d inputs

Installation

pip install --user unfoldNd

Usage

This package offers the following main functionality:

unfoldNd.unfoldNd
Like torch.nn.functional.unfold, but supports 3d, 4d, and 5d inputs.
unfoldNd.UnfoldNd
Like torch.nn.Unfold, but supports 3d, 4d, and 5d inputs.

Additional functionality (exotic)

It turned out that the multi-dimensional generalization of torch.nn.functional.unfold can also be used to generalize torch.nn.functional.fold. This is exposed through

unfoldNd.foldNd
Like torch.nn.functional.fold, but supports 3d, 4d, and 5d inputs.
unfoldNd.FoldNd
Like torch.nn.Fold, but supports 3d, 4d, and 5d inputs.

Keep in mind that this feature is tested but not benchmarked. Reasonable performance can still be expected, because it relies on the N-dimensional unfold (which is benchmarked) and torch.scatter_add.
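To illustrate how fold can be built from unfold and scatter_add, here is a minimal 2d sketch. It is not the package's implementation. The helper name fold_via_scatter_add is hypothetical, and the sketch assumes no padding and no dilation. The idea is to unfold a tensor of pixel indices, which records where every column entry came from, and then to sum the column entries back into those positions.

```python
import torch
import torch.nn.functional as F


def fold_via_scatter_add(cols, output_size, kernel_size, stride=1):
    """Sketch of 2d fold (col2im); hypothetical helper, no padding/dilation."""
    N, CKK, L = cols.shape
    H, W = output_size
    KK = kernel_size * kernel_size
    C = CKK // KK

    # Unfold the flat pixel indices: entry (p, l) is the pixel that kernel
    # position p touches in patch l. Float indices are exact for small H * W.
    idx = torch.arange(H * W, dtype=torch.float32, device=cols.device)
    idx = F.unfold(idx.reshape(1, 1, H, W), kernel_size, stride=stride).long()
    idx = idx.reshape(1, 1, KK * L).expand(N, C, KK * L)

    # Accumulate every column entry into the pixel it originated from.
    out = torch.zeros(N, C, H * W, dtype=cols.dtype, device=cols.device)
    out.scatter_add_(2, idx, cols.reshape(N, C, KK * L))
    return out.reshape(N, C, H, W)


torch.manual_seed(0)
cols = torch.randn(2, 3 * 3 * 3, 16)  # 16 patches of a 6x6 image, 3x3 kernel
ours = fold_via_scatter_add(cols, (6, 6), kernel_size=3)
reference = F.fold(cols, output_size=(6, 6), kernel_size=3)
print(torch.allclose(ours, reference, atol=1e-5))
```

Because the index tensor is produced by unfold itself, the same construction carries over to other dimensions whenever an N-dimensional unfold is available.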

Performance

TL;DR: If you are willing to sacrifice a bit of RAM, you can get decent speedups with unfoldNd over torch.nn.Unfold in both the forward and backward operations.

There is a continuous benchmark comparing the forward pass (and forward-backward pass) run time and peak memory here. The settings are:

“example”
Configuration used in the example.
“allcnnc-conv{1,2,3,4,6,7,8}”
Convolution layers from the All-CNNC on CIFAR-100 with batch size 256, borrowed from DeepOBS. Layers 5 and 9 have been removed because they are identical to others in terms of input/output shapes and hyperparameters.

This is a reasonably large setting where one may want to compute the unfolded input, e.g. for the KFAC approximation.

Hardware details

The machine running the benchmark has 32 GB of RAM and the following components:

  • cpu: Intel® Core™ i7-8700K CPU @ 3.70GHz × 12
  • cuda: GeForce RTX 2080 Ti (11GB)

Results

  • Forward pass: unfoldNd is faster than torch.nn.Unfold in all but one benchmark. The latest commit run time is compared here on GPU, and here on CPU.
  • Forward-backward pass: unfoldNd is faster than torch.nn.Unfold in all benchmarks. The latest commit run time is compared here on GPU, and here on CPU.
  • Higher peak memory: The one-hot convolution approach used by unfoldNd consistently reaches higher peak memory (see here). The difference from torch.nn.Unfold is larger than the storage needed for the one-hot kernel; the underlying convolution probably requires additional memory (not confirmed).

Background

Convolutions can be expressed as a matrix-matrix multiplication between two objects: a matrix view of the kernel and the unfolded input. The latter results from stacking all elements of the input that overlap with the kernel in one convolution step into a matrix. This perspective is sometimes helpful because it allows convolutions to be treated similarly to linear layers.
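The matrix-multiplication view can be checked numerically for the 2d case that PyTorch already supports. The sketch below uses only torch.nn.functional.unfold and torch.nn.functional.conv2d; the shapes are arbitrary and there is no bias, padding, or stride.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(2, 3, 8, 8)        # (batch, in_channels, height, width)
weight = torch.randn(4, 3, 3, 3)   # (out_channels, in_channels, k, k)

# Unfolded input: one column per kernel placement, shape (2, 3 * 3 * 3, 6 * 6).
unfolded = F.unfold(x, kernel_size=3)

# Matrix view of the kernel: one row per output channel, shape (4, 27).
weight_matrix = weight.reshape(4, -1)

# The matrix product is broadcast over the batch dimension.
out_matmul = (weight_matrix @ unfolded).reshape(2, 4, 6, 6)

out_conv = F.conv2d(x, weight)
print(torch.allclose(out_matmul, out_conv, atol=1e-4))
```

In this form, the kernel matrix plays the role of a linear layer's weight and each column of the unfolded input plays the role of one input vector.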

The trick

The input elements that overlap with the kernel can be extracted by convolving the input with one-hot kernels of the same dimensions as the original kernel, using group convolutions so that each channel is processed separately.
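A minimal 2d sketch of this idea, compared against torch.nn.functional.unfold, is given below. It is written for illustration and is not the package's code; the function name unfold_via_one_hot_conv is hypothetical. Each one-hot filter contains a single 1 at one kernel position, so convolving with it copies the input element at that offset, and groups equal to the number of channels keeps the channels separate.

```python
import torch
import torch.nn.functional as F


def unfold_via_one_hot_conv(x, kernel_size, stride=1, padding=0, dilation=1):
    """Sketch of 2d unfold via a one-hot group convolution (hypothetical)."""
    C = x.shape[1]
    K = kernel_size

    # K * K filters of shape (1, K, K); filter p has a single 1 at position p.
    one_hot = torch.eye(K * K, dtype=x.dtype, device=x.device)
    one_hot = one_hot.reshape(K * K, 1, K, K)

    # Repeat the filter bank once per channel; with groups=C, output channel
    # c * K * K + p holds kernel position p of input channel c.
    weight = one_hot.repeat(C, 1, 1, 1)
    out = F.conv2d(
        x, weight, stride=stride, padding=padding, dilation=dilation, groups=C
    )

    # Flatten the spatial output dimensions into the patch dimension.
    return out.flatten(start_dim=2)


torch.manual_seed(0)
x = torch.randn(2, 3, 7, 7)
ours = unfold_via_one_hot_conv(x, 3, stride=2, padding=1)
reference = F.unfold(x, 3, stride=2, padding=1)
print(torch.allclose(ours, reference))
```

Swapping conv2d for conv1d or conv3d, with one-hot kernels of matching dimension, would give the 3d and 5d input cases. The one-hot weight holds C * K * K * K * K entries in 2d, which is one source of the extra memory noted under Performance.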

Applications

This is an incomplete list where the unfolded input may be useful:

Known issues

Encountered a problem? Open an issue here.

License:MIT License

