philip-bl / dctn

Deep convolutional tensor network

What is this?

This is the code I used to run the experiments in Deep convolutional tensor network (arXiv:2005.14506).

How to run an experiment

The main entry point is ./new_runner.py. To see help about its arguments, run

PYTHONPATH=. python new_runner.py --help

or read all the decorators of the main function in ./new_runner.py.

An example of how to run training is

$ PYTHONPATH=. python new_runner.py \
  --ds-path /path/to/downloaded/fashionmnist \
  --ds-type fashionmnist \
  --experiments-dir /path/to/where/experiments/info/will/be/saved \
  --epses-specs '(4,4),(3,6)' \
  --batch-size 128 \
  --optimizer adam \
  --reg-type epses_composition \
  --reg-coeff 1e-2 \
  --init-epses-composition-unit-empirical-output-std \
  --lr 1.11e-4

The flag --init-epses-composition-unit-empirical-output-std turns on “empirical unit std of intermediate representations initialization”, as it’s called in the article. You can pass the flag --init-epses-composition-unit-theoretical-output-std instead to use He initialization.
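
To make the distinction concrete, here is a rough sketch of the idea (my own illustration, not the repo's actual initialization code; eps_forward is a hypothetical stand-in for contracting an EPS core with a batch of inputs):

import torch

def init_theoretical(shape, fan_in):
    # He-style: choose the std from the fan-in alone, without looking at data.
    return torch.randn(shape) * (2.0 / fan_in) ** 0.5

def init_empirical(shape, eps_forward, batch):
    # Empirical: start from a random core, push a real batch through the layer,
    # then rescale the core so the intermediate representation has unit std.
    # This works because the output is linear in the core.
    core = torch.randn(shape)
    with torch.no_grad():
        scale = eps_forward(core, batch).std()
    return core / scale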

The code base is full of old code which I don't use anymore. Unfortunately, I don't have time to clean it up for public use. So I suggest you look at ./new_runner.py and at the code it uses, and ignore the code it doesn't use.

How to run tests in parallel

$ conda install -c conda-forge pytest-xdist
$ cd ~/projects/dctn
$ python -m pytest --numprocesses=4 tests/

Various notes and plots

Another thing you, dear reader, might be interested in is the various plots and small notes exploring how hyperparameters affect everything. They are located in ./small_experiments/plots. The plots are HTML files generated by Bokeh; you can't view them on GitHub, you need to download them. Whenever a directory there contains files whose names start with 01, 02, etc., you should probably look at them in that order. Sometimes the filenames tell you what a file is about, and sometimes the HTML files themselves contain descriptions of what the plots show. The HTML files also show the parameters passed to ./new_runner.py, which include the hyperparameters. All of this is very raw and probably not very readable, sorry about that. I made it primarily for myself.
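
For reference, a standalone Bokeh HTML file of this kind is produced roughly like this (a generic example with made-up data, not the repo's plotting code):

from bokeh.plotting import figure, output_file, save

# Made-up data: validation accuracy per epoch.
epochs = list(range(1, 11))
val_acc = [0.31, 0.38, 0.42, 0.44, 0.45, 0.46, 0.462, 0.468, 0.47, 0.471]

fig = figure(title="val acc vs epoch", x_axis_label="epoch", y_axis_label="val acc")
fig.line(epochs, val_acc)

output_file("val_acc.html")  # a standalone HTML file you can open in a browser
save(fig)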

Notes on DCTN and CIFAR10 (from <2020-06-06>)

Recently I've been trying DCTN on CIFAR10. But it overfits really badly. The observations listed below strongly suggest that DCTN is bad for CIFAR10 unless I think of some new tricks.

Baseline: linear classifier

A linear multinomial classifier gets 41.73% validation accuracy and 45.474% train accuracy (I did a grid search using sklearn).
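
A baseline of this kind can be reproduced with a sketch like the following (my own illustration; the exact hyperparameter grid and preprocessing behind the numbers above are not recorded here):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Stand-in data: replace with flattened CIFAR10 images (N, 32*32*3) and labels 0..9.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 3072)), rng.integers(0, 10, size=1000)
X_val, y_val = rng.normal(size=(200, 3072)), rng.integers(0, 10, size=200)

# A multinomial (softmax) logistic regression is a linear classifier;
# grid-search its inverse regularization strength C.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": np.logspace(-3, 2, 6)},
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_val, y_val))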

DCTN using YCbCr

I interpret the three YCbCr channels as the quantum dimension, normalize them (μ=0, σ=1), and add a constant channel of ones (a rough sketch of this preprocessing follows the list below). I get:

  • EPS(K=3,Q=6)+linear - 43.3% val accuracy (grid search of lr and regularization coefficient). This model can achieve at least 60% train accuracy; I don't know whether it can go higher, because I stopped training. The best result was with a very small regularization coefficient λ=1e-12. Having it between 1e-3 and 1e-4 made the model overfit more, which is surprising. Also, for some reason, a high learning rate (≥ 3e-4), which led to unstable training, increased overfitting.
  • EPS(K=2,Q=24)+linear - best val acc (lr grid search) is 50.98% with lr=3.16e-4.
  • EPS(K=2,Q=12)+linear - best val acc (lr grid search) is 49.4%. Best lrs are 1e-3 and 3.16e-4. lr=3.16e-3 had unstable training and (surprisingly) overfitted a lot.
  • EPS(K=2,Q=6)+linear - best val acc is 48.3% with lr=1e-3.

So, kernel size K=2 is better than K=3, probably because it has fewer parameters and hence overfits less. Also, with K=2, a larger quantum dimension Q is better than a smaller one.
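
Here is the promised sketch of the preprocessing described above (my own illustration, not the repo's code; I standardize per image for brevity, whereas dataset-wide statistics may be what's actually used):

import numpy as np
import torch
from PIL import Image

def preprocess_ycbcr(img: Image.Image) -> torch.Tensor:
    # RGB PIL image -> (4, H, W) tensor: standardized Y, Cb, Cr plus a channel of ones.
    ycbcr = torch.from_numpy(np.asarray(img.convert("YCbCr"), dtype=np.float32))
    ycbcr = ycbcr.permute(2, 0, 1)  # (3, H, W)
    ycbcr = (ycbcr - ycbcr.mean(dim=(1, 2), keepdim=True)) / ycbcr.std(dim=(1, 2), keepdim=True)
    ones = torch.ones_like(ycbcr[:1])  # the constant channel
    return torch.cat([ycbcr, ones], dim=0)  # channels play the role of the quantum dimension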

See notes and plots in ./small_experiments/plots/10_cifar10_ycbcr_const_channel_zeromeanscaling_one_eps_K=3/notes.org and ./small_experiments/plots/11_cifar10_ycbcr_one_eps_K=2_gridsearch/01_notes.org.

DCTN using grayscale CIFAR10

  • EPS(K=4,Q=4)+linear gets 49.5% val accuracy; here I downscale CIFAR10 to 28x28. I use initialization and multiplier ν (used in the preprocessing function φ) analogous to my best result on FashionMNIST.
  • EPS(K=4,Q=4)+EPS(K=3,Q=6)+linear gets 54.8% val accuracy; here I use the full 32x32 resolution. I use initialization analogous to my best result on FashionMNIST but choose ν a little smaller. At least 98.4% train accuracy can be achieved here.

See plots in ./small_experiments/plots/08_cifar10/ and ./small_experiments/plots/09_cifar10_28vs32/.
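
For the grayscale runs, a rough sketch of the kind of preprocessing involved (my own illustration: the cos/sin feature map below is just a common choice in tensor-network classifiers, and the actual φ and ν used in this repo may differ):

import torch
from torchvision import transforms

# Grayscale CIFAR10, downscaled to 28x28 as in the first bullet above.
to_gray_28 = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),  # (1, 28, 28), values in [0, 1]
])

def phi(x: torch.Tensor, nu: float = 1.0) -> torch.Tensor:
    # Hypothetical local feature map with multiplier nu: each pixel -> a 2-vector.
    return torch.stack([torch.cos(nu * torch.pi * x / 2),
                        torch.sin(nu * torch.pi * x / 2)], dim=0)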
