levimcclenny / multimodal_transfer_learned_regression

Repo for the paper "Deep Multimodal Transfer-Learned Regression in Data-Poor Domains"

Home Page: https://arxiv.org/abs/2006.09310

DMTL-R - Official Implementation

Deep Multimodal Transfer-Learned Regression in Data-Poor Domains

Levi McClenny [1,2], Mulugeta Haile [2], Vahid Attari [3], Brian Sadler [2], Ulisses Braga-Neto [1], Raymundo Arroyave [3]

Paper: https://arxiv.org/abs/2006.09310

Abstract: In many real-world applications of deep learning, estimation of a target may rely on various types of input data modes, such as audio-video, image-text, etc. This task can be further complicated by a lack of sufficient data. Here we propose a Deep Multimodal Transfer-Learned Regressor (DMTL-R) for multimodal learning of image and feature data in a deep regression architecture effective at predicting target parameters in data-poor domains. Our model is capable of fine-tuning a given set of pre-trained CNN weights on a small amount of training image data, while simultaneously conditioning on feature information from a complementary data mode during network training, yielding more accurate single-target or multi-target regression than can be achieved using the images or the features alone. We present results using phase-field simulation microstructure images with an accompanying set of physical features, using pre-trained weights from various well-known CNN architectures, which demonstrate the efficacy of the proposed multimodal approach.

[1] Texas A&M Dept. of Electrical Engineering, College Station, TX
[2] US Army CCDC Army Research Lab, Aberdeen Proving Ground/Adelphi, MD
[3] Texas A&M Dept. of Materials Science, College Station, TX

Estimator Architecture

Multimodal refers to the fact that multiple types of input are supported and complement each other to yield more accurate regression. The estimator takes advantage of transfer-learned weights from well-known, widely available CNN architectures. The entire estimator is trained end-to-end via forward passes and backpropagation, as opposed to individually trained estimators that are aggregated or otherwise selected after training. The training flow is diagrammed here:
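To make the layout concrete, a minimal Keras sketch of such a two-branch estimator is shown below. This is an illustration of the idea, not the repo's implementation: the ResNet50 backbone, the layer widths, and the input shapes are assumptions chosen for the example.

# Illustrative sketch of a multimodal transfer-learned regressor (not the repo's code).
# Assumes a ResNet50 backbone and tf.keras; layer widths and shapes are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_dmtlr_sketch(n_features=10, n_targets=1):
    # Image branch: pre-trained CNN weights, fine-tuned end-to-end with the rest.
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=(224, 224, 3))
    image_repr = backbone.output                      # pooled CNN representation

    # Feature branch: complementary physical features from the second data mode.
    feat_in = layers.Input(shape=(n_features,), name="features")
    feat_repr = layers.Dense(64, activation="relu")(feat_in)

    # Fuse both modes and regress the target parameter(s).
    fused = layers.concatenate([image_repr, feat_repr])
    hidden = layers.Dense(128, activation="relu")(fused)
    output = layers.Dense(n_targets, activation="linear", name="target")(hidden)

    model = Model(inputs=[backbone.input, feat_in], outputs=output)
    model.compile(optimizer="adam", loss="mse")
    return model

Because the two branches are joined before the regression head, a single fit call updates the CNN weights and the feature branch together, which is the end-to-end training described above.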

Requirements

The code was implemented in Python 3.7 with the following package versions:

tensorflow version = 1.15.2
numpy version = 1.15.4
keras version = 2.2.4

and matplotlib 3.1.1 was used for visualization.

We created a short script to check the package versions on your local machine; you can run the command

python versiontest.py

and verify the versions. TensorFlow v2.2 is also known to work properly. Other versions of NumPy and tf.keras should still function properly, but will likely throw deprecation warnings.
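For a quick manual check without the script, a roughly equivalent snippet is shown below; the actual contents of versiontest.py may differ.

# Manual version check (roughly what versiontest.py reports).
import tensorflow as tf
import numpy as np
import keras

print("tensorflow version =", tf.__version__)
print("keras version =", keras.__version__)
print("numpy version =", np.__version__)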

Virtual Environment (Optional)

(Mac) To create a virtual environment for running this code, download the repository either via git clone or by clicking Download at the top of the GitHub page, then navigate to the top-level folder in a terminal window and execute the commands

python3 -m venv --system-site-packages ./venv
source ./venv/bin/activate

This will create a virtual environment named venv in that directory (first line) and drop you into it (second line). At that point you can install/uninstall package versions without affecting your overall environment. You can verify you're in the virtual environment if you see (venv) at the beginning of your terminal prompt. You can then install the exact versions of the packages listed above with pip inside the venv:

pip install tensorflow==1.15.2 numpy==1.15.4 keras==2.2.4

run

python versiontest.py

and you should see the following output:

Using TensorFlow backend
tensorflow version = 1.15.2
keras version = 2.2.4
numpy version = 1.15.4

Data

The data used in the paper is a very small subset of the data available at the open-source microstructure database. The data in the paper has been reduced to 2500 images of spinodally decomposed microstructures, which have been preprocessed for each individual CNN variant. These preprocessed files are available as .pkl files.

Data must be stored in the data/ directory of this repo for the DMTL-R estimator to find it. The files are large (~1.5 GB for the ResNet/VGG16 networks and ~2.4 GB for the Inception network), so it is recommended that you download only the set you are interested in running.

ResNet Images
VGG16 Images
Inception Images

Regardless of the CNN architecture and corresponding images you download, you will also need properties.pkl and parameters.pkl in the data/ folder. So, if you want to train the DMTL-R with the fine-tuned ResNet architecture, you would have three files in your local data/ folder: ResNet_images.pkl, properties.pkl, and parameters.pkl.
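If you want to sanity-check the downloaded files before training, a minimal loading sketch is shown below. The file names match the ones listed above; the structure of the unpickled objects is an assumption, so inspect them after loading.

# Sketch: load the preprocessed images and the accompanying feature/target pickles.
# The contents/shapes of the unpickled objects are assumptions; inspect them yourself.
import pickle

def load_pkl(path):
    with open(path, "rb") as f:
        return pickle.load(f)

images = load_pkl("data/ResNet_images.pkl")    # preprocessed microstructure images
properties = load_pkl("data/properties.pkl")   # complementary physical features
parameters = load_pkl("data/parameters.pkl")   # regression targets

print(type(images), type(properties), type(parameters))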

Usage

Once the data is in the data/ folder, training the DMTL-R can be started by running the command

python train.py --resnet 20

in a terminal window from the repo's top-level folder. Here, the flag --resnet can be replaced with --inception or --VGG16 if those models are desired. The integer 20 is the number of training epochs and can be varied if you would like to see results sooner. The DMTL-R takes about 4 sec/epoch to train on a single V100 GPU and ~2.5 min/epoch on a 2017 MacBook Pro with a 2.9 GHz Quad-Core Intel Core i7.
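If you want to wrap or adapt train.py, the command above implies an argument interface roughly like the sketch below; this is a reconstruction for illustration, not the repo's actual parser.

# Illustrative reconstruction of the command-line interface shown above.
import argparse

parser = argparse.ArgumentParser(description="Train the DMTL-R estimator")
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--resnet", type=int, metavar="EPOCHS",
                   help="train with the ResNet backbone for EPOCHS epochs")
group.add_argument("--inception", type=int, metavar="EPOCHS",
                   help="train with the Inception backbone for EPOCHS epochs")
group.add_argument("--VGG16", type=int, metavar="EPOCHS",
                   help="train with the VGG16 backbone for EPOCHS epochs")
args = parser.parse_args()
print(args)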

The DMTL-R results will automatically pop up in a Quartz (or equivalent) window upon completion, similar to the one below:
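If you are training on a headless machine where no display window can open, a common matplotlib workaround is to switch to a non-interactive backend and save the figure to disk instead; note this is a general tip, not something train.py does out of the box.

# General matplotlib workaround for headless machines (not part of train.py).
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; set before importing pyplot
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])          # placeholder for the DMTL-R results plot
fig.savefig("dmtlr_results.png", dpi=150)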

Citation

Cite this work using the BibTeX entry below:

@article{mcclenny2020deep,
  title={Deep Multimodal Transfer-Learned Regression in Data-Poor Domains},
  author={McClenny, Levi and Haile, Mulugeta and Attari, Vahid and Sadler, Brian and Braga-Neto, Ulisses and Arroyave, Raymundo},
  journal={arXiv preprint arXiv:2006.09310},
  year={2020}
}

License

MIT License

