基于机器学习的遥感图像描述即可视化

Prerequisites

PyTorch, torchvision, Pillow, tqdm and NLTK packages should be installed. The code is tested with Python 3.7.4, PyTorch 1.4.0, torchvision 0.5.0, Pillow 6.1.0, tqdm 4.36.1, NLTK 3.5 and Ubuntu 18.04.3.
One of the RSICD, Sydney-Captions and UCM-Captions datasets should be downloaded.
The pre-trained summarization model should be available. To train the pointer-generator model (defined here) for summarization, you can use this GitHub repository.
Both the training and evaluation scripts except json files for the configuration arguments. Below, the arguments associated with training and evaluation are written, respectively.

exp_name: Name of the experiment used while saving the model checkpoints.
images_dir: Image directory of a dataset.
captions_file: JSON file of a dataset that includes the captions of images.
vocabulary_path: TXT file that includes vocabulary words. The vocabulary used during our experiments is provided in vocab.txt file.
vocabulary_size: Number of words used in the vocaulary.
emb_dim: Image embedding size.
decoder_dim: Decoder dimension for the first step of the SD-RSIC (Generation of Standard Captions).
embed_model: CNN model name used in the SD-RSIC: resnet34, resnet50, resnet101, resnet152, vgg16, vgg19, inceptionv1, inceptionv3, densenet121, densenet169 or densenet201.
epochs: The number of epochs for the training.
batch_size: Batch size used during training.
decoder_lr: The initial learning rate of the training.
print_freq: Print frequency for the training.
checkpoints_path: The directory path where the model checkpoints will be saved.
results_file: The JSON file path where the information of training procedure will be saved.
summarization_model_path: The pre-trained model path of the summarization model.

exp_name: Name of the experiment used while loading the model checkpoint.
images_dir: Image directory of a dataset.
captions_file: JSON file of a dataset that includes the captions of images.
vocabulary_path: TXT file that includes vocabulary words. The vocabulary used during our experiments is provided in vocab.txt file.
vocabulary_size: Number of words used in the vocaulary.
emb_dim: Image embedding size.
decoder_dim: Decoder dimension for the first step of the SD-RSIC (Generation of Standard Captions).
embed_model: CNN model name used in the SD-RSIC: resnet34, resnet50, resnet101, resnet152, vgg16, vgg19, inceptionv1, inceptionv3, densenet121, densenet169 or densenet201.
batch_size: Batch size used during evaluation.
checkpoints_path: The directory path where the model checkpoint will be searched.
results_file: The JSON file path where the resulting image captions will be saved. The suffix of '_test' will be added to the filename.

参考另一个readme