GewoonMaarten / infompr-project

INFOMPR Project 👾

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

INFOMPR Project

Running the traing script

> python .\run_local.py --help        
usage: run_local.py [-h] --trainer {dual,image,text}

optional arguments:
  -h, --help                  show this help message and exit
  --trainer {dual,image,text} which model should be trained

Links

Dataset:

Config file structure

The config file is used to store the paths to the datasets. It needs to be called config.json and needs to be stored in the root of the project. It needs to have the following structure (keys with the _ prefix are optional):

{
  "public_dataset": 
  {
    "multimodal_test": "path_to_multimodal_test_public.tsv",
    "multimodal_train": "path_to_multimodal_train.tsv",
    "multimodal_validate": "path_to_multimodal_validate.tsv",
    "images_dir": "path_to_public_image_set_dir"
  },
  "epochs": 0,
  "batch_size": 0,
  "text_config": {
    "max_length": 0,
    "use_bert": true|false
  },
  "img_config": {
    "img_width": 0,
    "img_height": 0
  },
  "_teams_webhook_url": "webhook_url"
}

This file is then loaded in utils.config.py to be made available for the whole project.

These are the settings used in the final models:

{
  "epochs": 10,
  "batch_size": 10,
  "text_config": {
    "max_length": 128
  },
  "img_config": {
    "img_width": 380,
    "img_height": 380
  }
}

Scripts

Below are some scripts that can help you with training of the models. They need to be executed from the root directory of the project.

  • scripts.check_images.py can be used to check if all images exist and can be loaded.

    > python -m scripts.check_images
    usage: check_images.py [-h] [-t]
    
    optional arguments:
      -h, --help  show this help message and exit
      -t          checks the images thoroughly by loading them
    
  • scripts.create_mini_dataset.py can be used to create a mini dataset from the larger one.

    > python -m scripts.create_mini_dataset
    usage: create_mini_dataset.py [-h] -n SAMPLES [-d {train,test,validate}]
    
    optional arguments:
      -h, --help               show this help message and exit
      -n SAMPLES               number of samples for the mini dataset
      -d {train,test,validate} which dataset should be used to create the minidataset
    
  • scripts.fix_dataset.py can be used to remove invalid images from the dataset.

    > python -m scripts.fix_dataset
    

About

INFOMPR Project 👾

License:MIT License


Languages

Language:Jupyter Notebook 90.2%Language:Python 9.8%