- Author: Fantine Huot
After cloning the repository, make sure to run the following commands to initialize and update the submodules:

```shell
git submodule init
git submodule update
```
Requirements:

- TensorFlow

Repository structure:

- `bin`: Scripts to run jobs.
- `config`: Configuration files.
- `log`: Log files.
- `trainer`: Machine learning model trainer.
This repository provides a parameterized, modular framework for creating and running ML jobs.
To train a machine learning model, use the following command:

```shell
bin/train.sh model_config dataset
```

- `model_config`: Name of the ML model configuration to use. This should correspond to a configuration file named `config/model_config.sh`.
- `dataset`: Dataset identifier. Check the variables `datapath`, `train_file`, and `eval_file` in `bin/train.sh` to ensure that this maps to the correct input data.
- `label`: Optional label to add to the job name.
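The mapping from `dataset` to input files lives in `bin/train.sh`. The sketch below illustrates the kind of logic to look for there; it is a hypothetical example, not the script's actual contents. The directory layout (`data/<dataset>/`), the `.tfrecord` file names, the function names `set_data_paths` and `make_job_name`, and the underscore-joined job name format are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the actual contents of bin/train.sh.

# Map a dataset identifier to input paths.
# Assumed layout: data/<dataset>/train.tfrecord and data/<dataset>/eval.tfrecord
set_data_paths() {
    dataset="$1"
    datapath="data/${dataset}"
    train_file="${datapath}/train.tfrecord"
    eval_file="${datapath}/eval.tfrecord"
}

# Build a job name from the configuration, the dataset and an optional label.
# The underscore-separated format is an assumption.
make_job_name() {
    name="${1}_${2}"
    if [ -n "${3:-}" ]; then
        name="${name}_${3}"
    fi
    printf '%s\n' "$name"
}

# Example usage
set_data_paths mnist
echo "$train_file"               # prints data/mnist/train.tfrecord
make_job_name my_cnn mnist run1  # prints my_cnn_mnist_run1
```

If a new dataset is added, the equivalent of `set_data_paths` in the real script is the place to extend.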
Parameters for an ML job can be set by creating a corresponding configuration file: `config/your_model_config.sh`.
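Because configuration files are `.sh` scripts, they presumably assign shell variables that the job scripts read. A minimal sketch under that assumption: only the `model` setting is mentioned in this README, and the other variable names (`batch_size`, `learning_rate`, `num_epochs`) are hypothetical placeholders. Check an existing file in `config` for the names the trainer actually expects.

```shell
# config/your_model_config.sh
# Hypothetical example: only `model` is documented in this README;
# every other variable name here is an illustrative assumption.
model=your_model    # assumed to match trainer/model/your_model.py without .py
batch_size=64
learning_rate=0.001
num_epochs=10
```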
To add a new model:

- Create a new `your_model.py` file inside the `trainer/model` folder. Look at the other models in that folder for examples.
- Reference your new model in `trainer/model/__init__.py`.
- Set the `model` argument to your new model's name in your model configuration file, `config/your_model_config.sh`.
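After completing these steps, a quick check that all three pieces are in place can catch a typo before a job is launched. The function below is a hypothetical helper, not part of the repository. It assumes the configuration file sets the model on a line beginning with `model=<name>`; its prefix match is deliberately loose (for example, `my_model2` would also satisfy a check for `my_model`).

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository.
# Usage: check_new_model <model_name> <model_config>   (run from the repository root)
check_new_model() {
    name="$1"
    config="config/$2.sh"

    # Step 1: the model file exists.
    if [ ! -f "trainer/model/${name}.py" ]; then
        echo "missing trainer/model/${name}.py" >&2
        return 1
    fi
    # Step 2: the model is referenced in the package __init__.py.
    if ! grep -q "$name" trainer/model/__init__.py; then
        echo "${name} is not referenced in trainer/model/__init__.py" >&2
        return 1
    fi
    # Step 3: the configuration selects the model (assumes a `model=<name>` line).
    if ! grep -q "^model=${name}" "$config"; then
        echo "${config} does not set model=${name}" >&2
        return 1
    fi
    echo "ok"
}
```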
The hyperparameters are tuned using Bayesian optimization.
To tune the hyperparameters for a machine learning model, use the following command:

```shell
bin/tunehp.sh model_config dataset
```

- `model_config`: Name of the ML model configuration to use. This should correspond to a configuration file named `config/model_config.sh`.
- `dataset`: Dataset identifier. Check the variables `datapath`, `train_file`, and `eval_file` in `bin/train.sh` to ensure that this maps to the correct input data.
You can define the search space (the range of values to explore for each hyperparameter) by creating a corresponding configuration file: `config/your_model_config_hptuning.yaml`.
Look at the other hyperparameter tuning configuration files in `config` for examples.
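A tuning run depends on two files derived from the same `model_config` name: `config/<model_config>.sh` for the job parameters and `config/<model_config>_hptuning.yaml` for the search space. The hypothetical pre-flight check below (not taken from `bin/tunehp.sh`; the function name `check_tuning_configs` is illustrative) verifies that both exist before a tuning job is launched.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not taken from bin/tunehp.sh.
# Usage: check_tuning_configs <model_config>   (run from the repository root)
check_tuning_configs() {
    config_file="config/${1}.sh"                 # job parameters
    hptuning_file="config/${1}_hptuning.yaml"    # hyperparameter search space
    for f in "$config_file" "$hptuning_file"; do
        if [ ! -f "$f" ]; then
            echo "error: ${f} not found" >&2
            return 1
        fi
    done
}
```

For example, `check_tuning_configs my_config && bin/tunehp.sh my_config mnist` would only start tuning when both files are present (`my_config` and `mnist` are placeholder names).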