No train.sh file?
CherFen opened this issue · comments
CherFen commented
I could not find the train.sh file or a weights file in the repository.
Georgy commented
Hi, apologies, this was my initial commit; I will do some refactoring tonight. train.sh is required for GCP training only.
Since train.sh points to many personal directories, I decided not to supply it, but the code is as follows:
#!/usr/bin/env bash
usage () {
    echo "usage: train.sh [local | remote]
Use 'local' to train locally with a local data file, and 'remote' to
run on ML Engine. For ML Engine jobs the train and valid directories must reside on GCS.
Examples:
    # train locally
    ./train.sh local
    # train on ML Engine with hparms.py
    ./train.sh remote
    # tune hyperparameters on ML Engine
    ./train.sh tune
"
}
date
TIME=$(date +"%Y%m%d_%H%M%S")
# BUCKET
BUCKET=gs://BUCKET
DATAPATH=gs://BUCKET/data
WEIGHTS=gs://BUCKET/jobs/mst_training_remote_20190524_103506/weights/decoder.h5
LOCAL_WEIGHTS=./trainer/data/weights/weights.h5
# '<' inside [[ ]] compares strings; use -lt for a numeric comparison
if [[ $# -lt 1 ]]; then
    usage
    exit 1
fi
# set job vars
JOB_TYPE="$1"
JOB_NAME=mst_training_${JOB_TYPE}_${TIME}
export JOB_NAME=${JOB_NAME}
REGION=europe-west1
if [[ ${JOB_TYPE} == "local" ]]; then
    # no trailing backslash on the last argument, otherwise 'elif' is swallowed into the command
    gcloud ml-engine local train \
        --module-name trainer.train \
        --package-path ./trainer \
        -- \
        --datapath trainer/data \
        --job-dir "trainer/jobs/${JOB_NAME}/" \
        --weights "${LOCAL_WEIGHTS}"
elif [[ ${JOB_TYPE} == "remote" ]]; then
    gcloud ml-engine jobs submit training "${JOB_NAME}" \
        --region "${REGION}" \
        --job-dir "${BUCKET}/jobs/${JOB_NAME}/" \
        --module-name trainer.train \
        --package-path ./trainer \
        --config trainer/config/config_train.json \
        -- \
        --datapath "${DATAPATH}"
    # to start from saved weights, end the line above with a backslash and append:
    # --weights "${WEIGHTS}"
else
    usage
    exit 1
fi
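Since the script was originally held back because it points to personal directories, one possible way around that is to read those paths from environment variables with placeholder defaults. The following is only a sketch under that assumption, not what the repository does: the variable names TRAIN_BUCKET, TRAIN_DATAPATH and TRAIN_REGION and the helper functions build_job_name and is_valid_job_type are hypothetical and do not appear in the original train.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: take personal paths from the environment so that the
# committed script contains only placeholders. TRAIN_BUCKET, TRAIN_DATAPATH
# and TRAIN_REGION are assumed names, not part of the original train.sh.

BUCKET="${TRAIN_BUCKET:-gs://BUCKET}"          # falls back to the placeholder bucket
DATAPATH="${TRAIN_DATAPATH:-${BUCKET}/data}"   # derived from BUCKET unless overridden
REGION="${TRAIN_REGION:-europe-west1}"

# Build the job name in the same pattern train.sh uses:
# mst_training_<job type>_<timestamp>
build_job_name () {
    local job_type="$1"
    local stamp
    stamp=$(date +"%Y%m%d_%H%M%S")
    echo "mst_training_${job_type}_${stamp}"
}

# Accept only the job types that the script above has a branch for.
is_valid_job_type () {
    case "$1" in
        local|remote) return 0 ;;
        *) return 1 ;;
    esac
}
```

With something like this in place, a run could be configured per user, for example `TRAIN_BUCKET=gs://my-bucket ./train.sh remote`, where `gs://my-bucket` stands in for whatever bucket the user owns.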
Georgy commented
I added train.sh. I will do some refactoring tonight and add further comments to the code.