Improve directory management for training/n-fold training
lfoppiano opened this issue
Luca Foppiano commented
This issue covers a revision of how directories are managed during training and n-fold training.
Right now, training writes directly under /data/models/model-name. For plain training this is not a problem, because the model is only written at the end. For n-fold training, however, data/models/xx/model-name is used as a temporary directory, and it can be left behind in an inconsistent state.
For example:
-rw-r--r-- 1 lfoppian0 tdm 1.1K Feb 25 11:40 config.json
-rw-r--r-- 1 lfoppian0 tdm 425M Feb 25 12:11 model_weights0.hdf5
-rw-r--r-- 1 lfoppian0 tdm 425M Feb 17 13:55 model_weights1.hdf5
-rw-r--r-- 1 lfoppian0 tdm 425M Feb 17 15:00 model_weights2.hdf5
-rw-r--r-- 1 lfoppian0 tdm 425M Feb 17 15:34 model_weights3.hdf5
-rw-r--r-- 1 lfoppian0 tdm 425M Feb 17 16:19 model_weights4.hdf5
-rw-r--r-- 1 lfoppian0 tdm 425M Feb 17 17:07 model_weights5.hdf5
-rw-r--r-- 1 lfoppian0 tdm 425M Feb 17 17:41 model_weights6.hdf5
-rw-r--r-- 1 lfoppian0 tdm 425M Feb 17 18:12 model_weights7.hdf5
-rw-r--r-- 1 lfoppian0 tdm 425M Feb 17 18:47 model_weights8.hdf5
-rw-r--r-- 1 lfoppian0 tdm 425M Feb 17 19:45 model_weights9.hdf5
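Here, config.json and fold 0 were overwritten (Feb 25), while folds 1 to 9 still date from an earlier run (Feb 17). The process then stopped for some reason (e.g. the user interrupted the training, or something crashed), so the directory was left half-written.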
We could instead save the models in a temporary directory and copy them back into the data/model directory only once everything has finished.
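A minimal sketch of this idea in Python, assuming training code can be pointed at an arbitrary output directory. The helper names `staged_model_dir` and `train_folds` are hypothetical and not part of the existing code base, and the dummy file writes stand in for actually training a fold. One deliberate variation: the finished directory is moved (renamed) into place instead of copied, which is faster and close to atomic as long as the staging directory sits on the same filesystem as the target.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def staged_model_dir(final_dir):
    """Yield a temporary directory for training output; publish it to final_dir only on success."""
    final_dir = os.path.abspath(final_dir)
    parent = os.path.dirname(final_dir)
    os.makedirs(parent, exist_ok=True)
    # Stage next to the target so the final rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".tmp-", dir=parent)
    try:
        yield staging
    except BaseException:
        # Training failed or was interrupted (incl. Ctrl-C): drop the partial
        # output and leave any previously published model untouched.
        shutil.rmtree(staging, ignore_errors=True)
        raise

    # Success: move the old model aside, move the new one in, delete the old one.
    backup = None
    if os.path.exists(final_dir):
        backup = os.path.join(parent, os.path.basename(staging) + "-old")
        os.rename(final_dir, backup)
    try:
        os.rename(staging, final_dir)
    except OSError:
        # Put the previous model back if the swap failed.
        if backup is not None:
            os.rename(backup, final_dir)
        raise
    if backup is not None:
        shutil.rmtree(backup, ignore_errors=True)


def train_folds(final_dir, n_folds):
    """Hypothetical n-fold loop: every fold is written to the staging directory."""
    with staged_model_dir(final_dir) as work_dir:
        for fold in range(n_folds):
            # Stand-in for training one fold and saving its weights.
            with open(os.path.join(work_dir, f"model_weights{fold}.hdf5"), "wb") as f:
                f.write(b"weights")
```

With this layout, an interrupted run never touches data/models/xx/model-name: either all folds are published together, or the previous model stays as it was. A hard kill (e.g. SIGKILL) skips the cleanup and can leave a stale `.tmp-*` directory next to the model, and a kill between the two renames can leave the previous model under its `-old` name, so a small recovery step at startup may still be worth adding.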