A lightweight wrapper for automatically forking deep learning sessions to enable parallel model training across multiple GPUs.
- Python 3.x
- Matplotlib
- Pandas
- NumPy
furcate-tf additionally requires tensorflow>=2.0
```shell
git clone https://github.com/mattstruble/furcate.git
cd furcate && python setup.py install
```
Furcate allows the user to define a JSON configuration file which is automatically split into every possible permutation based on keys with array values. Below showcases an example configuration file:
```json
{
  "data_name": "example",
  "data_dir": "foo/",
  "batch_size": [32, 64],
  "epochs": 50
}
```
Furcate will process the above JSON configuration and split it into two separate configurations, one where `batch_size=32` and one where `batch_size=64`. Furcate will then query the host machine for the available CUDA GPUs and create a queue of configurations for each GPU to work through until all the permutations have been run.
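The splitting behavior can be pictured as a cartesian product over the list-valued keys. Below is a minimal sketch of the idea, not furcate's actual implementation:

```python
import itertools

def expand_config(config):
    # Wrap scalar values in single-element lists so every key can take
    # part in the cartesian product uniformly.
    keys = list(config)
    values = [v if isinstance(v, list) else [v] for v in config.values()]
    return [dict(zip(keys, combo)) for combo in itertools.product(*values)]

configs = expand_config({
    "data_name": "example",
    "data_dir": "foo/",
    "batch_size": [32, 64],
    "epochs": 50,
})
# Yields two configs: one with batch_size=32 and one with batch_size=64.
```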
Simply inherit from `furcate.Fork`, implement the virtual methods, pass the config to your class, and then call `run`.
```python
import furcate

class App(furcate.Fork):
    # override virtual Fork functions
    ...

if __name__ == "__main__":
    app = App("config.json")
    app.run()
```
Furcate offers persistence between runs by saving configuration and run outputs to `[log_dir]/run_data.csv`. If Furcate detects this file on startup, it will skip any already-completed runs when going through the configuration permutations.
While running, Furcate periodically checks the configuration file for changes. If any are detected, Furcate automatically recalculates the permutations while excluding already-completed runs. This can be used to add or remove permutations without needing to restart Furcate.
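The resume behavior can be sketched with the standard `csv` module; the column layout of `run_data.csv` assumed here is purely for illustration:

```python
import csv
import os

def pending_configs(all_configs, run_data_path, keys):
    # Collect the key tuples of runs already recorded in run_data.csv.
    completed = set()
    if os.path.exists(run_data_path):
        with open(run_data_path, newline="") as f:
            for row in csv.DictReader(f):
                completed.add(tuple(row.get(k) for k in keys))
    # csv stores everything as text, so compare stringified values.
    return [c for c in all_configs
            if tuple(str(c[k]) for k in keys) not in completed]
```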
Furcate will create a folder within the log directory for each configuration, containing all of that thread's output, and a `run_data.csv` file which holds all the configurations and run data in CSV format. The generated folder names are based on the provided `data_name`, `data_dir`, and a shortened `{key}{value}` grouping for any configuration key with multiple values. The final folder name has the following structure:

```
[data_name]{thread_id}_[data_dir]_["_{shortened-key}{value}"..N]
```
Below outlines the structure after running the above example:
```
logs/
|-- run_data.csv
|-- example0_foo_bs32/
|   |-- accuracy.jpg
|   |-- example0.err
|   |-- example0.log
|   |-- history.json
|   |-- model.h5
|   |-- run_data.json
|-- example1_foo_bs64/
|   |-- accuracy.jpg
|   |-- example1.err
|   |-- example1.log
|   |-- history.json
|   |-- model.h5
|   |-- run_data.json
```
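The naming scheme can be sketched as below; the rule that keys shorten to the initials of their underscore-separated words (`batch_size` becomes `bs`) is inferred from the example above, not taken from furcate's source:

```python
def shorten_key(key):
    # e.g. "batch_size" -> "bs" (assumed shortening rule)
    return "".join(word[0] for word in key.split("_"))

def run_dir_name(data_name, thread_id, data_dir, multi_valued):
    # multi_valued maps each config key that had multiple values to the
    # value chosen for this particular permutation.
    parts = ["{}{}".format(data_name, thread_id), data_dir.rstrip("/")]
    parts += ["{}{}".format(shorten_key(k), v)
              for k, v in sorted(multi_valued.items())]
    return "_".join(parts)

run_dir_name("example", 0, "foo/", {"batch_size": 32})  # -> "example0_foo_bs32"
```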
- tensorflow: Shows an implementation of furcate-tf with custom dataset generation and custom configurations.
Subclassing `furcate.Fork` allows your app to inherit all the auto-forking capabilities and gives access to the forked configuration data as class member variables. For full customization, Fork provides virtual methods for each stage of the model training pipeline; each method is listed below in call order.
- `get_available_gpu_indices()` - Returns a list of available GPU indices. Defaults to using `nvidia-smi`.
- `set_visible_gpus()` - Restricts the CUDA visible devices to the current process's GPU id. Defaults to setting the OS environment variable `CUDA_VISIBLE_DEVICES` to `self.gpu_id`.
- `get_datasets(train_fp, test_fp, valid_fp)` - Returns the computed datasets for model ingestion. The `train_fp`, `test_fp`, and `valid_fp` arguments are arrays containing filepaths based on the provided `data_dir` value combined with the `train_prefix`, `test_prefix`, and `valid_prefix` configurations.
- `get_model()` - Returns the built model for use in training and evaluation.
- `get_metrics()` - Returns a list of metrics for use in training evaluation.
- `get_optimizer()` - Returns an optimizer for use in model compilation.
- `get_loss()` - Returns the loss for use in model compilation.
- `get_callbacks()` - Returns a list of training callbacks. Defaults to None.
- `model_compile(model, optimizer, loss, metrics)` - Passes in the built model, optimizer, loss, and metrics to compile the model with prior to training.
- `model_summary(model)` - Intended to be used to display the model summary in the log.
- `model_fit(model, train_dataset, epochs, valid_dataset, callbacks, verbose)` - Trains the model with the results from the above methods.
- `model_evaluate(model, test_dataset)` - Called only if a test dataset exists. Returns the results of the model evaluation.
- `model_save(model)` - Intended to be used to save the fully trained model to disk.
- `save_metric(results_dict, history, metric)` - Intended to be used to save metric data to a dictionary which is then saved to disk. It is passed a results dictionary that already contains run times, the history of the training session, and the current metric to save.
- `plot_metric(history, metric)` - Intended to be used to save a plot of each metric to the log directory. Extract the epochs, train_metric, and val_metric from the history and then pass the information to `plot` to automatically save a matplotlib graph to the log dir.
Fork also includes optional virtual functions for data preprocessing:

- `preprocess(self, record)`
- `train_preprocess(self, record)`
- `test_preprocess(self, record)`
- `valid_preprocess(self, record)`
furcate-tf overrides the following Fork methods with common TensorFlow implementations:

- `get_available_gpu_indices` - Utilizes `tf.config.list_physical_devices("GPU")` to get the indices of devices visible to TensorFlow.
- `set_visible_gpus` - `tf.config.set_visible_devices([GPU_ID], "GPU")`
- `model_compile` - `model.compile(optimizer=optimizer, loss=loss, metrics=metrics)`
- `model_summary` - `model.summary()`
- `model_fit` - `model.fit(train_set, epochs=epochs, validation_data=valid_set, callbacks=callbacks, verbose=verbose)`
- `model_evaluate` - `model.evaluate(test_set)`
- `model_save` - `model.save(os.path.join(self.log_dir, "model.h5"))`
- `save_metric` - Gets train and validation values from `history.history` and stores the final epoch values in the dictionary.
- `plot_metric` - Gets train and validation values from `history.history` and plots them individually against epochs.
furcate-tf also comes with additional helper methods:

- `gen_tfrecord_dataset(self, filepaths, processor, shuffle=False)` - Creates a `TFRecordDataset` from the provided filepaths, mapped to the provided processor, based on the config values for `cache`, `shuffle_buffer_size`, `batch_size`, `seed`, `prefetch`, `num_parallel_reads`, and `num_parallel_calls`.

`num_parallel_reads` and `num_parallel_calls` can be configured like every other data value, but if unset furcate-tf defaults them to `tf.data.experimental.AUTOTUNE`.
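That fallback amounts to a simple config lookup. In the sketch below `AUTOTUNE` is stubbed with a placeholder constant so the example runs without TensorFlow; the real value is `tf.data.experimental.AUTOTUNE`:

```python
AUTOTUNE = -1  # stand-in for tf.data.experimental.AUTOTUNE

def resolve_parallelism(config):
    # Use the configured value when present, otherwise fall back to AUTOTUNE.
    return (config.get("num_parallel_reads", AUTOTUNE),
            config.get("num_parallel_calls", AUTOTUNE))
```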
Furcate reads in configuration information from any standard json file. This allows the user to define any configuration variables and then have them be accessible directly as Fork member variables at runtime.
Below showcases the configuration with the required configuration keys, and any reserved keys with their default values:
```
{
    "data_name":     # String. Required. Single-word shortname describing the data being trained on.
    "data_dir":      # String. Required. Base directory where the data lives.
    "batch_size":    # Int. Required. Batch size for datasets.
    "epochs":        # Int. Required. Number of epochs to train on.
    "framework":     # String. Required. Deep learning framework to use, options: ['tf'].
    "log_dir":       # String. Defaults to: "logs/"
    "train_prefix":  # String. Defaults to: "[data_name].train"
    "valid_prefix":  # String. Defaults to: "[data_name].valid"
    "test_prefix":   # String. Defaults to: "[data_name].test"
    "learning_rate": # Float. Defaults to: 0.001
    "verbose":       # Int. Defaults to: 2
    "cache":         # Bool. Defaults to: False
    "seed":          # Int. Defaults to: 42
    "prefetch":      # Int. Defaults to: 1
    "meta": {
        "allow_cpu":       # Bool. Defaults to: False
        "max_threads":     # Int. Defaults to: Number of CUDA GPUs
        "exclude_configs": # Array. Defaults to: []
        "mem_trace": {     # Dictionary. Defaults to below configuration.
            "enabled": # Bool. Defaults to: False
            "delay":   # Int. Number of seconds between memory trace runs. Defaults to: 300
            "top":     # Int. Number of top memory allocations to list. Defaults to: 10
            "trace":   # Int. Number of allocations to display the stack trace for. Defaults to: 1
        }
    }
}
```
All configurations can be overwritten, and any additional configurations can be added.

`exclude_configs` expects an array of dictionaries of key-value pairs and excludes any config containing those pairs from the generated configs. In the below example the final generated configs won't have any configuration in which both `optimizer_name=Adadelta` and `batch_size=32`, or both `optimizer_name=SGD` and `learning_rate=0.001`.
```json
{
  "data_name": "example",
  "data_dir": "foo/",
  "batch_size": [32, 64],
  "epochs": 50,
  "learning_rate": [0.0001, 0.001, 0.00001],
  "optimizer_name": ["RMSProp", "SGD", "Adam", "Nadam", "Adadelta"],
  "meta": {
    "exclude_configs": [
      { "optimizer_name": "Adadelta", "batch_size": 32 },
      { "optimizer_name": "SGD", "learning_rate": 0.001 }
    ]
  }
}
```
Every value in an `exclude_configs` dictionary needs to be matched in order for a generated config to be excluded. Therefore, a dictionary needs to be created for each unique combination to exclude.
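The matching rule can be sketched as: a generated config is dropped only when it contains every key-value pair of at least one exclude dictionary.

```python
def is_excluded(config, exclude_configs):
    # Every pair in at least one exclude dict must match the config.
    return any(all(config.get(k) == v for k, v in ex.items())
               for ex in exclude_configs)

excludes = [
    {"optimizer_name": "Adadelta", "batch_size": 32},
    {"optimizer_name": "SGD", "learning_rate": 0.001},
]
is_excluded({"optimizer_name": "Adadelta", "batch_size": 32}, excludes)  # True
is_excluded({"optimizer_name": "Adadelta", "batch_size": 64}, excludes)  # False
```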
`mem_trace` controls the built-in memory trace configuration, which periodically outputs memory usage statistics to the furcate.log file. It tracks the CUDA GPU stats, the top memory changes since starting Furcate, the top incremental changes, and the current top memory allocations. Below shows an example output of the top 3 memory allocations from a single snapshot while running a training environment:
```
2020-12-02 16:04:33.000907: furcate.runner] GPU Stats
2020-12-02 16:04:33.000907: furcate.runner] gpu_stats id=0, name=TITAN RTX, mem_used=19278 MiB, mem_total=24220 MiB, mem_util=79 %, volatile_gpu=46 %, temp=60 C
2020-12-02 16:04:33.000908: furcate.runner] gpu_stats id=1, name=TITAN RTX, mem_used=16564 MiB, mem_total=24220 MiB, mem_util=68 %, volatile_gpu=33 %, temp=54 C
2020-12-02 16:04:33.000908: furcate.runner] Top Diffs since Start
2020-12-02 16:04:33.000909: furcate.runner] top_diffs i=1, stat=XXX/python3.8/linecache.py:0: size=316 KiB (+316 KiB), count=3173 (+3173), average=102 B
2020-12-02 16:04:33.000910: furcate.runner] top_diffs i=2, stat=XXX/python3.8/tracemalloc.py:0: size=73.6 KiB (+73.6 KiB), count=1084 (+1084), average=70 B
2020-12-02 16:04:33.000910: furcate.runner] top_diffs i=3, stat=XXX/python3.8/abc.py:0: size=25.2 KiB (+25.2 KiB), count=144 (+144), average=179 B
2020-12-02 16:04:33.000910: furcate.runner] Top Incremental
2020-12-02 16:04:33.000910: furcate.runner] top_incremental i=1, stat=XXX/python3.8/linecache.py:137: size=315 KiB (+101 KiB), count=3156 (+1018), average=102 B
2020-12-02 16:04:33.000910: furcate.runner] top_incremental i=2, stat=XXX/python3.8/tracemalloc.py:185: size=25.5 KiB (+16.6 KiB), count=435 (+280), average=60 B
2020-12-02 16:04:33.000911: furcate.runner] top_incremental i=3, stat=XXX/python3.8/tracemalloc.py:532: size=12.9 KiB (+12.9 KiB), count=195 (+195), average=68 B
2020-12-02 16:04:33.000911: furcate.runner] Top Current
2020-12-02 16:04:33.000924: furcate.runner] top_current i=1, stat=XXX/python3.8/linecache.py:0: size=316 KiB, count=3173, average=102 B
2020-12-02 16:04:33.000924: furcate.runner] top_current i=2, stat=XXX/python3.8/tracemalloc.py:0: size=73.6 KiB, count=1084, average=70 B
2020-12-02 16:04:33.000924: furcate.runner] top_current i=3, stat=XXX/python3.8/abc.py:0: size=25.2 KiB, count=144, average=179 B
2020-12-02 16:04:33.000936: furcate.runner] traceback memory_blocks=2138, size_kB=213
2020-12-02 16:04:33.000936: furcate.runner] File "dlpa_model_research/dlpa_model.py", line 191
2020-12-02 16:04:33.000936: furcate.runner]     app.run()
2020-12-02 16:04:33.000936: furcate.runner] File "XXX/python3.8/site-packages/furcate-0.1.0.dev1-py3.8.egg/furcate/fork.py", line 87
2020-12-02 16:04:33.000936: furcate.runner]     runner.run(self.script_name)
2020-12-02 16:04:33.000936: furcate.runner] File "XXX/python3.8/site-packages/furcate-0.1.0.dev1-py3.8.egg/furcate/runner.py", line 358
2020-12-02 16:04:33.000936: furcate.runner]     mem_trace.snapshot("Started thread {}: [{}]".format(thread_id, str(config)))
2020-12-02 16:04:33.000936: furcate.runner] File "XXX/python3.8/site-packages/furcate-0.1.0.dev1-py3.8.egg/furcate/runner.py", line 134
2020-12-02 16:04:33.000936: furcate.runner]     for line in stat.traceback.format():
2020-12-02 16:04:33.000936: furcate.runner] File "XXX/python3.8/tracemalloc.py", line 229
2020-12-02 16:04:33.000936: furcate.runner]     line = linecache.getline(frame.filename, frame.lineno).strip()
2020-12-02 16:04:33.000936: furcate.runner] File "XXX/python3.8/linecache.py", line 16
2020-12-02 16:04:33.000936: furcate.runner]     lines = getlines(filename, module_globals)
2020-12-02 16:04:33.000937: furcate.runner] File "XXX/python3.8/linecache.py", line 47
2020-12-02 16:04:33.000937: furcate.runner]     return updatecache(filename, module_globals)
2020-12-02 16:04:33.000937: furcate.runner] File "XXX/python3.8/linecache.py", line 137
2020-12-02 16:04:33.000937: furcate.runner]     lines = fp.readlines()
```
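The allocation snapshots above come from Python's built-in `tracemalloc` module. A minimal sketch of producing similar top-allocation lines (not furcate's exact implementation, which lives in `furcate/runner.py`):

```python
import tracemalloc

def top_allocations(top=3):
    # Snapshot current allocations and keep the `top` largest line stats.
    snapshot = tracemalloc.take_snapshot()
    stats = snapshot.statistics("lineno")[:top]
    return ["i={}, stat={}".format(i + 1, s) for i, s in enumerate(stats)]

tracemalloc.start()
_data = [str(n) for n in range(1000)]  # allocate something to observe
lines = top_allocations(top=3)
```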
- Hyperparameter auto-tuning
- PyTorch implementation