reger-men / tensorflow_benchmark

TensorFlow benchmark scripts for single and multi nodes with multi GPUs

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Tensorflow v2 benchmark

Supports TFv2

TensorFlow benchmark scripts for single and multi Nodes with Multi GPUs

Usage

Clone repo

git clone https://github.com/reger-men/tensorflow_benchmark.git

Pre-requirement

pip3 install -r requirements.txt

Single Node Single GPU

Train with custom loop:

python3 train.py -train_mode loop

Train with Keras fit:

python3 train.py -train_mode fit

Single Node Multi-GPUs

Mirrored strategy will be used as default with num_gpus>1

python3 train.py -train_mode fit -num_gpus 2

python3 train.py -train_mode loop -num_gpus 2

Multi-Node Multi-GPUs

Experimental: launch Multi-Nodes training from the chief Worker Node

python3 train.py --train_mode=fit --workers="localhost:122,localhost:123" --w_type="worker" --w_index=0 --distribution_strategy=MultiWorker

Or with custom loop:

python3 train.py --train_mode=loop --workers="localhost:122,localhost:123" --w_type="worker" --w_index=0 --distribution_strategy=MultiWorker

Help Flags

python3 train.py --helpfull

train.py:
  --batch_size: Batch Size
    (default: '128')
    (an integer)
  --buffer_size: Shuffle buffer size
    (default: '50000')
    (an integer)
  --display_every: Number of steps after which progress is printed out
    (default: '20')
    (an integer)
  --distribution_strategy: Can be: Mirrored, MultiWorker, OneDevice
    (default: 'OneDevice')
  --epochs: Number of epochs
    (default: '1')
    (an integer)
  --num_gpus: Number of GPUs. 0 will run on CPU
    (default: '1')
    (an integer)
  --[no]setup_cluster: Setup the cluster from the chief worker or not. This is an expiremental feature
    (default: 'true')
  --train_mode: Use either keras fit or loop training
    (default: 'fit')
  --verbose: Set verbosity level
    (default: '0')
    (an integer)
  --w_index: Worker index. 0 is appointed as the chief worker
    (default: '0')
    (an integer)
  --w_type: Task type
    (default: 'worker')
  --workers: List of workers IP:Port
    (default: 'localhost:122,localhost:123')

About

TensorFlow benchmark scripts for single and multi nodes with multi GPUs

License:MIT License


Languages

Language:Python 100.0%