FlexFlow

FlexFlow is a deep learning framework that accelerates distributed DNN training by automatically searching for efficient parallelization strategies. FlexFlow provides a drop-in replacement for TensorFlow Keras and PyTorch. Running existing Keras and PyTorch programs in FlexFlow only requires a few lines of changes to the program.

Install FlexFlow

To install FlexFlow from source code, please read the instructions. If you would like to quickly try FlexFlow, we also provide prebuilt docker images with all dependencies pre-installed. You can also use conda to install the FlexFlow Python package (coming soon).

TensorFlow Keras Support

Users can use FlexFlow to accelerate the training procedure of existing TensorFlow Keras models by just changing the following import header lines.

from flexflow.keras.models import Model, Sequential
from flexflow.keras.layers import Input, Dense, Conv2D, ...
from flexflow.keras.callbacks import Callback, ...

FlexFlow uses a Python function called top_level_task() as the entry point of a program and automatically parallelize DNN training across all GPUs on all compute nodes. For example, the following code snippet shows parallelizing AlexNet training on the CIFAR10 dataset in FlexFlow.

def top_level_task():
  model = Sequential()
  model.add(Conv2D(filters=64, input_shape=(3,229,229), kernel_size=(11,11), strides=(4,4), padding=(2,2), activation="relu"))
  model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), padding="valid"))
  model.add(Conv2D(filters=192, kernel_size=(5,5), strides=(1,1), padding=(2,2), activation="relu"))
  ## More lines for model construction
  model.add(Activation("softmax"))
  ## Model compilation
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  ## Model training
  (x_train, y_train) = cifar10.load_data()
  model.fit(x_train, y_train, epochs=30)

if __name__ == "__main__":
  top_level_task()

During model compilation (i.e., model.compile in Keras), FlexFlow can autotune the parallelization performance by searching for efficient strategies on the given parallel machine. Next, model.fit performs DNN training on all available GPUs (potentially across multiple nodes) using the best discovered strategy. As a result, users don't need to manually design and optimize the device assignments.

More FlexFlow Keras examples: see the keras examples folder.

PyTorch Support

Users can also use FlexFlow to optimize the parallelization performance of existing PyTorch models in two steps. First, a PyTorch model can be exported to the FlexFlow model format using flexflow.torch.fx.torch_to_flexflow.

import torch
import flexflow.torch.fx as fx

model = MyPyTorchModule()
fx.torch_to_flexflow(model, "mymodel.ff")

Second, a FlexFlow program can directly import a previously saved PyTorch model and autotune the parallelization performance for a given parallel machine.

from flexflow.pytorch.model import PyTorchModel

def top_level_task():
  torch_model = PyTorchModel("mymodel.ff")
  output_tensor = torch_model.apply(ffmodel, input_tensor)
  ## Model compilation
  ffmodel.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  ## Model training
  (x_train, y_train) = cifar10.load_data()
  ffmodel.fit(x_train, y_train, epochs=30)

More FlexFlow PyTorch examples: see the pytorch examples folder.

ONNX Support

Similar to the PyTorch front-end, FlexFlow also supports training existing ONNX models by loading the models using flexflow.onnx.model.ONNXModel.

More FlexFlow ONNX examples: see the ONNX examples folder.

C++ Interface

For users that prefer to program in C/C++. FlexFlow supports a C++ program inference that is equivalent to its Python APIs.

More FlexFlow C++ examples: see the C++ examples folder.

Command-Line Flags

In addition to setting runtime configurations in a FlexFlow Python/C++ program, the FlexFlow runtime also accepts command-line arguments for various runtime parameters:

FlexFlow training flags:

-e or --epochs: number of total epochs to run (default: 1)
-b or --batch-size: global batch size in each iteration (default: 64)
-p or --print-freq: print frequency (default: 10)
-d or --dataset: path to the training dataset. If not set, synthetic data is used to conduct training.

Legion runtime flags:

-ll:gpu: number of GPU processors to use on each node (default: 0)
-ll:fsize: size of device memory on each GPU (in MB)
-ll:zsize: size of zero-copy memory (pinned DRAM with direct GPU access) on each node (in MB). This is used for prefecthing training images from disk.
-ll:cpu: number of data loading workers (default: 4)
-ll:util: number of utility threads to create per process (default: 1)
-ll:bgwork: number of background worker threads to create per process (default: 1)

Performance auto-tuning flags:

--search-budget or --budget: the number of iterations for the MCMC search (default: 0)
--search-alpha or --alpha: a hyper-parameter for the search procedure (default: 0.05)
--export-strategy or --export: path to export the best discovered strategy (default: None)
--import-strategy or --import: path to import a previous saved strategy (default: None)

For performance tuning related flags: see performance autotuning.

Contributing

Please let us know if you encounter any bugs or have any suggestions by submitting an issue.

We welcome all contributions to FlexFlow from bug fixes to new features and extensions.

Please subscribe to the FlexFlow users mailing list for

Citations

Zhihao Jia, Matei Zaharia, and Alex Aiken. Beyond Data and Model Parallelism for Deep Neural Networks. In Proceedings of the 2nd Conference on Machine Learning and Systems (MLSys), Palo Alto, CA, April 2019.
Zhihao Jia, Sina Lin, Charles R. Qi, and Alex Aiken. Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.

The Team

FlexFlow is developed and maintained by teams at CMU, Facebook, Los Alamos National Lab, MIT, and Stanford (alphabetically).

License

FlexFlow uses Apache License 2.0.

stas00 / FlexFlow