alexnet_parallel_oneflow

A distributed parallel implementation of AlexNet in OneFlow, covering data parallelism (DP), tensor parallelism (TP) and pipeline parallelism (PP). The examples use the small CIFAR10 dataset.

Download dataset

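import flowvision
from flowvision import transforms

training_data = flowvision.datasets.CIFAR10(
    root="data",
    train=True,
    transform=transforms.ToTensor(),
    download=False,  # set to True to fetch CIFAR10 into root on first use
)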

If you want to test with a larger dataset, the OFRecord code for the ImageNet dataset is also provided. See: alexnet_1d_ofrecord.

BATCH_SIZE = 128

NOTE: Of the three modes, tensor parallelism is applied only to the Linear layers. See the code for details on the tensor parallel implementation.
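To illustrate what splitting a Linear layer means, here is a minimal framework-free sketch, not the repository's OneFlow code. It assumes a column-wise split, where each simulated device holds a slice of the output features and the slices are concatenated afterwards; the repository may split differently. All function names (linear, split_rows, tensor_parallel_linear) are hypothetical.

```python
# Framework-free sketch of tensor parallelism for a single Linear layer.
# Assumption: the weight is split along the output-feature dimension.

def linear(x, weight, bias):
    """Compute y = x @ weight.T + bias, weight shaped (out_features, in_features)."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]

def split_rows(values, parts):
    """Split a list into `parts` contiguous chunks (length must divide evenly)."""
    size = len(values) // parts
    return [values[i * size:(i + 1) * size] for i in range(parts)]

def tensor_parallel_linear(x, weight, bias, num_devices):
    """Simulate devices that each own a slice of the output features."""
    w_shards = split_rows(weight, num_devices)
    b_shards = split_rows(bias, num_devices)
    # Every device receives the full input and computes its own output slice.
    partial = [linear(x, w, b) for w, b in zip(w_shards, b_shards)]
    # Concatenate the slices along the feature dimension (an all-gather).
    return [sum((p[r] for p in partial), []) for r in range(len(x))]

x = [[1.0, 2.0], [3.0, 4.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
bias = [0.0, 1.0, 0.0, 0.5]

full = linear(x, weight, bias)
sharded = tensor_parallel_linear(x, weight, bias, num_devices=2)
```

The sharded result matches the unsharded one, which is the property a tensor parallel Linear layer has to preserve while storing only part of the weight on each device.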

BATCH_SIZE = 1

BATCH_SIZE = 1024

MIT License
