securefederatedai / openfl

An open framework for Federated Learning.

Home Page: https://openfl.readthedocs.io/en/latest/index.html

Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working. 

KeertiX opened this issue

Describe the bug
Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working. 

To Reproduce
Run the tutorial openfl/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb 

Expected behavior
Tutorial should run successfully without any error. 

Screenshots
Creating AGGREGATOR certificate key pair with following settings: CN=ktalwarx-mobl.gar.corp.intel.com, SAN=DNS:ktalwarx-mobl.gar.corp.intel.com
Writing AGGREGATOR certificate key pair to: /home/keerti/aggregator based worflow/cert/server
The CSR Hash 60c9e4d7778ab8bc06444cc976cfb6c5b3ab1346f91c207593bdc6d7dedb102ae3ae80fd64978344afc597225d61bf85
The CSR Hash for file server/agg_ktalwarx-mobl.gar.corp.intel.com.csr = 60c9e4d7778ab8bc06444cc976cfb6c5b3ab1346f91c207593bdc6d7dedb102ae3ae80fd64978344afc597225d61bf85
Warning: manual check of certificate hashes is bypassed in silent mode.
Signing AGGREGATOR certificate
Traceback (most recent call last):
  File "/home/keerti/aggregator based worflow/openfl/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.py", line 14, in <module>
    fx.init("torch_cnn_mnist", log_level="METRIC", log_file="./spam_metric.log")
  File "/home/keerti/ls/envs/intelEnv/lib/python3.10/site-packages/openfl/native/native.py", line 203, in init
    collaborator.create(
AttributeError: module 'openfl.interface.collaborator' has no attribute 'create'. Did you mean: 'create_'?

Desktop:

  • OS: WSL Ubuntu
  • Python version: 3.8
  • OpenFL: latest build

I can't seem to reproduce your issue. Can you provide some more information about your intelEnv environment? In particular, can you provide the output of python -m torch.utils.collect_env?

Also, how did you install openfl? The error leads me to believe there may have been an issue with installation. Could you try running just:

import openfl.native as fx
fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')

in a fresh environment?
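
Since the question is how openfl was installed, a quick way to see which copy Python actually picks up is to print the installed version and the import location. This is only a sketch using the standard library; the helper name describe_install is hypothetical and not part of openfl.

```python
import importlib.metadata
import importlib.util


def describe_install(dist_name, module_name=None):
    """Report the installed version and import location of a package.

    describe_install is a hypothetical helper, not an openfl API.
    """
    module_name = module_name or dist_name
    try:
        # Version as recorded by pip/conda metadata for the distribution.
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    # Where the import system would load the top-level module from.
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    return {"version": version, "location": location}


if __name__ == "__main__":
    print(describe_install("openfl"))
```

If the reported location points at a source checkout rather than site-packages, or the version does not match what you expected to install, the environment is probably mixing two copies of openfl.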

The output of python -m torch.utils.collect_env is as follows:

(env-latest-original-openfl) parth-wsl@parthmax-mobl1:~/env-latest-original-openfl/openfl$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.16 (default, Mar  2 2023, 03:21:46)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==1.13.1
[pip3] torchvision==0.14.1
[conda] numpy                     1.24.3                   pypi_0    pypi
[conda] torch                     1.13.1                   pypi_0    pypi
[conda] torchvision               0.14.1                   pypi_0    pypi

The fx.init function throws this error when called from any tutorial notebook.

While debugging, I found that openfl/native/native.py calls collaborator.create (from openfl/interface/collaborator.py) at line 203. However, openfl/interface/collaborator.py defines no create function, only create_.

To reproduce the error, fetch the latest code from the develop branch.
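
The mismatch described above (a caller asking for create when the module only defines create_) can be checked without running the whole tutorial. The sketch below is illustrative only: check_attribute is a hypothetical helper, and fake_collaborator is a stand-in module so the snippet runs without openfl installed.

```python
import difflib
import types


def check_attribute(module, name):
    """Return (exists, suggestions) for attribute `name` on `module`.

    check_attribute is a hypothetical helper, not an openfl API.
    """
    if hasattr(module, name):
        return True, []
    # Look for similarly named attributes, much like Python's own
    # "Did you mean: ...?" hint in the AttributeError message.
    suggestions = difflib.get_close_matches(name, dir(module), n=3, cutoff=0.6)
    return False, suggestions


# Stand-in for openfl.interface.collaborator (assumption: it exposes
# create_ but not create, as reported in this issue).
fake_collaborator = types.ModuleType("fake_collaborator")
fake_collaborator.create_ = lambda *args, **kwargs: None

exists, suggestions = check_attribute(fake_collaborator, "create")
print(exists, suggestions)
```

With openfl installed, the same check can be pointed at the real module by importing openfl.interface.collaborator and passing it in place of the stand-in.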

Thanks, this is reproducible on the latest build. We are working on a fix.

I am still hitting this issue as of March 2024. Was there ever a solution? I've been googling for days.

PR #835 is still open. You can install directly from the kta-intel:fx-init fork, which has a fix, or you can try using the task runner CLI.

Can you try installing from the fx-init branch?

git clone https://github.com/kta-intel/openfl.git
cd openfl
git checkout fx-init
pip install .

Glad we could resolve the issue!
Please feel free to reach out anytime. We are always happy to help and answer any questions.