Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working.

KeertiX opened this issue a year ago

Keerti Prakash Talwar commented a year ago

Describe the bug
Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working.

To Reproduce
Run the tutorial openfl/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb

Expected behavior
Tutorial should run successfully without any error.

Screenshots
Creating AGGREGATOR certificate key pair with following settings: CN=ktalwarx-mobl.gar.corp.intel.com, SAN=DNS:ktalwarx-mobl.gar.corp.intel.com
Writing AGGREGATOR certificate key pair to: /home/keerti/aggregator based worflow/cert/server
The CSR Hash 60c9e4d7778ab8bc06444cc976cfb6c5b3ab1346f91c207593bdc6d7dedb102ae3ae80fd64978344afc597225d61bf85
The CSR Hash for file server/agg_ktalwarx-mobl.gar.corp.intel.com.csr = 60c9e4d7778ab8bc06444cc976cfb6c5b3ab1346f91c207593bdc6d7dedb102ae3ae80fd64978344afc597225d61bf85
Warning: manual check of certificate hashes is bypassed in silent mode.
Signing AGGREGATOR certificate
Traceback (most recent call last):
  File "/home/keerti/aggregator based worflow/openfl/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.py", line 14, in <module>
    fx.init("torch_cnn_mnist", log_level="METRIC", log_file="./spam_metric.log")
  File "/home/keerti/ls/envs/intelEnv/lib/python3.10/site-packages/openfl/native/native.py", line 203, in init
    collaborator.create(
AttributeError: module 'openfl.interface.collaborator' has no attribute 'create'. Did you mean: 'create_'?

Desktop:

OS: WSL Ubuntu
Python Version 3.8
Openfl latest build

Kevin Ta · Answer 1 · Wed May 31 2023 06:05:29 GMT+0800 (China Standard Time)

I can't seem to reproduce your issue. Can you provide some more information about your intelEnv environment? In particular, can you provide the output of python -m torch.utils.collect_env?

Also, how did you install openfl? The error leads me to believe there may have been an issue with the installation. Could you try running only:

import openfl.native as fx
fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')

in a fresh environment?

ParthM-GitHub · Answer 2 · Thu Jun 01 2023 20:31:49 GMT+0800 (China Standard Time)

Output to python -m torch.utils.collect_env is as follows:

(env-latest-original-openfl) parth-wsl@parthmax-mobl1:~/env-latest-original-openfl/openfl$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.16 (default, Mar  2 2023, 03:21:46)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==1.13.1
[pip3] torchvision==0.14.1
[conda] numpy                     1.24.3                   pypi_0    pypi
[conda] torch                     1.13.1                   pypi_0    pypi
[conda] torchvision               0.14.1                   pypi_0    pypi

The fx.init function throws the error when called from any tutorial notebook.

While debugging, I found that openfl/native/native.py calls collaborator.create (from openfl/interface/collaborator.py) at line 203. However, openfl/interface/collaborator.py defines no create function; it only has create_.

To reproduce the error, fetch the latest code from the develop branch.
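For anyone who wants to confirm the mismatch in their own environment before digging further, here is a minimal sketch. The helper name report_attrs is hypothetical (it is not part of OpenFL); the module path and the two attribute names are the ones that appear in the traceback above. It only checks whether the names exist on the module and returns None if the module cannot be imported at all.

```python
import importlib


def report_attrs(module_name, names):
    """Return {name: bool} for each attribute name, or None if the module cannot be imported.

    Hypothetical helper for illustration; not part of OpenFL.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too (e.g. openfl is not installed here).
        return None
    return {name: hasattr(module, name) for name in names}


if __name__ == "__main__":
    # Module path and attribute names are taken from the traceback in this issue.
    print(report_attrs("openfl.interface.collaborator", ["create", "create_"]))
```

If the output shows create as missing and create_ as present, the installed build has the same mismatch described in this issue.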

Kevin Ta · Answer 3 · Fri Jun 02 2023 00:34:43 GMT+0800 (China Standard Time)

Thanks, this is reproducible on the latest build. We are working on a fix.

Mark McCawley · Answer 4 · Sat Mar 23 2024 05:05:45 GMT+0800 (China Standard Time)

I am still hitting this issue as of March 2024. Was there ever a solution? I've been googling for days.

Kevin Ta · Answer 5 · Tue Mar 26 2024 04:06:31 GMT+0800 (China Standard Time)

PR #835 is still open. You can install from the kta-intel:fx-init fork directly, which has a fix, or you can try using the task runner CLI.

Mark McCawley · Answer 6 · Tue Mar 26 2024 04:10:11 GMT+0800 (China Standard Time)

Thanks for getting back to me, Kevin. I appreciate the help. I’ll try the fork with the fix, or fall back to the task runner method as you suggest.

Mark McCawley · Answer 7 · Thu Mar 28 2024 02:06:53 GMT+0800 (China Standard Time)

Kevin, thank you for the help earlier. I started from a fresh Ubuntu install but ran into the same issue. I installed from the fx-init branch of kta-intel/openfl (https://github.com/kta-intel/openfl/tree/fx-init) via git clone and build. However, the error persists.

I am following the example listed under "Examples for Running a Federation": Task Runner API: Federated PyTorch MNIST (https://openfl.readthedocs.io/en/latest/get_started/examples/taskrunner_pytorch_mnist.html#taskrunner-pytorch-mnist).

The sample code (note: all imports work without issue):

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchvision
import torchvision.transforms as transforms
import openfl.native as fx
from openfl.federated import FederatedModel, FederatedDataSet

# Set up default workspace, logging, etc.
fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')

The error:

Traceback (most recent call last):
  File "task_runner.py", line 15, in <module>
    fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')
  File "/home/mark/projects/OpenFL/openfl/openfl/native/native.py", line 203, in init
    collaborator.create(
AttributeError: module 'openfl.interface.collaborator' has no attribute 'create'

I'm not sure whether the information above is enough to tell offhand why this is still occurring. This is simply following the intro instructions on the official OpenFL website, the same site customers use, so I'm very concerned.

Kevin Ta · Answer 8 · Thu Mar 28 2024 04:51:37 GMT+0800 (China Standard Time)

Can you try installing from the fx-init branch?

git clone https://github.com/kta-intel/openfl.git
cd openfl
git checkout fx-init
pip install .
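After reinstalling, it can help to check which copy of openfl Python actually picks up, since an older install elsewhere on the path could still be the one that gets imported. The following is only a sketch using the standard library; describe_install is a hypothetical helper name, and treating "openfl" as both the distribution name and the import name is an assumption.

```python
import importlib.util
from importlib import metadata


def describe_install(package):
    """Report the installed distribution version and the file the import would load.

    Hypothetical helper for illustration; either value is None when it cannot be determined.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec locates a top-level package without executing it.
    spec = importlib.util.find_spec(package)
    location = spec.origin if spec is not None else None
    return {"version": version, "location": location}


if __name__ == "__main__":
    # Assumes the distribution and the importable package are both named "openfl".
    print(describe_install("openfl"))
```

If the reported location does not point at the environment you just installed into, the old package is probably shadowing the new one.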

Mark McCawley · Answer 9 · Thu Mar 28 2024 05:02:41 GMT+0800 (China Standard Time)

Will do, sir.

Mark McCawley · Answer 10 · Fri Mar 29 2024 00:04:44 GMT+0800 (China Standard Time)

Ah, very nice, that indeed fixed the issue. I know this is probably simple stuff to you, but coming from firmware development, I am not at all familiar with OpenFL's features. I wish I could pick your brain to understand what the issue was in the code and learn a bit about OpenFL from an Intel expert, but sadly I don’t have the time. Thank you for all the help. You have no idea how much I appreciate the prompt responses to my emails and the solutions you’ve provided.

Kevin Ta · Answer 11 · Sat Mar 30 2024 01:09:39 GMT+0800 (China Standard Time)

Glad we could resolve the issue!
Please feel free to reach out anytime. Always happy to help and answer any questions.

Mark McCawley · Answer 12 · Sat Mar 30 2024 01:25:33 GMT+0800 (China Standard Time)

Indeed. I've been off and running since the solution yesterday, with much code and many models being deployed without additional issues. Thank you again for the help.
