broken codebase

mlewis1973 opened this issue 8 months ago

mlewis1973 commented 8 months ago

The package installs without complaints on macOS Catalina (conda, Python 3.8.5, CPU-only torch), but inference.py crashes with a segmentation fault. Extensive code walking traced the crash to 'from catalyst import dl'. Since 'import catalyst.dl' works fine in a clean environment, there is an unidentified incompatibility between catalyst and one or more of the other installed packages. The conflict is not with PyTorch. TensorBoard also has some issue with catalyst (but no segmentation fault). Frankly, do tensorboard, keras, jupyter, etc. all really need to be installed for inference? Detritus from training needs to be removed.
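Tracking down a crashing import by code walking can be sped up by running each candidate import in its own fresh interpreter, so a segfault only kills the child process. The sketch below uses only the standard library; `probe_imports` is a hypothetical helper (not part of this repository), and the list of statements is an assumption to adapt. On POSIX systems, `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def probe_imports(statements, timeout=120):
    """Run each import statement in a separate interpreter and report how it ended."""
    results = {}
    for stmt in statements:
        proc = subprocess.run(
            [sys.executable, "-c", stmt],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        code = proc.returncode
        if code == 0:
            status = "ok"
        elif code < 0 and -code == getattr(signal, "SIGSEGV", None):
            status = "segfault"  # POSIX: negative code means killed by that signal
        elif code < 0:
            status = "killed by signal {}".format(-code)
        else:
            status = "error (exit {})".format(code)  # e.g. an uncaught ImportError
        results[stmt] = status
    return results


if __name__ == "__main__":
    checks = [
        "import torch",
        "import tensorboard",
        "import catalyst",
        "from catalyst import dl",
        # Import order can matter, so also try a suspect before catalyst.dl.
        "import torch; from catalyst import dl",
    ]
    for stmt, status in probe_imports(checks).items():
        print(f"{status:>20}  {stmt}")
```

Pairing each suspect package with 'from catalyst import dl' in one statement narrows the conflict down without ever crashing the probing script itself.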

On two Linux platforms (Ubuntu and CentOS), the package also installs without complaint. The current requirements.txt does not pin a +cuXXX variant of the torch packages, so you get CPU-only libraries. There is no issue with catalyst.dl, but after the model downloads, loading the checkpoint with torch fails reproducibly with a pickle deserialization error:

File "/home/mlewis/miniconda3/envs/adpkd_gpu/lib/python3.8/site-packages/torch/serialization.py", line 1002, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

The torch device also seems to be CPU-only, even in an environment with GPU support:
File "/home/mlewis/work/Gears/ADPKD-segmentation/adpkd-segmentation-pytorch-GPU/adpkd_segmentation/utils/train_utils.py", line 38, in load_model_data
checkpoint = torch.load(path, map_location=torch.device('cpu'))
(adpkd_gpu) [mlewis@acr-ailab adpkd-segmentation-pytorch-GPU]$ python -c 'import torch;print(torch.cuda.is_available())'
True
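Note that map_location=torch.device('cpu') only controls where the checkpoint's tensors are placed while they are deserialized; it does not make PyTorch itself CPU-only, and a model loaded this way can still be moved to the GPU afterwards. If the intent is to use the GPU whenever one is present, a minimal sketch could pick the device at runtime. `pick_map_location` is a hypothetical helper, not the repository's `load_model_data`:

```python
def pick_map_location(prefer_gpu=True):
    """Return a device string usable as torch.load(..., map_location=...).

    Falls back to "cpu" when torch is not installed or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    if prefer_gpu and torch.cuda.is_available():
        return "cuda"
    return "cpu"


# Hypothetical usage inside a loader such as load_model_data():
#     device = pick_map_location()
#     checkpoint = torch.load(path, map_location=device)
#     ... load the weights into the model ...
#     model.to(device)
```

Keeping the device choice in one place means the checkpoint tensors and the model end up on the same device.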

mlewis1973 · Answer 1 · Fri Nov 03 2023 01:49:58 GMT+0800 (China Standard Time)

Here is the barebones requirements.txt that gets macOS to the same error as Linux:
albumentations==1.0.3
catalyst==20.8.2
nibabel==3.2.1
#nvidia-cublas-cu11==11.10.3.66
#nvidia-cuda-nvrtc-cu11==11.7.99
#nvidia-cuda-runtime-cu11==11.7.99
#nvidia-cudnn-cu11==8.5.0.96
opencv-python-headless==4.5.3.56
pydicom==2.3.0
seaborn==0.11.1
segmentation-models-pytorch~=0.2.0
SimpleITK==2.0.2
torch==1.13.1
torchvision==0.14.1
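To confirm that an environment actually matches a pinned requirements.txt like the one above, a small standard-library check can compare each pin with what is installed. This is a sketch under stated assumptions: it only understands simple `==` and `~=` pins, its `~=` test is a rough prefix comparison, and the helper names are hypothetical.

```python
import re
from importlib import metadata  # stdlib since Python 3.8

# Matches simple pins such as "torch==1.13.1" or "segmentation-models-pytorch~=0.2.0".
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(==|~=)\s*([^\s#;]+)")


def parse_pins(text):
    """Return {package: (operator, version)} for '==' and '~=' pins."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skips blanks and commented-out lines such as the nvidia-* pins
        match = PIN_RE.match(line)
        if match:
            name, op, version = match.groups()
            pins[name.lower()] = (op, version)
    return pins


def check_installed(pins):
    """Compare pins with the versions importlib.metadata reports as installed."""
    report = {}
    for name, (op, wanted) in pins.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        public = have.split("+")[0]  # ignore a local label such as "+cu117"
        if op == "==":
            ok = public == wanted
        else:
            # Rough "~=" check: compare every release component except the last.
            keep = len(wanted.split(".")) - 1
            ok = public.split(".")[:keep] == wanted.split(".")[:keep]
        if ok:
            report[name] = "ok ({})".format(have)
        else:
            report[name] = "MISMATCH: have {}, want {}{}".format(have, op, wanted)
    return report
```

Calling check_installed(parse_pins(open("requirements.txt").read())) prints nothing by itself; iterate over the returned dict to see which pins drifted.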

(adpkd) mlewis@bfei-imac275k adpkd-segmentation-pytorch % python adpkd_segmentation/inference/inference.py
Enter run inference...
Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b5_ap-9e82fae8.pth" to /Users/mlewis/.cache/torch/hub/checkpoints/tf_efficientnet_b5_ap-9e82fae8.pth
100%|██████████████████████████████████████████████████████████████████████████████████| 117M/117M [00:01<00:00, 115MB/s]
loading checkpoint checkpoints/best_val_checkpoint.pth
Traceback (most recent call last):
File "adpkd_segmentation/inference/inference.py", line 117, in <module>
run_inference(
File "adpkd_segmentation/inference/inference.py", line 47, in run_inference
model_args = load_config(
File "/Users/mlewis/flywheel/Gears/AI/ADPKD-segmentation/adpkd-segmentation-pytorch/adpkd_segmentation/inference/inference_utils.py", line 80, in load_config
load_model_data(saved_checkpoint, model, new_format=checkpoint_format)
File "/Users/mlewis/flywheel/Gears/AI/ADPKD-segmentation/adpkd-segmentation-pytorch/adpkd_segmentation/utils/train_utils.py", line 38, in load_model_data
checkpoint = torch.load(path, map_location=torch.device('cpu'))
File "/Users/mlewis/opt/miniconda3/envs/adpkd/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/Users/mlewis/opt/miniconda3/envs/adpkd/lib/python3.8/site-packages/torch/serialization.py", line 1002, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
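The traceback goes through `_legacy_load`, which torch.load falls back to when the file is not a zip archive (the format torch.save has written by default since PyTorch 1.6). The legacy loader starts by unpickling, so a file whose first byte is the letter v, such as a text file beginning with "version ...", fails with exactly this error. A heuristic sketch (hypothetical helper, no torch required) that inspects the checkpoint's header before loading it:

```python
import zipfile

LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def sniff_checkpoint(path):
    """Guess what kind of file a .pth checkpoint really is (heuristic only)."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_PREFIX))
    if head.startswith(LFS_PREFIX):
        # Text stub left by Git LFS; the real weights were never downloaded.
        return "git-lfs pointer"
    if zipfile.is_zipfile(path):
        return "zip (torch.save format since PyTorch 1.6)"
    if head[:1] == b"\x80":
        # Pickle protocol 2+ starts with the PROTO opcode 0x80.
        return "raw pickle (possibly legacy torch.save format)"
    return "unknown"
```

Running sniff_checkpoint("checkpoints/best_val_checkpoint.pth") before torch.load turns an opaque UnpicklingError into a direct hint about what is wrong with the file.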

mlewis1973 · Answer 2 · Fri Nov 03 2023 07:03:08 GMT+0800 (China Standard Time)

It's also not clear why it downloads the TensorFlow EfficientNet weights (tf_efficientnet_b5_ap) rather than the PyTorch version.
Here's 'pip list' after installing from my distilled requirements.txt above:
(adpkd) mlewis@bfei-imac275k adpkd-segmentation-pytorch % pip list
Package Version Editable project location
------------------------------ ------------ -------------------------
absl-py 2.0.0
adpkd-segmentation 1.0 /Users/mlewis/flywheel/Gears/AI/ADPKD-segmentation/adpkd-segmentation-pytorch
albumentations 1.0.3
appnope 0.1.3
asttokens 2.4.1
backcall 0.2.0
cachetools 5.3.2
catalyst 20.8.2
certifi 2023.7.22
charset-normalizer 3.3.2
contourpy 1.1.1
cycler 0.12.1
decorator 5.1.1
deprecation 2.1.0
efficientnet-pytorch 0.6.3
executing 2.0.1
fonttools 4.43.1
gitdb 4.0.11
GitPython 3.1.40
google-auth 2.23.4
google-auth-oauthlib 1.0.0
grpcio 1.59.2
idna 3.4
imageio 2.31.6
importlib-metadata 6.8.0
importlib-resources 6.1.0
ipython 8.12.3
jedi 0.19.1
joblib 1.3.2
kiwisolver 1.4.5
lazy_loader 0.3
Markdown 3.5.1
MarkupSafe 2.1.3
matplotlib 3.7.3
matplotlib-inline 0.1.6
munch 4.0.0
networkx 3.1
nibabel 3.2.1
numpy 1.24.4
oauthlib 3.2.2
opencv-python-headless 4.5.3.56
packaging 23.2
pandas 2.0.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 10.0.1
pip 23.3
plotly 5.18.0
pretrainedmodels 0.7.4
prompt-toolkit 3.0.39
protobuf 4.25.0
ptyprocess 0.7.0
pure-eval 0.2.2
pyasn1 0.5.0
pyasn1-modules 0.3.0
pydicom 2.3.0
Pygments 2.16.1
pyparsing 3.1.1
python-dateutil 2.8.2
pytz 2023.3.post1
PyWavelets 1.4.1
PyYAML 6.0.1
requests 2.31.0
requests-oauthlib 1.3.1
rsa 4.9
scikit-image 0.21.0
scikit-learn 1.3.2
scipy 1.10.1
seaborn 0.11.1
segmentation-models-pytorch 0.2.0
setuptools 68.0.0
SimpleITK 2.0.2
six 1.16.0
smmap 5.0.1
stack-data 0.6.3
tenacity 8.2.3
tensorboard 2.14.0
tensorboard-data-server 0.7.2
tensorboardX 2.6.2.2
threadpoolctl 3.2.0
tifffile 2023.7.10
timm 0.4.12
torch 1.13.1
torchvision 0.14.1
tqdm 4.66.1
traitlets 5.13.0
typing_extensions 4.8.0
tzdata 2023.3
urllib3 2.0.7
wcwidth 0.2.9
Werkzeug 3.0.1
wheel 0.41.2
zipp 3.17.0

aksg87 · Answer 3 · Fri Nov 03 2023 09:24:51 GMT+0800 (China Standard Time)

Okay, got it!

mlewis1973 · Answer 4 · Sun Nov 05 2023 23:31:08 GMT+0800 (China Standard Time)

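The missing instruction is that installing the package does not pull the model checkpoint from Git LFS automatically; you have to do it manually. Without this step, checkpoints/best_val_checkpoint.pth is only a small LFS pointer text file beginning with "version", which is why torch.load failed with "invalid load key, 'v'":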
conda install git-lfs
git lfs install
git lfs pull
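To verify that the pull actually replaced the pointer files, `git lfs ls-files` lists the LFS-tracked files; per the git-lfs documentation, an asterisk after the OID marks a fully downloaded object and a minus marks a file that is still only a pointer. A sketch that parses that output (the helper names are hypothetical, and the line format '<oid> <*|-> <path>' is assumed from the default output):

```python
import subprocess


def parse_lfs_ls_files(output):
    """Parse `git lfs ls-files` lines of the form '<oid> <*|-> <path>'."""
    entries = []
    for line in output.splitlines():
        parts = line.split(" ", 2)  # maxsplit=2 keeps paths containing spaces intact
        if len(parts) != 3 or parts[1] not in ("*", "-"):
            continue  # ignore anything that does not look like an entry
        _oid, marker, path = parts
        entries.append((path, marker == "*"))
    return entries


def missing_lfs_objects(repo_dir="."):
    """Return LFS-tracked paths that are still pointer files (needs git-lfs installed)."""
    out = subprocess.run(
        ["git", "lfs", "ls-files"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [path for path, downloaded in parse_lfs_ls_files(out) if not downloaded]
```

An empty list from missing_lfs_objects() after git lfs pull suggests every tracked checkpoint was fetched.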

The following requirements.txt is all you need for inference and containerization:
albumentations==1.0.3
catalyst==20.8.2
nibabel==3.2.1
#nvidia-cublas-cu11==11.10.3.66
#nvidia-cuda-nvrtc-cu11==11.7.99
#nvidia-cuda-runtime-cu11==11.7.99
#nvidia-cudnn-cu11==8.5.0.96
opencv-python-headless==4.5.3.56
pydicom==2.3.0
seaborn==0.11.1
segmentation-models-pytorch~=0.2.0
SimpleITK==2.0.2
torch==1.13.1
torchvision==0.14.1