abrilcf / mednerf

Reconstruction given an X-ray

povolann opened this issue · comments

Hello,

I have trained the knee model and everything seems fine, but I have multiple questions. The main one: how do I evaluate the model after training? I used
python finetune_xray.py --xray_img_path /home/anya/Programs/mednerf-main/graf-main/data/knee_xrays/01_xray0008.png --save_dir /home/anya/Programs/mednerf-main/graf-main/results/knee_all_360/evaluate --model /home/anya/Programs/mednerf-main/graf-main/results/knee_all_360/chkpts/model.best.pt configs/knee.yaml

But I am not sure if it is correct... because...
--xray_img_path - is path to any X-ray from which I want to generate the CT images, right?

First, I got some error messages about missing modules (ray, ignite and pandas), so I installed these;
https://github.com/ray-project/ray: pip install "ray[tune]" - is this the correct ray?

And I tried to run the command again, but I got ImportError: cannot import name 'ImageFolder' from 'graf.utils' (/home/anya/Programs/mednerf-main/graf-main/graf/utils.py), so I commented ImageFolder because it is not defined in graf.utils, but it seems that it is not used in the code.

And for the ray configuration, I guess I have to change line 256 in finetune_xray.py: ray.init(runtime_env={"conda": "/home/anya/anaconda3", ... So, I have changed the 1st parameter, but I am not sure what exactly is the rest of the parameters, so can you please explain a bit more about the ray configuration?

Thank you!

Hi,
Sorry for the late reply. I guess I didn't have the notifications on for issues.

  1. Yes. That is the path of the folder containing the X-ray from which you wish to generate CT projections. You don't need to specify the filename. We did it like this so that we can later support more than one X-ray as input, which could increase the accuracy.

  2. That is correct. pip install "ray[tune]" is the command used to install ray for finetuning.

  3. You're probably getting the ImportError: cannot import name 'ImageFolder' from 'graf.utils' error because of the dependencies needed for running ray. Once that is sorted, the error should disappear. Here's a summary of the parameters for specifying all the dependencies for the script:

"conda": "path/to/your/conda-environment"
"excludes": [files and/or folders you don't want to include] (I added this because I had another version of our model and it was causing some issues with ray. In your case, you don't need this).
"py_modules": [paths to local modules needed for the code].
If you wish to know more about it here's the link to the documentation: [https://docs.ray.io/en/master/ray-core/handling-dependencies.html]
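
For readers following along, the three keys above could be assembled roughly as in the sketch below. It is not the repository's code: build_runtime_env is a hypothetical helper, the repository path is a placeholder, and the module folders are the ones quoted later in this thread. As far as I can tell from the Ray documentation linked above, the "conda" value may be the name of an existing conda environment, a path to an environment.yml file, or a dict describing the environment; the name "graf" used in the usage comment is an assumption taken from the traceback path (.../envs/graf) further down.

```python
from pathlib import Path


def build_runtime_env(repo_root, conda_env, excludes=None):
    """Assemble the dict that would be passed as ray.init(runtime_env=...).

    repo_root : path to the graf-main folder (placeholder, adjust to your setup)
    conda_env : conda environment name or environment.yml path (assumption)
    excludes  : optional list of files/folders Ray should not upload
    """
    root = Path(repo_root)
    env = {
        "conda": conda_env,
        # Local packages the workers need to import; this list mirrors
        # the one quoted in this thread.
        "py_modules": [
            str(root / "submodules"),
            str(root / "graf"),
            str(root / "submodules" / "GAN_stability"),
            str(root / "configs"),
        ],
    }
    if excludes:
        env["excludes"] = list(excludes)
    return env


def init_ray(repo_root, conda_env):
    """Start Ray with the runtime environment built above (requires ray)."""
    import ray  # imported lazily so the helper above works without ray

    return ray.init(runtime_env=build_runtime_env(repo_root, conda_env))


# Usage (not executed here):
#   init_ray("/path/to/mednerf/graf-main", "graf")
```

Keeping the dict in one helper makes it easy to print and check it before Ray starts, since a failed runtime_env setup only shows up later as cancelled trials.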

Hi,

thank you for your comment. I have tried to adjust the environment, but I am still confused because it still gives me some errors.

  1. I changed the ray.init part so that it matches my env. I guess that the D-graf file in the code is the same as graf-main in this repository. But I was still getting the same error because there is actually no ImageFolder in graf.utils. I re-checked the code in this repository and I haven't found it either. Can you please re-check? Or am I misunderstanding something? All the other parameters are defined there, though...
  2. Anyway, I tried running it with ImageFolder commented out again. This time it asked me to also run pip install "ray[default]", so I did, but now I am getting a bunch of errors. I am including only part of them below. I am still quite new to ray, so I am not sure whether these two errors are actually connected.
  3. Additional question - so, since I trained the knee and lung models, I could actually try to generate the CT projections from totally different images of lungs or knees which are not included in the training process, right?

Thank you for your help as always!

(raylet)   File "/home/anya/anaconda3/envs/graf/lib/python3.8/socket.py", line 918, in getaddrinfo
(raylet)     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
(raylet) socket.gaierror: [Errno -2] Name or service not known
(raylet) 
(raylet) During handling of the above exception, another exception occurred:
(raylet) 
(raylet) Traceback (most recent call last):
(raylet)   File "/home/anya/anaconda3/envs/graf/lib/python3.8/site-packages/ray/dashboard/agent.py", line 407, in <module>
(raylet)     gcs_publisher = GcsPublisher(args.gcs_address)
(raylet) TypeError: __init__() takes 1 positional argument but 2 were given
(pid=gcs_server) [2022-04-14 21:54:09,270 E 35933 35933] (gcs_server) gcs_actor_scheduler.cc:320: The lease worker request from node 64f14a7547d02c83efe9d5b268934eb5e8d14062aecf63fc10df4dd7 for actor db7237c6a7dbc086bacc0db301000000(ImplicitFunc.__init__) has been canceled, job id = 01000000, cancel type: SCHEDULING_CANCELLED_RUNTIME_ENV_SETUP_FAILED
(pid=gcs_server) [2022-04-14 21:54:09,275 E 35933 35933] (gcs_server) gcs_actor_scheduler.cc:320: The lease worker request from node 64f14a7547d02c83efe9d5b268934eb5e8d14062aecf63fc10df4dd7 for actor 00d89956d5a38d9cdf356fa701000000(ImplicitFunc.__init__) has been canceled, job id = 01000000, cancel type: SCHEDULING_CANCELLED_RUNTIME_ENV_SETUP_FAILED

Sorry, I have a few more questions because I have just been going through the code very carefully.
As I understand it, the acc, depth and rgb images and the videos generated during training in results/knee_all_360 (for example) are a kind of visualisation of the validation data during training? Thank you again!

Hi,

  1. You're right ImageFolder was missing in that file version. Sorry about that. I have updated the file.
  2. Could you please share how you call ray.init in your code? The error indicates that it's getting two arguments for runtime_env. It should be inside of curly brackets, so probably there's something wrong there as it looks like ray was installed successfully.
  3. You should obtain consistent projections for X-rays that are close to the training data distribution. As you can see, the knee dataset is very small; accuracy would be better with a larger dataset.
  4. Yes. Those are intermediate training results.

Thank you for your questions. I'll update the instructions to make them clearer.

Hi,

so this is how I call ray.init, I am basically using the finetune_xray.py:
ray.init(runtime_env={"conda": "/home/anya/anaconda3", "py_modules": ["/home/anya/Programs/mednerf/graf-main/submodules", "/home/anya/Programs/mednerf/graf-main/graf", "/home/anya/Programs/mednerf/graf-main/submodules/GAN_stability/", "/home/anya/Programs/mednerf/graf-main/configs"]}) and I use
python finetune_xray.py --xray_img_path /home/anya/Programs/mednerf/graf-main/testing/ --save_dir /home/anya/Programs/mednerf/graf-main/inference/ --model /home/anya/Programs/mednerf/graf-main/results/knee_all_360/chkpts/model.best.pt configs/knee.yaml in graf-main.

Here is the file, just to be sure: https://github.com/povolann/mednerf/blob/main/finetune_xray.py

Hi,
That looks correct. Could you share more of the log? I assume a big part of it is repeated errors from the different runs ray tried to initiate, but before the end, there should be more descriptive information. As a side note, is model.best.pt the name of your trained model? Shouldn't it be model_best.pt? That's probably not the issue, but still.

Here is the full log.
And yeah, the name is model_best.pt. But still I get this error...

out.txt

I see. It seems to be a recent bug on the ray release side. Please take a look at the following issues.
alpa-projects/alpa#383
alpa-projects/alpa#377
They solved it by trying the Nightlies.

So, I have managed to install the Nightlies version. In case someone else is curious how:

pip uninstall -y ray
pip install -U LINK_TO_WHEEL.whl
pip install -U "ray[tune]"

But then I got an error saying that some tensors were on the GPU and some on the CPU, so I adjusted run_nerf_helpers_mod.py line 235 and run_nerf_mod.py lines 241, 258, 291 with .cuda(). I have put the code in my GitHub repository mednerf.

Now the code seems to run (at least it doesn't report any errors), but it takes a very long time. I let it run overnight (more than 12 hours on 2 GeForce GTX 1080 Ti GPUs) and it had reached 76 trials out of 540. Is this normal? And if it is, is it possible to set the number of trials to a lower number, so I can see some results (even bad ones)? Right now it takes a very long time even to find the best parameters. Or does the number of trials depend on the search space for the parameters?

== Status ==
Current time: 2022-04-27 16:40:50 (running for 00:10:29.55)
Memory usage on this node: 13.9/62.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 4.0/8 CPUs, 2.0/2 GPUs, 0.0/34.22 GiB heap, 0.0/17.11 GiB objects (0.0/1.0 accelerator_type:GTX)
Current best trial: e4ce4_00003 with psnr=14.263408019378968 and parameters={'lr': 0.01187057000222333, 'b1': 0.5, 'lambda_percep': 0.1, 'lambda_mse': 0.2, 'lambda_nll': 0.1}
Result logdir: /home/anya/ray_results/finetune_xray_2022-04-27_16-30-20
Number of trials: 20/540 (16 PENDING, 4 RUNNING)
+---------------------------+----------+------------------------+------+--------------+--------------+-----------------+-------------+--------+------------------+---------+
| Trial name                | status   | loc                    |   b1 |   lambda_mse |   lambda_nll |   lambda_percep |          lr |   iter |   total time (s) |    psnr |
|---------------------------+----------+------------------------+------+--------------+--------------+-----------------+-------------+--------+------------------+---------|
| finetune_xray_e4ce4_00000 | RUNNING  | 143.248.247.156:294642 |  0   |          0.1 |          0.1 |             0.1 | 0.0201504   |      4 |          590.985 | 12.7948 |
| finetune_xray_e4ce4_00001 | RUNNING  | 143.248.247.156:294678 |  0.5 |          0.1 |          0.1 |             0.1 | 0.0153578   |      2 |          574.162 | 12.6427 |
| finetune_xray_e4ce4_00002 | RUNNING  | 143.248.247.156:294680 |  0   |          0.2 |          0.1 |             0.1 | 0.00627521  |      2 |          548.892 | 10.7213 |
| finetune_xray_e4ce4_00003 | RUNNING  | 143.248.247.156:294682 |  0.5 |          0.2 |          0.1 |             0.1 | 0.0118706   |      4 |          594.156 | 14.2634 |
| finetune_xray_e4ce4_00004 | PENDING  |                        |  0   |          0.3 |          0.1 |             0.1 | 0.0324823   |        |                  |         |
| finetune_xray_e4ce4_00005 | PENDING  |                        |  0.5 |          0.3 |          0.1 |             0.1 | 0.0750123   |        |                  |         |
| finetune_xray_e4ce4_00006 | PENDING  |                        |  0   |          0.1 |          0.2 |             0.1 | 0.00532728  |        |                  |         |
| finetune_xray_e4ce4_00007 | PENDING  |                        |  0.5 |          0.1 |          0.2 |             0.1 | 0.00206566  |        |                  |         |
| finetune_xray_e4ce4_00008 | PENDING  |                        |  0   |          0.2 |          0.2 |             0.1 | 0.00121087  |        |                  |         |
| finetune_xray_e4ce4_00009 | PENDING  |                        |  0.5 |          0.2 |          0.2 |             0.1 | 0.00051962  |        |                  |         |
| finetune_xray_e4ce4_00010 | PENDING  |                        |  0   |          0.3 |          0.2 |             0.1 | 0.000723825 |        |                  |         |
| finetune_xray_e4ce4_00011 | PENDING  |                        |  0.5 |          0.3 |          0.2 |             0.1 | 0.0532771   |        |                  |         |
| finetune_xray_e4ce4_00012 | PENDING  |                        |  0   |          0.1 |          0.3 |             0.1 | 0.000290957 |        |                  |         |
| finetune_xray_e4ce4_00013 | PENDING  |                        |  0.5 |          0.1 |          0.3 |             0.1 | 0.000455452 |        |                  |         |
| finetune_xray_e4ce4_00014 | PENDING  |                        |  0   |          0.2 |          0.3 |             0.1 | 0.00426141  |        |                  |         |
| finetune_xray_e4ce4_00015 | PENDING  |                        |  0.5 |          0.2 |          0.3 |             0.1 | 0.00176721  |        |                  |         |
| finetune_xray_e4ce4_00016 | PENDING  |                        |  0   |          0.3 |          0.3 |             0.1 | 0.0987178   |        |                  |         |
| finetune_xray_e4ce4_00017 | PENDING  |                        |  0.5 |          0.3 |          0.3 |             0.1 | 0.00474325  |        |                  |         |
| finetune_xray_e4ce4_00018 | PENDING  |                        |  0   |          0.1 |          0.1 |             0.2 | 0.00149893  |        |                  |         |
| finetune_xray_e4ce4_00019 | PENDING  |                        |  0.5 |          0.1 |          0.1 |             0.2 | 0.0110002   |        |                  |         |
+---------------------------+----------+------------------------+------+--------------+--------------+-----------------+-------------+--------+------------------+---------+


Create samples...: 100%|██████████| 1/1 [00:02<00:00,  2.90s/it]
Create samples...:   0%|          | 0/1 [00:00<?, ?it/s]
Create samples...: 100%|██████████| 1/1 [00:08<00:00,  8.78s/it]
Create samples...:   0%|          | 0/1 [00:00<?, ?it/s]
Create samples...: 100%|██████████| 1/1 [00:07<00:00,  7.72s/it]
Create samples...: 100%|██████████| 1/1 [00:02<00:00,  2.91s/it]
Create samples...:   0%|          | 0/1 [00:00<?, ?it/s]
Create samples...:   0%|          | 0/1 [00:00<?, ?it/s]
Create samples...: 100%|██████████| 1/1 [00:02<00:00,  2.91s/it]
Create samples...:   0%|          | 0/1 [00:00<?, ?it/s]

Where exactly are the generated images being saved? In the script finetune_xray.py, line 53 (interpolate: vutils.save_image(inter, img_name, nrow=step)) and line 231 (vutils.save_image(results, outpath, nrow=1)) are commented out...

Hi, we considered five hyperparameters at the start of finetuning, so it took quite some time. You can use the ones mentioned in the paper; the weights of the losses in particular won't need any further finetuning. Also, the number of iterations in the loop was picked arbitrarily, so you could fix it to a one-digit integer and run the script a few times until you get a decent result.
I just modified the script. Sorry about the mess!
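
On the earlier question of whether the trial count depends on the search space: in Ray Tune, every grid_search combination is repeated num_samples times, while sampled parameters such as lr do not multiply the grid. The sketch below reproduces the 540 trials from the status table under two assumptions that are not confirmed by the script: lambda_percep has a third grid value (only 0.1 and 0.2 are visible in the log) and num_samples is 10. The name count_trials is a hypothetical helper, not part of the repository.

```python
from itertools import product

# Grid values read from the status table above. The third lambda_percep
# value and num_samples are assumptions chosen to match "540" in the log.
grid = {
    "b1": [0.0, 0.5],
    "lambda_mse": [0.1, 0.2, 0.3],
    "lambda_nll": [0.1, 0.2, 0.3],
    "lambda_percep": [0.1, 0.2, 0.3],  # 0.3 is assumed
}
num_samples = 10  # assumed; lr is sampled anew for each repetition


def count_trials(grid, num_samples):
    """Number of trials = num_samples x number of grid combinations."""
    combinations = list(product(*grid.values()))
    return num_samples * len(combinations)


total = count_trials(grid, num_samples)

# Rough time estimate from the pace reported in this thread:
# 76 trials finished in about 12 hours.
hours_per_trial = 12 / 76
estimated_hours = total * hours_per_trial

# Fixing every loss weight to one value, as suggested above, collapses
# the grid to a single combination.
fixed = {name: values[:1] for name, values in grid.items()}
fixed_total = count_trials(fixed, num_samples=3)

print(total, round(estimated_hours, 1), fixed_total)
```

Under these assumptions the full search would need roughly 85 hours at the reported pace, which is why fixing the loss weights and lowering the number of samples shortens the run so much.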

Hi
Can you provide the files for this code directly so that I can run it in my Google Colab notebook?