iMED-Lab / OCTA-Net-OCTA-Vessel-Segmentation-Network


Am I training it right?

mikeyt14 opened this issue

I have been training this model, and almost every time I get results similar to the following for the first stage and the second stage.
First stage (fusion, no thresh):
image
Second stage (fusion, no thresh):
image

The ground truth seems to be better represented by the first stage than by the second stage, which does not seem correct to me. Currently, to train the model I have tried two approaches:

  1. Change args.data_dir, then run front_main.py and fusion_main.py for one of SVC, DVC, or SVC+DVC (a command sketch follows below).
  2. Change args.data_dir and run front_main.py for each of SVC, DVC, and SVC+DVC. Afterwards, run fusion_main.py.

I would like to know if I am doing anything incorrectly, or if there are additional details I should keep in mind when training the model. Thank you.
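
For reference, a minimal sketch of approach 1 (front_main.py, fusion_main.py, and args.data_dir are from the repo; the comments are assumptions about what each stage does):

# after pointing args.data_dir at the SVC, DVC, or SVC+DVC folder in both scripts:
python front_main.py    # first (coarse) stage
python fusion_main.py   # second (fusion/fine) stage on the same dataset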

Was the second stage trained with enough epochs and fine-tuned well? There seems to be no problem with the first stage, but the second stage was not well trained. In addition, the first training approach you tried is the one we adopted.


I have been running 300 epochs per stage with an initial learning rate of 0.0003, but I also tried 200 epochs per stage with an initial learning rate of 0.0005 as specified in the paper, and the results were not much different. I also changed thin_gt = sample[2].to(device) and thick_gt = sample[3].to(device) to thin_gt = sample[1].to(device) and thick_gt = sample[1].to(device), respectively, for all datasets, since the original indices gave me indexing errors. I also noticed that some lines of code throughout the program have been commented out. Am I supposed to use some of those lines when training the network?

We are sorry that our code is imperfect and there may be some bugs. I am also not sure where your problem lies. This is my result for the second stage:
image

The paper suggests that the fine stage utilizes the maps produced by the coarse stage, but that does not seem to be happening when I run the code. When I delete the results/rose/first_stage folder, the code still runs and produces the same results as if I had not deleted it. The only part of fusion_main.py I can find that uses results from the coarse stage is the part that loads front_model-189-0.9019971996250081.pth, which is not even a file produced by front_main.py, since it had to be downloaded from here. Currently, whenever I need to run fusion_main.py, I add front_model-189-0.9019971996250081.pth to the models/rose/first_stage folder after front_main.py finishes running. Is there something I need to do differently here?

One thing that must be pointed out is that the two stages are trained and tested separately: the second stage uses the trained model of the first stage to produce the maps, not the files in the results/rose/first_stage folder. In addition, front_model-189-0.9019971996250081.pth is the first-stage model trained by us, and you should replace it with your own.
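
For example, once your own first-stage training finishes, you would point the second stage at your checkpoint instead of ours. A hedged sketch, assuming the --first_suffix and --first_suffix1 flags mentioned elsewhere in this thread (the checkpoint name is illustrative):

python fusion_main.py --first_suffix your_front_model.pth --first_suffix1 your_front_model.pth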

Thank you for the clarification. I changed first_suffix and first_suffix1 to an arbitrary model checkpoint in models/rose/first_stage, but now I get the following error when I try to run fusion_main.py:
image
I found that this happens because, when I use my first-stage model, thick_pred = front_net_thick(img) and thin_pred = front_net_thin(img) in train.py (lines 61-62) are tuples when they need to be tensors. I did not get that error when using front_model-189-0.9019971996250081.pth, because then thick_pred and thin_pred were tensors. I believe the cause can be traced back to val.py or front_main.py, but I cannot quite pinpoint what exactly causes it.

We are sorry that some problems were introduced when we reorganized the code.

For vessel segmentation at both the pixel level and the centerline level, the first-stage model has three outputs: the pixel-level map, the centerline-level map, and the fusion of both maps. If you want to accomplish such a task, you should modify thick_pred = front_net_thick(img) and thin_pred = front_net_thin(img) in both train.py and test.py as follows, where both front_net_thick and front_net_thin refer to the same first-stage model you have trained:
thick_pred, _, _ = front_net_thick(img)
_, thin_pred, _ = front_net_thin(img)

For vessel segmentation at only the pixel level or only the centerline level, however, the first-stage model has a single output. In this case, you should modify the SRF_UNet class in first_stage.py: remove the thin branch and the fusion branch, and return out_thick only.
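
To illustrate, a minimal sketch of the single-output modification (the convolution is a placeholder for the real encoder/decoder; the actual SRF_UNet in first_stage.py is far larger):

import torch
import torch.nn as nn

class SRF_UNet(nn.Module):  # simplified stand-in for the repo's class
    def __init__(self):
        super().__init__()
        # placeholder layer; the thin and fusion branches have been removed
        self.thick_branch = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x):
        out_thick = torch.sigmoid(self.thick_branch(x))
        return out_thick  # a single tensor, not a tuple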

In short, you should adapt the code to the vessel segmentation task at hand. We are sorry again for the poorly written and organized project.

Thank you, the fusion stage is working fine for me now. It produces segmentation maps and metrics for SVC, SVC+DVC, and ROSE-2 that are very similar to what was presented in the paper. However, one last concern I have is that for DVC I am getting final outputs and metrics that differ significantly from the data in the paper and from the ground truth. Below is a side-by-side comparison of the final fusion-stage output for one DVC image and its corresponding ground truth:
image
Also, here is a comparison of the metrics; the top row is what was presented in the paper and the bottom row is what my code produced:
image
I ran fusion_main.py for DVC with the same code that was used for SVC, SVC+DVC, and ROSE-2 (i.e., I did not alter the code when switching datasets). I implemented the thick_pred, _, _ = front_net_thick(img) and _, thin_pred, _ = front_net_thin(img) changes from your earlier message and used latest.pth for --first_suffix and --first_suffix1. Are there any changes I need to make when training on DVC specifically?

Did you allow a three-pixel tolerance region around the manually traced centerlines when calculating the evaluation metrics (replace kernel_size=(1, 1) with kernel_size=(3, 3) in evaluation.py)? Moreover, many capillaries seem to be missing in your results. As I mentioned previously, when you want to do the centerline-level vessel segmentation task for DVC, you should modify the SRF_UNet class in first_stage.py: remove the thin branch and the fusion branch, and return out_thick only. thick_pred, _, _ = front_net_thick(img) and _, thin_pred, _ = front_net_thin(img) in both train.py and test.py should then be modified back to thick_pred = front_net_thick(img) and thin_pred = front_net_thin(img), respectively. The same applies to ROSE-2, which also has centerline-level vessel annotations.
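
For intuition, a small sketch of what the three-pixel tolerance amounts to: dilating the binary centerline ground truth by one pixel on each side before matching it against the prediction. This uses max pooling as the dilation; evaluation.py's actual implementation may differ:

import torch
import torch.nn.functional as F

gt = torch.zeros(1, 1, 64, 64)
gt[0, 0, 32, 10:50] = 1.0  # a toy traced centerline, one pixel wide
# kernel_size=(1, 1) leaves the mask unchanged (no tolerance);
# kernel_size=(3, 3) dilates it by one pixel on every side
gt_tol = F.max_pool2d(gt, kernel_size=(3, 3), stride=1, padding=1)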

Thank you for the clarification. Just to be sure, the changes you mentioned for SRF_UNet in first_stage.py are supposed to be implemented after running front_main.py, right?

No, you should make all modifications mentioned above before running front_main.py.

Alright, in that case I have two interpretations of the changes to SRF_UNet. I am not sure which one is correct, since both need additional changes to work:

  1. SRF_UNet still returns 3 values, but they are all out_thick. If I do this, I need to keep thick_pred, _, _ = front_net_thick(img) and _, thin_pred, _ = front_net_thin(img) for it to work. This produces metrics that are somewhat closer to those given in the paper. Here is an example side-by-side comparison between a segmentation map produced this way and the ground truth:
    image

  2. SRF_UNet returns only 1 value, which is out_thick. In this case, the first error I encounter is:
    image
    I solved this error by deleting ', _'. I am not sure why net(img) unpacked 2 values when SRF_UNet was changed to output one value. If I change it to output X copies of out_thick, net(img) returns X identical tensors; however, when I have it output only one copy of out_thick, net(img) returns 2 different tensors. This is something I have not been able to figure out. Anyway, afterwards I encountered another error in test.py:
    image
    which I fixed by replacing thick_pred, thin_pred, pred = net(img) with thick_pred = thin_pred = pred = net(img). Afterwards, the code was able to run, but it gives me the following warning after each epoch, with wildly incorrect segmentation maps:
    image

Is there something I am doing incorrectly, or is there perhaps an interpretation I am not aware of?

Please first check that SRF_UNet returns only out_thick. The first error (got 2) indicates that SRF_UNet is returning two values. Next, please modify all of
thick_pred, _, _ = front_net_thick(img)
_, thin_pred, _ = front_net_thin(img)
thick_pred, thin_pred, _ = net(img)
in both train.py and test.py, respectively, to
thick_pred = front_net_thick(img)
thin_pred = front_net_thin(img)
thick_pred = net(img)
Note that front_net_thick and front_net_thin are in fact the same first-stage model, so thick_pred and thin_pred should also be the same output of the model. Other errors might occur, and you could debug them yourself; I have been busy recently and have no time to review the code for you.

If you can't solve it, you could run the code according to your first interpretation; I think there is little practical difference between the two interpretations.
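
As a quick, runnable sanity check of the intended setup (nn.Conv2d stands in for the modified single-output SRF_UNet; the variable names follow this thread):

import torch
import torch.nn as nn

net = nn.Conv2d(3, 1, kernel_size=1)  # placeholder for the modified SRF_UNet
front_net_thick = net                 # the same first-stage model...
front_net_thin = net                  # ...bound to both names

img = torch.rand(1, 3, 64, 64)        # dummy OCTA patch
thick_pred = front_net_thick(img)     # a single tensor, no tuple unpacking
thin_pred = front_net_thin(img)
assert torch.equal(thick_pred, thin_pred)  # identical outputs, as noted above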


Hello, from your description I see that your SVC+DVC results are similar to those in the paper, but only my SVC results come close to what the paper reports. Could you please show me your modifications?

@mikeyt14 hey, could you upload front_model-189-0.9019971996250081.pth to a drive somewhere? It seems like you were able to download it; I wasn't, though, and I think that might be a reason why I wasn't able to reproduce the results of the paper.
