Why take 3-channel PNG images as input?

skywalker0523 opened this issue 6 months ago · comments

Thank you very much for the excellent work! I have a question I'd like to ask. The 2D slices of 3D nii images have 1 channel, but you converted these slices into PNG images with 3 channels. Additionally, the mean and std differ across the three channels: mean [123.675, 116.28, 103.53], std [58.395, 57.12, 57.375]. Could you please explain how this conversion was done? I intend to fine-tune your model on my own dataset.

Junlong · Answer 1 · Thu Apr 11 2024 10:19:34 GMT+0800 (China Standard Time)

Converting the image to three channels accommodates the input format of the original ViT model. Here, the conversion simply replicates the original single-channel image three times. The normalization of the image also follows the parameters used in the original SAM. When fine-tuning your own network, you can set different mean and standard deviation values, such as [0.5] and [0.5].
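
To make the replication and normalization concrete, here is a minimal NumPy sketch. It is not the repository's actual preprocessing code: the helper names `slice_to_three_channels` and `normalize` are hypothetical, and the min-max rescaling to 0-255 is an assumption about how a slice might be mapped to 8-bit before being saved as PNG. The mean and std are the values quoted in the question, which are on the 0-255 pixel scale.

```python
import numpy as np

# Normalization constants quoted in the question (0-255 pixel scale).
SAM_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
SAM_STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def slice_to_three_channels(slice_2d):
    """Hypothetical helper: map a 2D slice to uint8 and copy it into 3 channels."""
    slice_2d = slice_2d.astype(np.float32)
    lo, hi = slice_2d.min(), slice_2d.max()
    # Assumption: simple min-max rescaling to 0-255; the real pipeline may
    # use a different intensity window.
    scaled = (slice_2d - lo) / max(hi - lo, 1e-8) * 255.0
    gray = scaled.astype(np.uint8)
    # Replicate the single channel three times -> shape (H, W, 3).
    return np.stack([gray, gray, gray], axis=-1)


def normalize(image_hwc, mean=SAM_MEAN, std=SAM_STD):
    """Per-channel normalization; mean/std broadcast over the last axis."""
    return (image_hwc.astype(np.float32) - mean) / std


# Stand-in for one slice of a 3D volume (random data, for illustration only).
slice_2d = np.random.default_rng(0).normal(size=(4, 5))
rgb = slice_to_three_channels(slice_2d)
x = normalize(rgb)
```

Because the three channels hold identical pixel values, they differ after normalization only through the per-channel mean and std. If you switch to values like 0.5 for your own dataset, the pixel values would presumably need to be scaled to [0, 1] first so that they match the scale of the statistics.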