Why take 3-channel PNG images as input?
skywalker0523 opened this issue
Thank you very much for the excellent work! I have a question I'd like to ask you. The 2D slices of 3D nii images have 1 channel, but you converted these slices into PNG images with 3 channels. Additionally, the mean and std differ across these three channels: mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]. Could you please explain how this conversion was done? I intend to fine-tune your model on my own dataset.
The images are converted to three channels to match the input format of the original ViT model. Here, the conversion simply replicates the original single-channel image three times. The normalization likewise follows the parameters used in the original SAM, which is why the mean and std differ per channel even though the three channels hold identical pixel values. When fine-tuning your own network, you can set different mean and standard deviation values, such as [0.5], [0.5].
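For anyone preparing their own data, the replication and normalization described above could look roughly like the sketch below. It is not the repository's actual preprocessing code: the function names `slice_to_three_channels` and `normalize` are hypothetical, and the min-max rescaling to 0-255 is an assumption (the thread does not say how intensities were mapped before saving to PNG). Only the mean and std values come from the question.

```python
import numpy as np

# Per-channel constants quoted in the question (0-255 pixel scale).
PIXEL_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
PIXEL_STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def slice_to_three_channels(slice_2d):
    """Rescale a 2D slice to uint8 and replicate it into shape (H, W, 3)."""
    slice_2d = np.asarray(slice_2d, dtype=np.float32)
    lo, hi = float(slice_2d.min()), float(slice_2d.max())
    # Assumption: simple min-max rescaling; a constant slice maps to all zeros.
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    gray = np.round((slice_2d - lo) * scale).astype(np.uint8)
    # The same grayscale plane is copied into each of the three channels.
    return np.stack([gray, gray, gray], axis=-1)


def normalize(image_hwc, mean=PIXEL_MEAN, std=PIXEL_STD):
    """Subtract the mean and divide by the std, broadcasting over the last axis."""
    return (np.asarray(image_hwc, dtype=np.float32) - mean) / std


# Example with a synthetic 3x4 "slice".
demo_slice = np.arange(12, dtype=np.float32).reshape(3, 4)
demo_rgb = slice_to_three_channels(demo_slice)
demo_norm = normalize(demo_rgb)
```

If you prefer a single mean and std such as 0.5 and 0.5, one reading (an assumption on my part) is that the pixels are first scaled to [0, 1], e.g. `normalize(demo_rgb / 255.0, mean=0.5, std=0.5)`, which maps the values into [-1, 1].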