uni-medical / STU-Net

The largest pre-trained medical image segmentation model (1.4B parameters), trained on the largest public dataset (>100k annotations), as of April 2023.

Dataset conversion instructions

zhi-xuan-chen opened this issue

I want to ask whether the fine-tuning datasets you used in your work simply follow the preprocessing method provided by nnUNet_v1. Thank you very much if you can help me!

@zhi-xuan-chen
Thank you for your interest in our work! Yes, for the fine-tuning dataset, we followed the preprocessing method provided by nnUNet_v1. If you have any further questions, please feel free to ask.

OK, thank you very much!

I am trying to fine-tune on the AMOS 2022 dataset with the pretrained model you provide, in order to reproduce your results. I have now finished the "dataset conversion" step according to nnUNet-v1, and next I need to run "nnUNet_plan_and_preprocess". Which command do I need to run?

@zhi-xuan-chen
After completing the dataset conversion according to nnUNet-v1, you indeed need to run the "nnUNet_plan_and_preprocess" step. You can execute the following command:

nnUNet_plan_and_preprocess -t <TASK_ID>

Replace <TASK_ID> with the task ID of your Amos2022 dataset.
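
For example, if you registered the AMOS 2022 dataset under the hypothetical task ID 505, the call would look like the line below; the optional --verify_dataset_integrity flag makes nnUNet check the raw data for inconsistencies before preprocessing starts.

nnUNet_plan_and_preprocess -t 505 --verify_dataset_integrity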

After running the "nnUNet_plan_and_preprocess" command, follow the fine-tuning instructions in our README file.

Thank you for your answer! I just found a problem. The dataset folder structure required by nnUNet-v1 only contains "imagesTr" and "labelsTr", but the current AMOS dataset (https://zenodo.org/record/7155725#.Y0OOCOxBztM) contains extra "imagesVa" and "labelsVa" folders. Its validation set is separated from the training set, unlike the cross-validation splits used by nnUNet-v1. So what can I do to ensure that the validation cases are the data in the "imagesVa" folder? Does the dataset conversion process need to be done differently?

@zhi-xuan-chen
To address the AMOS dataset folder structure issue, here are two approaches you can consider:

  1. Combine the "imagesVa" and "labelsVa" folders with the "imagesTr" and "labelsTr" folders. Then modify the "splits_final.pkl" file so that its "val" entry contains exactly the AMOS validation cases (see the sketch after this list).

  2. Keep the original "imagesTr" and "labelsTr" folders as they are. Set the training fold to "all" to train the model on all training data. After training, run inference on the validation data with the "nnUNet_predict" command and compute the performance metrics against the validation ground truth (an example call is shown below).
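
For the first approach, here is a minimal sketch of writing a custom splits file. It assumes nnUNet v1's convention that "splits_final.pkl" lives in the task's nnUNet_preprocessed folder and holds a list with one {"train": [...], "val": [...]} dict per fold; the task ID 505 and the case identifiers are placeholders for your actual setup.

import pickle
from pathlib import Path

# Assumed location; nnUNet v1 creates this folder during nnUNet_plan_and_preprocess.
preproc = Path("nnUNet_preprocessed/Task505_AMOS")

# Placeholder case identifiers: file names without the .nii.gz / _0000.nii.gz suffix.
train_ids = [f"amos_{i:04d}" for i in range(1, 201)]
val_ids = [f"amos_{i:04d}" for i in range(201, 241)]  # the imagesVa cases

# A single list entry means fold 0 uses exactly this train/val partition.
splits = [{"train": train_ids, "val": val_ids}]
with open(preproc / "splits_final.pkl", "wb") as f:
    pickle.dump(splits, f)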

Choose the method you find more convenient, and please let us know if you need any further help.
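
For the second approach, an inference call could look like the line below. The task ID 505, the folder names, and the 3d_fullres configuration are assumptions; if you fine-tuned with a custom trainer, pass it via -tr as well. Note that nnUNet_predict expects the input images to carry the modality suffix (e.g. _0000.nii.gz), just like the training images.

nnUNet_predict -i AMOS22/imagesVa -o AMOS22/predsVa -t 505 -m 3d_fullres -f all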

Thank you for your detailed answer, it is really helpful! I think the second one is suitable for me. I also want to know how I can convert AMOS 2022 to the format required by nnUNet-v1. It seems the nnUNet_convert_decathlon_task command does not work for non-MSD datasets: I got the error "AssertionError: Input folder start with TaskXX with XX being a 3-digit id: 00, 01, 02 etc", even though the required ID, which is above 500, is already a 3-digit number.

You are correct that the nnUNet_convert_decathlon_task command is designed for the Medical Segmentation Decathlon (MSD) dataset and may not work directly with non-MSD datasets like AMOS 2022.

For converting the AMOS 2022 dataset, I recommend referring to the official nnUNet documentation and tutorials for guidance on how to preprocess and convert non-MSD datasets.
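
As a starting point, here is a minimal sketch (illustrative, not our exact conversion script) of the usual nnUNet v1 raw-data layout for a non-MSD dataset. It copies AMOS into a task folder, adds the "_0000" modality suffix that nnUNet v1 expects on image files, and generates dataset.json with nnUNet's own helper. The paths, the task ID 505, the "CT" modality (AMOS also contains MRI scans), and the label names are assumptions; verify them against the dataset.json shipped with AMOS 2022.

import shutil
from pathlib import Path
from nnunet.dataset_conversion.utils import generate_dataset_json

src = Path("AMOS22")                        # assumed download location
dst = Path("nnUNet_raw_data/Task505_AMOS")  # assumed task ID
for sub in ("imagesTr", "labelsTr"):
    (dst / sub).mkdir(parents=True, exist_ok=True)

# nnUNet v1 expects a 4-digit modality suffix on every image file (_0000 for the first modality).
for img in sorted((src / "imagesTr").glob("*.nii.gz")):
    shutil.copy(img, dst / "imagesTr" / img.name.replace(".nii.gz", "_0000.nii.gz"))
for lbl in sorted((src / "labelsTr").glob("*.nii.gz")):
    shutil.copy(lbl, dst / "labelsTr" / lbl.name)

# Label names as listed for AMOS 2022; double-check against the dataset's own metadata.
labels = {0: "background", 1: "spleen", 2: "right kidney", 3: "left kidney",
          4: "gallbladder", 5: "esophagus", 6: "liver", 7: "stomach", 8: "aorta",
          9: "inferior vena cava", 10: "pancreas", 11: "right adrenal gland",
          12: "left adrenal gland", 13: "duodenum", 14: "bladder", 15: "prostate/uterus"}

generate_dataset_json(str(dst / "dataset.json"), str(dst / "imagesTr"), None,
                      ("CT",), labels, "AMOS2022")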

Thank you very much! I have converted the dataset successfully.