Running create_dataset.py to partition the dataset generates empty files
Eleanorhxd opened this issue
Hello, why are the CSV files for the validation set and test set empty when I run create_dataset.py to partition the data into training, validation, and test sets? What is the reason for this?
Yes, I have run into the same problem. Have you solved it yet?
create_dataset.py is just a simple Python script.
Just put in some breakpoints in the code and run the debugger. I would assume that the file paths may not have been correctly defined in path_datasets_and_weights.py.
Once you observe the values of the variables during debugging, it should become clear what the cause of the empty files is.
I have run into the same problem. Have you solved it yet?
I just had some email correspondence with a researcher who had the same problem.
The root cause was that his path_mimic_cxr variable in src/path_datasets_and_weights.py did not point to a directory that contained a (sub-)directory called "files" with the reference reports in it.
As written in src/path_datasets_and_weights.py:
MIMIC-CXR and MIMIC-CXR-JPG dataset paths should both have a (sub-)directory called "files" in their directories.
Note that we only need the report txt files from MIMIC-CXR, which are in the file mimic-cxr-report.zip at https://physionet.org/content/mimic-cxr/2.0.0/.
So:
- MIMIC-CXR-JPG path contains all the images in jpg format
- MIMIC-CXR path contains the reference reports in txt file format (contained in mimic-cxr-report.zip, which is only 135.4 MB).
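To confirm this layout before running the script, a quick check along the following lines may help. This is only a minimal sketch: the helper check_files_subdir is hypothetical (it is not part of the repository), the variable name path_mimic_cxr_jpg is assumed by analogy with path_mimic_cxr, and the paths shown are placeholders for the values in your own src/path_datasets_and_weights.py.

```python
import os


def check_files_subdir(dataset_root, extension):
    """Return (has_files_dir, num_matching_files) for a dataset root directory."""
    files_dir = os.path.join(dataset_root, "files")
    if not os.path.isdir(files_dir):
        return False, 0

    # Walk the whole tree below "files" and count files with the given extension.
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(files_dir):
        count += sum(1 for name in filenames if name.lower().endswith(extension))
    return True, count


if __name__ == "__main__":
    # Placeholder paths (assumptions): replace them with the values from your
    # own src/path_datasets_and_weights.py.
    path_mimic_cxr = "/path/to/mimic-cxr"
    path_mimic_cxr_jpg = "/path/to/mimic-cxr-jpg"

    print("MIMIC-CXR reports:", check_files_subdir(path_mimic_cxr, ".txt"))
    print("MIMIC-CXR-JPG images:", check_files_subdir(path_mimic_cxr_jpg, ".jpg"))
```

If the first value is False, or the count of txt files is 0 for the MIMIC-CXR path, the reference reports are most likely not where the script expects them.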
Because his reference reports were not available, the validation and test sets were both empty: the function get_reference_report has these lines:
if not os.path.exists(path_to_report):
    shortened_path_to_report = os.path.join(f"p{subject_id[:2]}", f"p{subject_id}", f"s{study_id}.txt")
    missing_reports.append(shortened_path_to_report)
    return -1
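To test a single report by hand, the path that this check looks for can be rebuilt as sketched below. The helper expected_report_path is hypothetical, and the sketch assumes that the full path is the MIMIC-CXR root, then "files", then the shortened path from the snippet above; how path_to_report is actually built in the repository may differ.

```python
import os


def expected_report_path(path_mimic_cxr, subject_id, study_id):
    """Build the assumed location of one reference report (hypothetical helper)."""
    subject_id = str(subject_id)
    study_id = str(study_id)
    # Same three components as the shortened path in get_reference_report.
    shortened = os.path.join(f"p{subject_id[:2]}", f"p{subject_id}", f"s{study_id}.txt")
    # Assumption: reports live below the "files" sub-directory of the root.
    return os.path.join(path_mimic_cxr, "files", shortened)


if __name__ == "__main__":
    # Placeholder root and example IDs; substitute your own values.
    path = expected_report_path("/path/to/mimic-cxr", "10000032", "50414267")
    print(path, "exists:", os.path.exists(path))
```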
If you look in the log file called log_file_dataset_creation.txt that is created when the script has finished, the line num_missing_reports: xxx should consequently display a high number, indicating that the reference reports were missing.
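Reading that number out of the log could look roughly like this. It is a sketch that assumes the log contains a line of the form "num_missing_reports: <integer>", as described above; the helper read_num_missing_reports is hypothetical, and the location of the log file depends on where you ran the script.

```python
import re


def read_num_missing_reports(log_text):
    """Return the integer after 'num_missing_reports:', or None if it is absent."""
    match = re.search(r"num_missing_reports:\s*(\d+)", log_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Assumed file name from the thread; adjust the path to your setup.
    log_path = "log_file_dataset_creation.txt"
    try:
        with open(log_path, encoding="utf-8") as log_file:
            print("num_missing_reports:", read_num_missing_reports(log_file.read()))
    except FileNotFoundError:
        print(f"{log_path} not found; run create_dataset.py first.")
```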