ttanida / rgrg

Code for the CVPR paper "Interactive and Explainable Region-guided Radiology Report Generation"

Running create_dataset.py to partition the dataset generates empty files

Eleanorhxd opened this issue

Hello, why are the CSV files for the validation set and test set empty when I run create_dataset.py to partition the data into training, validation, and test sets? What is the reason for this?

Yes, I have run into the same problem. Have you solved it yet?

create_dataset.py is just a simple Python script.

Just put in some breakpoints in the code and run the debugger. I would assume that the file paths may not have been correctly defined in path_datasets_and_weights.py.

If you observe the values of the variables during debugging, it should become clear what the cause of the empty files is.

I have run into the same problem. Have you solved it yet?

I just had some email correspondence with a researcher who had the same problem.

The root cause was that his path_mimic_cxr variable in src/path_datasets_and_weights.py did not point to a directory that contained a (sub-)directory called "files" with the reference reports in it.

As written in src/path_datasets_and_weights.py:

MIMIC-CXR and MIMIC-CXR-JPG dataset paths should both have a (sub-)directory called "files" in their directories.

Note that we only need the report txt files from MIMIC-CXR, which are in the file mimic-cxr-report.zip at
https://physionet.org/content/mimic-cxr/2.0.0/.

So:

  • MIMIC-CXR-JPG path contains all the images in jpg format
  • MIMIC-CXR path contains the reference reports in txt file format (contained in mimic-cxr-report.zip, which is only 135.4 MB).
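
A quick way to check this layout before running create_dataset.py is sketched below. This is only a minimal sanity check, not part of the repository: the helper names has_files_subdir and count_report_txt_files are made up for illustration, and the placeholder paths must be replaced with your own values from src/path_datasets_and_weights.py.

```python
import os


def has_files_subdir(dataset_path):
    """Return True if dataset_path contains a (sub-)directory called "files"."""
    return os.path.isdir(os.path.join(dataset_path, "files"))


def count_report_txt_files(path_mimic_cxr):
    """Count the .txt files found anywhere below <path_mimic_cxr>/files.

    Returns 0 if the "files" directory does not exist.
    """
    files_dir = os.path.join(path_mimic_cxr, "files")
    count = 0
    for _root, _dirs, filenames in os.walk(files_dir):
        count += sum(1 for name in filenames if name.endswith(".txt"))
    return count


if __name__ == "__main__":
    # Placeholder paths (assumptions): use the values you set in
    # src/path_datasets_and_weights.py.
    path_mimic_cxr = "/path/to/mimic-cxr"
    path_mimic_cxr_jpg = "/path/to/mimic-cxr-jpg"

    print("MIMIC-CXR has 'files':", has_files_subdir(path_mimic_cxr))
    print("MIMIC-CXR-JPG has 'files':", has_files_subdir(path_mimic_cxr_jpg))
    print("report txt files found:", count_report_txt_files(path_mimic_cxr))
```

If the first check prints False, or the report count is 0, the reference reports are most likely not where the script expects them.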

Because his reference reports were not available, the validation and test sets were both empty. The function get_reference_report has these lines, which return -1 whenever a report file is missing:

    if not os.path.exists(path_to_report):
        shortened_path_to_report = os.path.join(f"p{subject_id[:2]}", f"p{subject_id}", f"s{study_id}.txt")
        missing_reports.append(shortened_path_to_report)
        return -1
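
To see which path such a check would look for, you can rebuild it by hand for a single study. The sketch below reuses the shortened path format from the lines above and assumes (this is not confirmed by the snippet itself) that the full path is path_mimic_cxr, then "files", then the shortened path. The function names and the IDs are illustrative only.

```python
import os


def expected_report_path(path_mimic_cxr, subject_id, study_id):
    """Build the assumed location of one report file:
    <path_mimic_cxr>/files/p<first 2 digits>/p<subject_id>/s<study_id>.txt
    """
    return os.path.join(
        path_mimic_cxr,
        "files",
        f"p{subject_id[:2]}",
        f"p{subject_id}",
        f"s{study_id}.txt",
    )


def report_exists(path_mimic_cxr, subject_id, study_id):
    """Return True if the assumed report file is present on disk."""
    return os.path.exists(expected_report_path(path_mimic_cxr, subject_id, study_id))


if __name__ == "__main__":
    # Made-up IDs and a placeholder path, for illustration only.
    print(expected_report_path("/path/to/mimic-cxr", "12345678", "87654321"))
    print(report_exists("/path/to/mimic-cxr", "12345678", "87654321"))
```

If report_exists returns False for IDs that you know are in the dataset, the path_mimic_cxr variable probably points to the wrong directory.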

If you look in the log file called log_file_dataset_creation.txt that is created when the script has finished, the line num_missing_reports: xxx should consequently display a high number, indicating that the reference reports were missing.
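
The count can also be pulled out of the log programmatically. The sketch below assumes the log contains a line of the form "num_missing_reports: <integer>", as described above; the exact log layout is not verified here, and read_num_missing_reports is a made-up helper name.

```python
import re


def read_num_missing_reports(log_text):
    """Return the integer after "num_missing_reports:" in log_text,
    or None if no such line is found."""
    match = re.search(r"num_missing_reports:\s*(\d+)", log_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Assumes the log file is in the current working directory.
    with open("log_file_dataset_creation.txt") as log_file:
        print(read_num_missing_reports(log_file.read()))
```

A large value here points to the missing reference reports described above.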