Run create.dataset.py to partition the dataset and generate empty files

Question

Eleanorhxd opened this issue 3 months ago · comments

Hello, why are the CSV files for the validation set and test set empty when I run create_dataset.py to partition the data into training, validation, and test sets? What is the reason for this?

sujaly · Answer 1 · Wed May 29 2024 22:49:42 GMT+0800 (China Standard Time)

Yes, I have run into the same problem. Have you solved it yet?

Tim Tanida · Answer 2 · Thu May 30 2024 01:00:27 GMT+0800 (China Standard Time)

create_dataset.py is just a simple Python script.

Just put in some breakpoints in the code and run the debugger. I would assume that the file paths may not have been correctly defined in path_datasets_and_weights.py.

If you observe the values of the variables while debugging, it should become clear what the cause of the empty files is.

Eleanor · Answer 4 · Thu May 30 2024 16:08:40 GMT+0800 (China Standard Time)

Hi, I have not solved it.

Cckyrie · Answer 5 · Fri Jun 07 2024 12:35:31 GMT+0800 (China Standard Time)

I have run into the same problem. Have you solved it yet?

Tim Tanida · Answer 6 · Tue Jul 16 2024 19:46:16 GMT+0800 (China Standard Time)

I just had some email correspondence with a researcher who had the same problem.

The root cause was that his path_mimic_cxr variable in src/path_datasets_and_weights.py did not point to a directory that contained a (sub-)directory called "files" with the reference reports in it.

As written in src/path_datasets_and_weights.py:

MIMIC-CXR and MIMIC-CXR-JPG dataset paths should both have a (sub-)directory called "files" in their directories.

Note that we only need the report txt files from MIMIC-CXR, which are in the file mimic-cxr-report.zip at
https://physionet.org/content/mimic-cxr/2.0.0/.

So:

MIMIC-CXR-JPG path contains all the images in jpg format
MIMIC-CXR path contains the reference reports in txt file format (contained in mimic-cxr-report.zip, which is only 135.4 MB).
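
The layout described above can be checked before running create_dataset.py. The following is only a minimal sketch, not part of the repository: the function name check_dataset_layout and the parameter name path_mimic_cxr_jpg are hypothetical, and it assumes the reports end in .txt and the images in .jpg somewhere below each "files" directory.

```python
from pathlib import Path


def check_dataset_layout(path_mimic_cxr, path_mimic_cxr_jpg):
    """Return a list of human-readable problems; an empty list means the layout looks plausible."""
    problems = []
    # (label, root directory, file suffix expected somewhere below "files")
    checks = [
        ("MIMIC-CXR", Path(path_mimic_cxr), ".txt"),
        ("MIMIC-CXR-JPG", Path(path_mimic_cxr_jpg), ".jpg"),
    ]
    for label, root, suffix in checks:
        files_dir = root / "files"
        if not files_dir.is_dir():
            problems.append(f"{label}: no 'files' directory found in {root}")
            continue
        # Stop at the first match so a large dataset is not fully traversed.
        if next(files_dir.rglob(f"*{suffix}"), None) is None:
            problems.append(f"{label}: no {suffix} files found below {files_dir}")
    return problems
```

Calling it with the two paths you set in src/path_datasets_and_weights.py and printing the returned list should show whether either "files" directory is missing or empty.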

Because his reference reports were not available, the validation and test sets both came out empty. This follows from these lines in the function get_reference_report:

    if not os.path.exists(path_to_report):
        shortened_path_to_report = os.path.join(f"p{subject_id[:2]}", f"p{subject_id}", f"s{study_id}.txt")
        missing_reports.append(shortened_path_to_report)
        return -1

If you look in the log file called log_file_dataset_creation.txt that is created when the script has finished, the line num_missing_reports: xxx should consequently display a high number, indicating that the reference reports were missing.
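
To read that number without opening the log by hand, something like the following could be used. It is a rough sketch: the helper name read_missing_report_counts is hypothetical, and it assumes only that the log contains one or more lines of the form "num_missing_reports: <number>", as described above; the rest of the log format is not relied on.

```python
import re
from pathlib import Path


def read_missing_report_counts(log_path):
    """Return every num_missing_reports value found in the log, in order of appearance."""
    text = Path(log_path).read_text(encoding="utf-8")
    # The log may contain more than one such line, so collect all of them.
    return [int(n) for n in re.findall(r"num_missing_reports:\s*(\d+)", text)]
```

If the returned values are large, the reference reports were most likely not found under the configured path_mimic_cxr.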
