tar2bids gets subject name incorrect

Question

AlanKuurstra opened this issue 4 years ago · comments

tar2bids determines that the date of the scan is my subject name.

the error occurs on:
Menon_Mouse_APPNL-G-F_20200210_NL_31_1F9_20200210_01.A3AD08CB.tar

where
study description: Menon^Mouse_APPNL-G-F
date: 20200210
subject: NL_31_1F9
study_id: 20200210_01
hash of study uid: A3AD08CB

and the bids output is:
bids/sub-20200210/anat/sub-20200210_part-mag_echo_run-01_GRE.nii.gz

It might be worth getting info from the dicom header instead of an assumed folder or filename convention.

Ali Khan · Answer 1 · Wed Mar 11 2020 23:56:19 GMT+0800 (China Standard Time)

Sorry for the delay. I wonder if this is best addressed by making a fix in cfmm2tar for the Bruker data? What do you think, @yinglilu and @AlanKuurstra?

AlanKuurstra · Answer 2 · Thu Mar 12 2020 00:41:36 GMT+0800 (China Standard Time)

Yeah, it makes sense to bidsify the tags there.
I did something similar in python. Not sure if it will help, but here's the code:

import os, pydicom, subprocess, shutil
from glob import glob


def bidsify_string(string_to_bidsify):
    return string_to_bidsify.replace('_', '')


dicom_root = '/home/akuurstr/Desktop/Esmin_mouse_registration/mouse_scans/dicoms'
heuristic = '/softdev/akuurstr/python/modules/mouse_resting_state/cfmm_bruker_mouse_heudiconv_heuristic.py'
dcm_dir_template = os.path.join(dicom_root, '*/*/*/{subject}/{session}.*/*/*.dcm')
bids_output = '/home/akuurstr/Desktop/Esmin_mouse_registration/mouse_scans/bids2'

# remove underscores from patient names (used for BIDS subjects)
subject_folders = os.path.join(dcm_dir_template.split('{subject}')[0].replace('{session}','*'),'*')
for subject_folder in glob(subject_folders):
    shutil.move(subject_folder,
                os.path.join(os.path.dirname(subject_folder), bidsify_string(os.path.basename(subject_folder))))
# remove underscores from StudyIDs (used for BIDS session)
session_folders = os.path.join(dcm_dir_template.split('{session}')[0].replace('{subject}','*'),'*')
for session_folder in glob(session_folders):
    shutil.move(session_folder,
                os.path.join(os.path.dirname(session_folder), bidsify_string(os.path.basename(session_folder))))

completed_patient_sessions = []
for root, dirs, files in os.walk(dicom_root):
    for file in files:
        if file.endswith(".dcm"):
            # dcmread replaces the deprecated pydicom.read_file
            dcm_file = pydicom.dcmread(os.path.join(root, file), stop_before_pixels=True)
            if 'rsFMRI' in dcm_file.ProtocolName:
                bids_subject = bidsify_string(str(dcm_file.PatientName))
                bids_session = bidsify_string(str(dcm_file.StudyID))
                if (bids_subject,bids_session) in completed_patient_sessions:
                    continue
                subprocess.call(
                    ["heudiconv", "-b", "-d", dcm_dir_template, "-o", bids_output, "-f", heuristic, "-s", bids_subject,
                     "-ss", bids_session, "--overwrite"])
                completed_patient_sessions.append((bids_subject,bids_session))

Yingli Lu · Answer 3 · Thu Mar 12 2020 02:05:47 GMT+0800 (China Standard Time)

Sorry, I'm in a group meeting. Will check it out soon. yl


Yingli Lu · Answer 4 · Thu Mar 12 2020 04:34:36 GMT+0800 (China Standard Time)

Hi,

I double-checked cfmm2tar.py and tar2bids, and both seem to work!

Tested with

./tar2bids -P "NL_31_1F9"  "Menon_Mouse_APPNL-G-F_20200210_NL_31_1F9_20200210_01.A3AD08CB.tar"

Get output:

PI=Menon Study=Mouse_APPNL-G-F Date=20200210 PatientName=NL_31_1F9_20200210

The patient name is parsed correctly (see the BTW section below).

Is it possible that the problem was caused when running tar2bids, or by the heuristics file?

Cheers,

YingLi

BTW:

In tar2bids, line 180: if you change
patient=${patient_etc%_[0-9]*.*}
to
patient=${patient_etc%_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*.*}

you get:

PI=Menon Study=Mouse_APPNL-G-F Date=20200210 PatientName=NL_31_1F9
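
To see why the longer pattern behaves differently, here is a rough Python stand-in for the bash `${var%pattern}` expansion, which removes the shortest suffix matching a glob. The helper name `strip_shortest_suffix` is hypothetical, and the value of `patient_etc` is an assumption (the tar name with the PI, study, and date fields already removed); the real variable in tar2bids may hold something different.

```python
from fnmatch import fnmatchcase


def strip_shortest_suffix(value: str, pattern: str) -> str:
    """Emulate bash ${value%pattern}: drop the shortest suffix matching the glob."""
    # Walk from the empty suffix towards the whole string; the first hit is the shortest.
    for start in range(len(value), -1, -1):
        if fnmatchcase(value[start:], pattern):
            return value[:start]
    return value  # no suffix matches, so the value is left unchanged


# Assumed value of patient_etc: the tar name after the PI, study and date fields.
patient_etc = "NL_31_1F9_20200210_01.A3AD08CB"

old_pattern = "_[0-9]*.*"                # one digit is enough after the underscore
new_pattern = "_" + "[0-9]" * 8 + "*.*"  # requires an 8-digit date after the underscore

old_patient = strip_shortest_suffix(patient_etc, old_pattern)
new_patient = strip_shortest_suffix(patient_etc, new_pattern)
print(old_patient)  # NL_31_1F9_20200210
print(new_patient)  # NL_31_1F9
```

With the short pattern, the shortest matching suffix is `_01.A3AD08CB`, so the date stays attached to the patient name. Requiring eight digits forces the match to start at `_20200210`.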

AlanKuurstra · Answer 5 · Thu Mar 12 2020 05:19:06 GMT+0800 (China Standard Time)

When I was getting errors, I did not directly pass the subject using -P. It seems to do better using that flag.

But it still isn't correct, since:
subject: NL_31_1F9
study_id: 20200210_01
hash of study uid: A3AD08CB

but in your example the subject is returned as NL_31_1F9_20200210

Note that some variant of 20200210_01.A3AD08CB should be interpreted as the BIDS session.
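
As a sketch of what I mean, the tail of the tar name could be split into subject and session like this. The regular expression assumes the StudyID always looks like an 8-digit date, an underscore, and a counter, which holds for this file but may not hold for every scanner. `split_tail` and `to_label` are made-up names for illustration.

```python
import re

# Assumed layout of the tail of the tar name:
# <PatientName>_<8-digit date>_<counter>.<hash of StudyInstanceUID>
TAIL = re.compile(r"(?P<subject>.+)_(?P<study_id>\d{8}_\d+)\.(?P<uid_hash>[0-9A-Fa-f]+)")


def to_label(value: str) -> str:
    """BIDS labels may only contain letters and digits."""
    return re.sub(r"[^A-Za-z0-9]", "", value)


def split_tail(tail: str) -> dict:
    """Split the tail of a tar name into BIDS subject, BIDS session and UID hash."""
    match = TAIL.fullmatch(tail)
    if match is None:
        raise ValueError(f"unexpected tar name tail: {tail!r}")
    return {
        "subject": to_label(match["subject"]),
        "session": to_label(match["study_id"]),
        "uid_hash": match["uid_hash"],
    }


parts = split_tail("NL_31_1F9_20200210_01.A3AD08CB")
print(parts)  # {'subject': 'NL311F9', 'session': '2020021001', 'uid_hash': 'A3AD08CB'}
```

Whether the hash should also be folded into the session label is a separate choice; here it is kept apart.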

Yingli Lu · Answer 6 · Thu Mar 12 2020 20:56:29 GMT+0800 (China Standard Time)

Hi Alan,

The BTW section of my previous post should fix it :-)

AlanKuurstra · Answer 7 · Thu Mar 12 2020 21:01:11 GMT+0800 (China Standard Time)

Since in this situation tar2bids only works with the -P flag, I would suggest that this puts the burden of parsing on the user instead of on cfmm2tar.

Yingli Lu · Answer 8 · Thu Mar 12 2020 21:09:21 GMT+0800 (China Standard Time)

Yup. -P is simple, flexible and powerful. Sometimes the user is the only one who knows which part is the subject name.

Ali,

Is it okay to modify tar2bids line 180 from

patient=${patient_etc%_[0-9]*.*}

to

patient=${patient_etc%_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*.*}

?

AlanKuurstra · Answer 9 · Thu Mar 12 2020 22:38:54 GMT+0800 (China Standard Time)

But I guess you could get the subject name from the dicom headers like what the above python code does.

I'm not sure how your whole automation system works, but will the -P flag work with autobids?

In my opinion, the better solution would be to make a change in cfmm2tar like Ali suggested. In cfmm2tar you could open one of the dicom headers from the tar file. Then use the dicom tags (StudyDescription, AcquisitionDate, PatientName, StudyID, and the hash of the StudyInstanceUID ) to determine the structure of the tarfile name. You could then bidsify the tar filename (take out underscores etc) so that cfmm2tar will work as intended.
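
Roughly, the idea would look like the sketch below. It is not how cfmm2tar currently builds names; the function names are hypothetical, the tag values are passed in as plain strings (in practice they would be read from a header with pydicom, as in my script above), and the hash is taken as an input because I have not checked how cfmm2tar computes it.

```python
import re


def bidsify(value: str) -> str:
    """Strip everything except letters and digits so the field is BIDS friendly."""
    return re.sub(r"[^A-Za-z0-9]", "", value)


def bids_friendly_tar_name(study_description: str, acquisition_date: str,
                           patient_name: str, study_id: str, uid_hash: str) -> str:
    """Build a tar name whose underscore-separated fields contain no underscores."""
    # StudyDescription is assumed to have the form "<PI>^<study>".
    pi, _, study = study_description.partition("^")
    fields = [pi, study, acquisition_date, patient_name, study_id]
    return "_".join(bidsify(field) for field in fields) + "." + uid_hash + ".tar"


name = bids_friendly_tar_name(
    study_description="Menon^Mouse_APPNL-G-F",
    acquisition_date="20200210",
    patient_name="NL_31_1F9",
    study_id="20200210_01",
    uid_hash="A3AD08CB",
)
print(name)  # Menon_MouseAPPNLGF_20200210_NL311F9_2020021001.A3AD08CB.tar
```

Once no field contains an underscore, splitting the name on underscores is unambiguous.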

Yingli Lu · Answer 10 · Fri Mar 13 2020 00:24:58 GMT+0800 (China Standard Time)

Hi Alan,

The tar file name is already built exactly the way your solution describes; please check line 74 of
https://github.com/yinglilu/cfmm2tar/blob/master/sort_rules.py

AlanKuurstra · Answer 11 · Fri Mar 13 2020 01:23:50 GMT+0800 (China Standard Time)

I used the latest commit from https://github.com/khanlab/cfmm2tar. An example tar output is:

Menon_Mouse_APPNL-G-F_20200210_NL_31_1F9_20200210_01.A3AD08CB.tar

where
study description: Menon^Mouse_APPNL-G-F
date: 20200210
subject: NL_31_1F9
study_id: 20200210_01
hash of study uid: A3AD08CB

which shows that the tar filename has not correctly been made bids friendly and does not work with tar2bids. If we've decided to keep tar2bids as is, perhaps I should move this issue to that repo.

Ali Khan · Answer 12 · Fri Mar 13 2020 01:28:39 GMT+0800 (China Standard Time)

Hi guys,

Sorry have been away from this and just trying to follow along now -- if I understand correctly, cfmm2tar is using the same dicom tags to build the tar file (whether Bruker or not), but it is just that the Bruker tags include some extra substrings that make the tar2bids parsing not possible, unless the -P flag is used?

As for changing the structure of the tar file to bids-ify it, that would break compatibility with previously generated tar files, so I am less inclined to make a change that breaks all the 3T and 7T data out there already. But is there a change we can make to how the Bruker tar files are created (in cfmm2tar) so that at least tar2bids can work in a similar fashion, without requiring the -P?

Yingli Lu · Answer 13 · Fri Mar 13 2020 01:50:38 GMT+0800 (China Standard Time)

Hi Ali,

Yes. You are right.

I am looking at the code (cfmm2tar) and trying to find a simple solution.

yl

Ali Khan · Answer 14 · Fri Mar 13 2020 01:51:38 GMT+0800 (China Standard Time)

Yingli,
Just chatted with Alan -- have some ideas for a solution perhaps we can discuss when we meet tomorrow.

Yingli Lu · Answer 15 · Fri Mar 13 2020 01:56:50 GMT+0800 (China Standard Time)

Awesome! see you guys tomorrow.

Ali Khan · Answer 16 · Sat Mar 14 2020 03:02:57 GMT+0800 (China Standard Time)

@AlanKuurstra, the parsing for subject should be fixed in docker://khanlab/tar2bids:latest now (thanks @yinglilu). Let me know when you get a chance to try it out.

Ali Khan · Answer 17 · Fri Aug 20 2021 20:56:57 GMT+0800 (China Standard Time)

Closing now as I think this is fixed, but @AlanKuurstra feel free to re-open if not.