khanlab / tar2bids

tar2bids gets subject name incorrect

AlanKuurstra opened this issue

tar2bids determines that the date of the scan is my subject name.

the error occurs on:
Menon_Mouse_APPNL-G-F_20200210_NL_31_1F9_20200210_01.A3AD08CB.tar

where
study description: Menon^Mouse_APPNL-G-F
date: 20200210
subject: NL_31_1F9
study_id: 20200210_01
hash of study uid: A3AD08CB

and the bids output is:
bids/sub-20200210/anat/sub-20200210_part-mag_echo_run-01_GRE.nii.gz

It might be worth getting info from the dicom header instead of an assumed folder or filename convention.
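To illustrate why a filename convention alone is fragile for this file, here is a minimal Python sketch. It is not how tar2bids parses names. It only splits the example filename on underscores and looks for tokens that resemble an 8-digit date. The variable names are made up for illustration.

```python
import re

# Example tar filename from this issue.
filename = "Menon_Mouse_APPNL-G-F_20200210_NL_31_1F9_20200210_01.A3AD08CB.tar"

# Drop the ".tar" extension and split on the field separator.
stem = filename[: -len(".tar")]
tokens = stem.split("_")

# Positions of tokens that look like an 8-digit date (YYYYMMDD).
date_like_positions = [
    i for i, tok in enumerate(tokens) if re.fullmatch(r"\d{8}", tok)
]

print(tokens)
print(date_like_positions)
```

The study description, the subject, and the study ID all contain underscores themselves, and the date string appears twice (once as the scan date, once inside the study ID), so token positions alone cannot tell where the subject starts or ends.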

sorry for the delay -- wonder if this is best addressed by making a fix in cfmm2tar for the bruker data? What do you think @yinglilu and @AlanKuurstra ?

Yeah, it makes sense to bidsify the tags there.
I did something similar in python. Not sure if it will help, but here's the code:

import os, pydicom, subprocess, shutil
from glob import glob


def bidsify_string(string_to_bidsify):
    return string_to_bidsify.replace('_', '')


dicom_root = '/home/akuurstr/Desktop/Esmin_mouse_registration/mouse_scans/dicoms'
heuristic = '/softdev/akuurstr/python/modules/mouse_resting_state/cfmm_bruker_mouse_heudiconv_heuristic.py'
dcm_dir_template = os.path.join(dicom_root, '*/*/*/{subject}/{session}.*/*/*.dcm')
bids_output = '/home/akuurstr/Desktop/Esmin_mouse_registration/mouse_scans/bids2'

# remove underscores from patient names (used for BIDS subjects)
subject_folders = os.path.join(dcm_dir_template.split('{subject}')[0].replace('{session}','*'),'*')
for subject_folder in glob(subject_folders):
    shutil.move(subject_folder,
                os.path.join(os.path.dirname(subject_folder), bidsify_string(os.path.basename(subject_folder))))
# remove underscores from StudyIDs (used for BIDS session)
session_folders = os.path.join(dcm_dir_template.split('{session}')[0].replace('{subject}','*'),'*')
for session_folder in glob(session_folders):
    shutil.move(session_folder,
                os.path.join(os.path.dirname(session_folder), bidsify_string(os.path.basename(session_folder))))

completed_patient_sessions = []
for root, dirs, files in os.walk(dicom_root):
    for file in files:
        if file.endswith(".dcm"):
            # pydicom.read_file is deprecated (and removed in pydicom 3.0); dcmread is the current name
            dcm_file = pydicom.dcmread(os.path.join(root, file), stop_before_pixels=True)
            if 'rsFMRI' in dcm_file.ProtocolName:
                bids_subject = bidsify_string(str(dcm_file.PatientName))
                bids_session = bidsify_string(str(dcm_file.StudyID))
                if (bids_subject,bids_session) in completed_patient_sessions:
                    continue
                subprocess.call(
                    ["heudiconv", "-b", "-d", dcm_dir_template, "-o", bids_output, "-f", heuristic, "-s", bids_subject,
                     "-ss", bids_session, "--overwrite"])
                completed_patient_sessions.append((bids_subject,bids_session))

Hi,

I double-checked cfmm2tar.py and tar2bids, and both seem to work.

Tested with

./tar2bids -P "NL_31_1F9"  "Menon_Mouse_APPNL-G-F_20200210_NL_31_1F9_20200210_01.A3AD08CB.tar"

This gives the output:

PI=Menon Study=Mouse_APPNL-G-F Date=20200210 PatientName=NL_31_1F9_20200210

The patient name was parsed correctly (see the BTW section below).

Is it possible that the problem was caused when running tar2bids, or by the heuristics file?

Cheers,

YingLi

BTW:

If line 180 is changed from
patient=${patient_etc%_[0-9]*.*}
to
patient=${patient_etc%_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*.*}

the output becomes:

PI=Menon Study=Mouse_APPNL-G-F Date=20200210 PatientName=NL_31_1F9
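
The difference between the two patterns comes from how bash's `${var%pattern}` works: it removes the shortest suffix that matches the glob. The following Python sketch emulates that rule with `fnmatch` so the two patterns can be compared side by side. The function name and the value assumed for `patient_etc` are illustrative guesses, not taken from the tar2bids source.

```python
from fnmatch import fnmatchcase


def strip_shortest_suffix(value, pattern):
    """Emulate bash ${value%pattern}: drop the shortest suffix matching the glob."""
    # Try suffixes from shortest (empty) to longest (the whole string).
    for start in range(len(value), -1, -1):
        if fnmatchcase(value[start:], pattern):
            return value[:start]
    return value  # no suffix matched, so the value is left unchanged


# Assumed content of patient_etc for the example tar file (hypothetical).
patient_etc = "NL_31_1F9_20200210_01.A3AD08CB"

original = strip_shortest_suffix(patient_etc, "_[0-9]*.*")
proposed = strip_shortest_suffix(patient_etc, "_" + "[0-9]" * 8 + "*.*")

print(original)
print(proposed)
```

With the original pattern, the shortest matching suffix starts at `_01`, so the date stays attached to the patient name. Requiring eight digits after the underscore pushes the match back to `_20200210`, which leaves only the subject under this assumed input.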

When I was getting errors, I did not directly pass the subject using -P. It seems to do better using that flag.

But it still isn't correct, since:
subject: NL_31_1F9
study_id: 20200210_01
hash of study uid: A3AD08CB

but in your example the subject is returned as NL_31_1F9_20200210

Note that some variant of 20200210_01.A3AD08CB should be interpreted as the BIDS session.

Hi Alan,

The BTW section of my previous post should fix it :-)

Since in this situation tar2bids only works with the -P flag, I would suggest that this puts the burden of parsing on the user instead of on cfmm2tar.

Yup. -P is simple, flexible and powerful. Sometimes the user is the only one who knows which part is the subject name.

Ali,

Is it okay to modify tar2bids line 180 from

patient=${patient_etc%_[0-9]*.*}

to

patient=${patient_etc%_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*.*}

?

But I guess you could get the subject name from the dicom headers, as the Python code above does.

I'm not sure how your whole automation system works, but will the -P flag work with autobids?

In my opinion, the better solution would be to make a change in cfmm2tar like Ali suggested. In cfmm2tar you could open one of the dicom headers from the tar file. Then use the dicom tags (StudyDescription, AcquisitionDate, PatientName, StudyID, and the hash of the StudyInstanceUID) to determine the structure of the tarfile name. You could then bidsify the tar filename (take out underscores, etc.) so that cfmm2tar will work as intended.
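
A rough Python sketch of that idea follows. It assumes the tag values have already been read from a dicom header and are passed in as a plain dict. The helper names `bidsify_label` and `build_tar_name` are hypothetical, the field order is inferred from the example filename in this thread, and the UID hash is taken as given because the hashing scheme is not described here.

```python
import re


def bidsify_label(text):
    """Keep only letters and digits, since BIDS labels must be alphanumeric."""
    return re.sub(r"[^a-zA-Z0-9]", "", text)


def build_tar_name(tags):
    """Build a tar filename whose fields contain no internal separators.

    `tags` is a plain dict of values assumed to come from the dicom header.
    """
    # StudyDescription is assumed to look like "PI^Study".
    pi, study = tags["StudyDescription"].split("^", 1)
    fields = [
        bidsify_label(pi),
        bidsify_label(study),
        bidsify_label(tags["AcquisitionDate"]),
        bidsify_label(tags["PatientName"]),
        bidsify_label(tags["StudyID"]),
    ]
    # The hash of the StudyInstanceUID is passed in precomputed (assumption).
    return "_".join(fields) + "." + tags["StudyUIDHash"] + ".tar"


example_tags = {
    "StudyDescription": "Menon^Mouse_APPNL-G-F",
    "AcquisitionDate": "20200210",
    "PatientName": "NL_31_1F9",
    "StudyID": "20200210_01",
    "StudyUIDHash": "A3AD08CB",
}

tar_name = build_tar_name(example_tags)
print(tar_name)
```

Because every field is stripped of underscores before joining, splitting the resulting name on `_` recovers exactly five fields, so the subject and session can be read off by position.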

Hi Alan,

The tar filename structure is already built exactly as in your solution; please check line 74 of:
https://github.com/yinglilu/cfmm2tar/blob/master/sort_rules.py

I used the latest commit from https://github.com/khanlab/cfmm2tar. An example tar output is:

Menon_Mouse_APPNL-G-F_20200210_NL_31_1F9_20200210_01.A3AD08CB.tar

where
study description: Menon^Mouse_APPNL-G-F
date: 20200210
subject: NL_31_1F9
study_id: 20200210_01
hash of study uid: A3AD08CB

which shows that the tar filename has not correctly been made bids friendly and does not work with tar2bids. If we've decided to keep tar2bids as is, perhaps I should move this issue to that repo.

Hi guys,

Sorry have been away from this and just trying to follow along now -- if I understand correctly, cfmm2tar is using the same dicom tags to build the tar file (whether Bruker or not), but it is just that the Bruker tags include some extra substrings that make the tar2bids parsing not possible, unless the -P flag is used?

As for changing the structure of the tar file to bids-ify it, that would break compatibility with previously generated tar files, so I'm less inclined to make a change that breaks all the 3T and 7T data out there already. But is there a change we can make to how the Bruker tar files are created (in cfmm2tar) so that at least tar2bids can work in a similar fashion, without requiring the -P?

Hi Ali,

Yes. You are right.

I am looking at the code (cfmm2tar) and trying to find a simple solution.

yl

Yingli,
Just chatted with Alan -- have some ideas for a solution perhaps we can discuss when we meet tomorrow.

Awesome! see you guys tomorrow.

@AlanKuurstra, the parsing for subject should be fixed in docker://khanlab/tar2bids:latest now (thanks @yinglilu). Let me know when you get a chance to try it out.

Closing now as I think this is fixed, but @AlanKuurstra feel free to re-open if not.