[Feature Request] Input Additional Metadata via Spreadsheet

Question

bendhouseart opened this issue a year ago

It appears that having the ability to:

- Upload spreadsheet data (BIDS formatted; see this link for a PET example)
- Match uploaded spreadsheets to imaging/session data
- Direct matched data to BIDS output (the JSON sidecars of the NIfTI files, for starters)

would be a useful feature addition to ezBIDS for PET and ASL, as well as other modalities. Adding this as an issue to continue the discussion @dlevitas and @bendhouseart started in Slack.
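
The matching step could be sketched roughly as below. This is only an illustration, assuming uploaded spreadsheets carry the same sub-/ses- labels in their filenames as the sidecars they belong to; the function names and the naming convention are hypothetical, and ezBIDS may end up matching on something else entirely.

```python
import re
from pathlib import Path

# Hypothetical convention: spreadsheets and sidecars share sub-/ses- labels
ENTITY_PATTERN = re.compile(r"(sub|ses)-([a-zA-Z0-9]+)")


def bids_entities(filename):
    """Extract the sub/ses labels from a BIDS-style filename."""
    return dict(ENTITY_PATTERN.findall(Path(filename).name))


def match_spreadsheets(spreadsheets, sidecars):
    """Pair each sidecar with the spreadsheets sharing its sub/ses labels."""
    matches = {}
    for sidecar in sidecars:
        wanted = bids_entities(sidecar)
        matches[sidecar] = [s for s in spreadsheets if bids_entities(s) == wanted]
    return matches


# Made-up filenames; sub-03 has no spreadsheet and ends up with an empty list
pairs = match_spreadsheets(
    ["sub-01_ses-01_pet.tsv", "sub-02_ses-01_pet.tsv"],
    ["sub-01_ses-01_pet.json", "sub-02_ses-01_pet.json", "sub-03_ses-01_pet.json"],
)
```

Sidecars left with an empty list are the ones a user would still need to supply metadata for.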

Dan Levitas · Answer 1 · Thu May 04 2023 03:07:11 GMT+0800 (China Standard Time)

Additionally, the main purpose of this request is to create an efficient workflow that lets users add information to their dataset's JSON sidecars that is required by BIDS but not contained in the DICOM headers, which means dcm2niix cannot extract it and place it in the sidecars. Uploaded spreadsheet(s) will contain a column for each sidecar field that BIDS requires for the specific sequence, with PET and ASL primarily in mind. The goal is to map these spreadsheets to their corresponding JSON sidecars and insert the necessary information so that the dataset passes BIDS validation.
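
A minimal sketch of the "insert into the sidecar" step, assuming the spreadsheet has already been reduced to a plain dict of field names and values. The helper name update_sidecar and its overwrite flag are hypothetical, not part of ezBIDS or PET2BIDS; the example keeps values already present in the sidecar (for instance those written by dcm2niix) unless told otherwise.

```python
import json
import tempfile
from pathlib import Path


def update_sidecar(sidecar_path, spreadsheet_metadata, overwrite=False):
    """Merge spreadsheet-derived fields into an existing JSON sidecar.

    Hypothetical helper: existing values are kept unless overwrite=True.
    """
    sidecar_path = Path(sidecar_path)
    sidecar = json.loads(sidecar_path.read_text())
    for field, value in spreadsheet_metadata.items():
        if overwrite or field not in sidecar:
            sidecar[field] = value
    sidecar_path.write_text(json.dumps(sidecar, indent=4))
    return sidecar


# Throwaway sidecar for illustration; the values are made up
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sub-01_pet.json"
    path.write_text(json.dumps({"Manufacturer": "Siemens"}))
    merged = update_sidecar(path, {"TracerName": "FDG", "Manufacturer": "Other"})
```

Whether spreadsheet values should win over DICOM-derived ones is a design choice; defaulting to "keep what is there" is just the more conservative option for a sketch.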

In theory, this should roughly follow the Events workflow.

Issue #66 will likely benefit from this feature.

Anthony Galassi · Answer 2 · Fri May 12 2023 03:44:18 GMT+0800 (China Standard Time)

Following up here with -> openneuropet/PET2BIDS#210

Would you prefer that I generalize the following functions from pet2bids.helper_functions so that they work equally well with ASL? All that I do is ingest spreadsheets and their data and output Python dict/JSON-compliant data from them.

It might save you a bit of trouble as you could just import them into ezBIDS instead of rolling your own.

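import json
import logging
import os
import pathlib
from typing import Union

import numpy
from pandas import Series

# pet_metadata_json, permalink_pet_metadata_json and open_meta_data are used below but not shown here;
# they are presumably defined elsewhere in pet2bids.helper_functions


def flatten_series(series):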
    """
    Retrieve either a single value or a list of values from a pandas Series, converting a complex
    data type to a simple type or a list of simple types. Null/NaN entries are dropped first. If exactly
    one value remains, that value is returned; if several remain, they are returned as a list.
    :param series: input pandas.Series, typically extracted as a column/row from a pandas.DataFrame
    :return: a simplified single value or list of values
    :raises ValueError: if no non-null values remain
    """
    simplified_series_object = series.dropna().to_list()
    if not simplified_series_object:
        # a plain string cannot be raised in Python 3, so raise a real exception
        raise ValueError(f"Invalid Series: {series}")
    if len(simplified_series_object) == 1:
        simplified_series_object = simplified_series_object[0]
    return simplified_series_object


def collect_spreadsheets(folder_path: pathlib.Path):
    # keep only regular files whose extension marks them as a spreadsheet
    spreadsheet_suffixes = {'.xlsx', '.xls', '.csv', '.tsv'}
    all_files = [folder_path / pathlib.Path(file) for file in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, file))]
    return [file for file in all_files if file.suffix in spreadsheet_suffixes]


def single_spreadsheet_reader(
        path_to_spreadsheet: Union[str, pathlib.Path],
        pet2bids_metadata_json: Union[str, pathlib.Path] = pet_metadata_json,
        dicom_metadata={},
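        **kwargs) -> dict:
    """
    Read a single spreadsheet and collect the metadata fields it contains.
    :param path_to_spreadsheet: path to the spreadsheet file
    :param pet2bids_metadata_json: path to a JSON file listing the metadata fields, grouped by level (e.g. 'mandatory')
    :param dicom_metadata: metadata already obtained from DICOM headers; fields present here do not trigger a warning
    :param kwargs: additional key/value pairs applied to the metadata last, overriding spreadsheet values
    :return: dictionary of metadata fields and values
    """
    metadata = {}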

    if isinstance(path_to_spreadsheet, str):
        path_to_spreadsheet = pathlib.Path(path_to_spreadsheet)

    if not path_to_spreadsheet.is_file():
        raise FileNotFoundError(f"{path_to_spreadsheet} does not exist.")

    if pet2bids_metadata_json:
        # work on the argument that was passed in, not the module-level default pet_metadata_json
        if isinstance(pet2bids_metadata_json, str):
            pet2bids_metadata_json = pathlib.Path(pet2bids_metadata_json)

        if pet2bids_metadata_json.is_file():
            with open(pet2bids_metadata_json, 'r') as infile:
                metadata_fields = json.load(infile)
        else:
            raise FileNotFoundError(f"Required metadata file not found at {pet2bids_metadata_json}, check to see if this file exists;"
                        f"\nelse pass path to file formatted to this {permalink_pet_metadata_json} via "
                        f"pet2bids_metadata_json argument in single_spreadsheet_reader call.")
    else:
        raise FileNotFoundError(f"pet2bids_metadata_json input required for function call, you provided {pet2bids_metadata_json}")

    spreadsheet_dataframe = open_meta_data(path_to_spreadsheet)

    # collect mandatory fields
    for field_level in metadata_fields.keys():
        for field in metadata_fields[field_level]:
            series = spreadsheet_dataframe.get(field, Series(dtype=numpy.float64))
            if not series.empty:
                metadata[field] = flatten_series(series)
            elif series.empty and field_level == 'mandatory' and not dicom_metadata.get(field, None) and field not in kwargs:
                logging.warning(f"{field} not found in {path_to_spreadsheet}, {field} is required by BIDS")

    # lastly apply any kwargs to the metadata
    metadata.update(**kwargs)

    return metadata
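
For anyone who wants to see the shape of the output without installing pandas, here is a dependency-free sketch of the same idea using only the standard library. It is not the PET2BIDS code: flatten_column and read_columns are hypothetical stand-ins, the sheet contents are made up, and unlike pandas the csv module leaves every value as a string.

```python
import csv
import io


def flatten_column(values):
    """Mirror flatten_series: drop blanks, return a scalar for one value, else a list."""
    kept = [v for v in values if v not in ("", None)]
    if not kept:
        raise ValueError("column has no usable values")
    return kept[0] if len(kept) == 1 else kept


def read_columns(tsv_text):
    """Read tab-separated text into {column name: flattened value(s)}, skipping empty columns."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    metadata = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        if any(v not in ("", None) for v in values):
            metadata[name] = flatten_column(values)
    return metadata


# Made-up sheet: one single-valued column and one multi-valued column
sheet = "TracerName\tFrameDuration\nFDG\t60\n\t120\n"
result = read_columns(sheet)
# result == {"TracerName": "FDG", "FrameDuration": ["60", "120"]}
```

A column with one filled cell becomes a scalar and a column with several becomes a list, which is what lets the result drop straight into a JSON sidecar.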

I'm most of the way there for the listed issues in this Feature Request barring a bit of testing, so let me know if any of the above would be helpful.

Dan Levitas · Answer 3 · Fri May 12 2023 06:05:29 GMT+0800 (China Standard Time)

Would you prefer that I generalize the following functions from pet2bids.helper_functions so that they work equally well with ASL? All that I do is ingest spreadsheets and their data and output Python dict/JSON-compliant data from them.

Yeah, if that's not too much, it would be very helpful!