OpenNeuroDatasets / ds004718

OpenNeuro dataset - Le Petit Prince Hong Kong: Naturalistic fMRI and EEG dataset from older Cantonese speakers

Home Page: https://openneuro.org/datasets/ds004718


## Overview
In the field of neurobiology of language, existing research predominantly focuses on data from a limited number of Indo-European languages and primarily involves younger adults, overlooking other age groups. This experiment aims to address these gaps by creating a comprehensive multimodal database. The primary goal is to advance our understanding of language processing in older adults and the impact of healthy aging on brain-behavior relationships. 

The experiment involves collecting task-based and resting-state fMRI, structural MRI, and EEG data from 52 healthy right-handed older Cantonese participants over 65 years old as they listen to excerpts from “The Little Prince” in Cantonese. Additionally, the database includes detailed information on participants’ language history, lifetime experiences, linguistic and cognitive skills, as well as extensive 
audio and text annotations, such as time-aligned speech segmentation and prosodic features, along with word-by-word predictors from natural language processing (NLP) tools. Quality diagnostics of the MRI and EEG data confirm their robustness, positioning this database as a valuable resource for studying the spatiotemporal dynamics of language comprehension in older adults.

## Methods
### Participants
We recruited 52 healthy, right-handed older Cantonese participants (40 females, mean age = 69.12 yrs, SD = 3.52) from Hong Kong for the experiment, which consists of an fMRI and an EEG session. In both sessions, participants listened to the same sections of The Little Prince in Cantonese for approximately 20 minutes. We confirmed that each participant was right-handed and a native Cantonese speaker using the Language History Questionnaire (LHQ3). Participants also reported normal or corrected-to-normal hearing and no cognitive decline. Two participants did not take part in the fMRI session, and a further 4 participants' fMRI data were removed due to excessive head movement, resulting in 46 participants (39 females, mean age = 69.08 yrs, SD = 3.58) for the fMRI session and 52 participants (40 females, mean age = 69.12 yrs, SD = 3.52) for the EEG session. All participants provided written informed consent prior to the experiment and received monetary compensation after each session. Ethical approval was obtained from the Human Subjects Ethics Application Committee at The Hong Kong Polytechnic University (application number HSEARS20210302001). This study was performed in accordance with the Declaration of Helsinki and all other regulations set by the Ethics Committee.

### Experiment Procedures
The study consisted of an fMRI session and an EEG session. The order of the EEG and fMRI sessions was counterbalanced across all participants, and a minimum two-week interval was maintained between sessions. 

#### fMRI experiment
Before the scanning day, an MRI safety screening form was sent to the participants to confirm that MRI scanning was safe for them. We also sent them short readings and videos about MRI scanning so that they would know what to expect inside the scanner. On the day of scanning, participants were introduced to the MRI facility and comfortably positioned inside the scanner, with their heads securely supported by padding. MRI-safe headphones (Sinorad package) were provided for participants to wear inside the head coil. The audio volume for the listening task was adjusted to ensure audibility for each participant. A mirror attached to the head coil allowed participants to view the stimuli presented on a screen. Participants were instructed to stay focused on the visual fixation sign while listening to the audiobook. The scanning session began with the acquisition of structural (T1-weighted) scans. Participants then performed the listening task during fMRI scanning. The task-based fMRI experiment was divided into four runs, each corresponding to a section of the audiobook. Comprehension was assessed by 5 yes/no questions after each run (20 questions in total) on the content participants had just heard. These questions were presented on the screen, and participants indicated their answers by pressing a button. The session concluded with the collection of resting-state fMRI data.
 
#### Cognitive tasks
Four cognitive tasks were selected to assess participants' cognitive abilities in various domains: the forward digit span task, the picture naming task, the verbal fluency task, and the Flanker task. These tasks were administered after the fMRI session in a separate soundproof booth.
 
#### EEG experiment
During the EEG experiment, participants were seated comfortably in a quiet room, and standard procedures were followed for electrode placement and EEG cap preparation. Participants were instructed to focus on a fixation sign displayed on a monitor. The EEG recording was then started while participants listened to the audiobook through foam ear inserts (medium, 14 mm). The audio volume was adapted to each participant's hearing ability before the recording using a different set of stimuli. As in the fMRI experiment, participants listened to four sections of the audiobook, each lasting approximately 5 minutes. After each run, participants answered 5 yes/no questions (20 questions in total across runs), indicating their answers by pressing a button. The EEG recording continued throughout all four runs until their completion.

#### Questionnaires
We administered the LHQ3 and the Lifetime of Experiences Questionnaire (LEQ) during EEG cap preparation. Participants did not need to move or fill in these questionnaires themselves; a research assistant asked the questions one by one in Cantonese and entered the responses into an online Google form. The LHQ documents language history by producing aggregate scores for language proficiency, exposure, and dominance across all the languages spoken by the participants. The LEQ documents the activities (e.g., sports, music, education, profession) participants have engaged in over their lifetime. It measures lifetime experiences in three periods of life: from 13 to 30 (young adulthood), from 30 to 65 (midlife), and after 65 (late life). The LEQ produces a total score (see participants.tsv) that serves as an index of cognitive activity. Collecting data with these two questionnaires gave us a richer description of our participants' linguistic, social, and cognitive experiences.

### Acquisition
The MRI data were collected at the University Research Facility in Behavioral and Systems Neuroscience (UBSN) at The Hong Kong Polytechnic University. EEG data were collected at the Speech and Language Sciences Laboratory within the Department of Chinese and Bilingual Studies at the same university. Data acquisition for this project started in July 2021 and ended in December 2022.

#### fMRI data
MRI data were acquired using a 3T Siemens MAGNETOM Prisma scanner with a 20-channel coil. Structural MRI was acquired for each participant using a T1-weighted sequence with the following parameters: repetition time (TR) = 2,500 ms, echo time (TE) = 2.22 ms, inversion time (TI) = 1,120 ms, flip angle (FA) = 8°, field of view (FOV) = 240 × 256 × 167 mm, resolution = 0.8 mm isotropic, acquisition time = 4 min 32 s. The acquisition parameters for the echo-planar T2*-weighted imaging (EPI) sequence were: 60 oblique axial slices, TR = 2,000 ms, TE = 22 ms, FA = 80°, FOV = 204 × 204 × 165 mm, resolution = 2.5 mm isotropic, acceleration factor = 3. E-Prime 2.0 (Psychology Software Tools) was used to present the stimuli.
 
#### EEG data
A gel-based 64-channel Neuroscan system with a 10-20 electrode layout was used for data acquisition, at a sampling rate of 1000 Hz. Triggers were set at the beginning of each sentence to mark sentence onsets. STIM2 software (Compumedics Neuroscan) was used for stimulus presentation.

### Stimuli
The experimental stimuli used in both the EEG and fMRI sessions consisted of approximately 20 minutes of an audiobook of The Little Prince, translated and narrated in Cantonese by a male native speaker. The stimuli comprise a total of 4,473 words and 535 sentences. To facilitate data analysis and participant engagement, the stimuli were segmented into four sections, each lasting nearly five minutes. To assess listening comprehension, participants were presented with five yes/no questions after completing each section, for a total of 20 questions throughout the experiment. To ensure that the narration speed was appropriate, we asked several raters who were not participants in this study to judge the speed and comprehensibility; all reported that the speed was normal, neither too slow nor too fast.

### Annotation
We present audio and text annotations, including time-aligned speech segmentation and prosodic information, as well as word-by-word predictors derived from natural language processing (NLP) tools. These predictors include aspects of lexical semantic information, such as part-of-speech (POS) tagging and word frequency.

#### Prosodic information
We extracted the root mean square (RMS) intensity and the fundamental frequency (f0) from every 10 ms interval of the audio segments using the Voicebox toolbox (http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html). Peak RMS intensity and peak f0 for each word in the naturalistic stimuli were used to represent the intensity and pitch information for each word.
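
The released annotations were produced with the MATLAB Voicebox toolbox; as a rough Python sketch of the same idea using librosa instead (the audio path follows the dataset layout below, and the word boundaries and pitch range are illustrative assumptions):

```python
import numpy as np
import librosa

# Load one section of the audiobook at its native sampling rate
y, sr = librosa.load("stimuli/task-lppHK_run-1.wav", sr=None)
hop = int(0.010 * sr)  # 10 ms hop, as in the annotation scheme

# Frame-wise RMS intensity every 10 ms
rms = librosa.feature.rms(y=y, frame_length=hop * 2, hop_length=hop)[0]

# Frame-wise fundamental frequency (f0) via the pYIN algorithm
f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr, hop_length=hop)

# Peak RMS and peak f0 within one word's time span (boundaries would come
# from the time-aligned segmentation; the values here are hypothetical)
start_s, end_s = 1.20, 1.55
frames = slice(int(start_s * sr / hop), int(end_s * sr / hop) + 1)
peak_rms = float(np.nanmax(rms[frames]))
peak_f0 = float(np.nanmax(f0[frames]))
print(peak_rms, peak_f0)
```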

#### Word frequency
Word segmentation was performed manually by two native Cantonese speakers. The log-transformed frequency of each word was then estimated using PyCantonese, version 3.4.0 (https://pycantonese.org/). The built-in corpus in PyCantonese is the Hong Kong Cantonese Corpus (HKCancor), collected from transcribed conversations recorded between March 1997 and August 1998.
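
A minimal sketch of deriving log-transformed frequencies from HKCancor with PyCantonese is shown below; the exact frequency formula used for the released annotations is not specified here, so the add-one smoothing is an assumption:

```python
import math
from collections import Counter

import pycantonese

# Load the built-in HKCancor corpus and count word tokens
corpus = pycantonese.hkcancor()
counts = Counter(corpus.words())
total = sum(counts.values())

def log_frequency(word: str) -> float:
    """Log-transformed relative frequency; add-one smoothing for words
    absent from HKCancor (the smoothing choice is an assumption)."""
    return math.log((counts.get(word, 0) + 1) / total)

print(log_frequency("星球"))
```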

#### Part-of-speech tagging
Part-of-speech (POS) tags for each word in the stimuli were obtained using PyCantonese, version 3.4.0 (https://pycantonese.org/). Following the manual segmentation of words, we fed these segments into the Cantonese-specific NLP tool PyCantonese, which provided POS tags for each word according to the Universal Dependencies v2 (UDv2) tagset.
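
A minimal sketch of this tagging step with PyCantonese, assuming manually segmented words as input and PyCantonese's hkcancor_to_ud helper for mapping to UDv2 (the example sentence is hypothetical, not taken from the stimuli):

```python
import pycantonese
from pycantonese.pos_tagging import hkcancor_to_ud

# Manually segmented words (hypothetical example sentence)
words = ["小王子", "住", "喺", "一", "個", "細", "星球"]

# pos_tag returns (word, tag) pairs using the HKCancor tagset
tagged = pycantonese.pos_tag(words)

# Map the HKCancor tags to the Universal Dependencies v2 tagset
ud_tagged = [(word, hkcancor_to_ud(tag)) for word, tag in tagged]
print(ud_tagged)
```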

### Preprocessing
All MRI data were preprocessed using the NeuroScholar cloud platform (http://www.humanbrain.cn, Beijing Intelligent Brain Cloud, Inc.), provided by The Hong Kong Polytechnic University. This platform uses an enhanced pipeline based on fMRIPrep 20.2.6 (RRID: SCR_016216) and supported by Nipype 1.7.0 (RRID: SCR_002502). We then used the pydeface package (https://github.com/poldracklab/pydeface) to remove the voxels corresponding to the face from both anatomical and preprocessed data, anonymizing participants' facial information.
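
For reference, pydeface is invoked as a command-line tool; the sketch below wraps it in Python for consistency with the other examples. The participant ID and output file name are illustrative assumptions, not the released naming scheme:

```python
import subprocess

# Deface a T1-weighted image with the pydeface CLI
# (sub-HK001 is a hypothetical participant ID; output name is illustrative)
subprocess.run(
    [
        "pydeface",
        "sub-HK001/anat/sub-HK001_T1w.nii.gz",
        "--outfile",
        "sub-HK001/anat/sub-HK001_desc-defaced_T1w.nii.gz",
    ],
    check=True,
)
```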

#### Anatomical MRI
The structural MRI data underwent intensity non-uniformity correction, skull-stripping, and brain tissue segmentation into cerebrospinal fluid (CSF), white matter (WM), and gray matter (GM) based on the reference T1w image. The resulting anatomical images were nonlinearly aligned to the ICBM 152 Nonlinear Asymmetrical template version 2009c (MNI152NLin2009cAsym). Radiological reviews of the MRI images were performed by a medical specialist in the lab. Incidental findings were noted for participants sub-HK031 and sub-HK049: for sub-HK031, a sub-centimeter (0.7 cm) blooming artefact in the right putamen, likely a cavernoma; for sub-HK049, a left thalamic (0.7 cm) oval-shaped susceptibility artefact and a 2.6 cm cystic collection in the right posterior fossa.
 
#### Functional MRI
The preprocessing of both resting-state and task fMRI data included the following steps: (1) skull-stripping, (2) slice-timing correction with temporal realignment of slices to the reference slice, (3) co-registration of the BOLD time series to the T1w reference image, (4) head-motion estimation and spatial realignment to correct for linear head motion, (5) spatial normalization of the functional images into the Montreal Neurological Institute (MNI) template space using the parameters estimated from the structural images, and (6) smoothing with a 6 mm FWHM (full-width at half-maximum) Gaussian kernel.
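
These steps were carried out by the fMRIPrep-based platform described above. Purely as an illustration, the final smoothing step could be reproduced on a preprocessed derivative with nilearn; the path follows the derivatives layout listed below and the participant ID is hypothetical:

```python
from nilearn import image

# Apply 6 mm FWHM Gaussian smoothing to a preprocessed BOLD run
# (illustrative only; the released derivatives come from the pipeline above)
bold = "derivatives/sub-HK001/func/sub-HK001_task-lppHK_run-1_desc-preprocessed_bold.nii.gz"
smoothed = image.smooth_img(bold, fwhm=6)
smoothed.to_filename("sub-HK001_task-lppHK_run-1_desc-smoothed_bold.nii.gz")
```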

#### EEG
The preprocessing was carried out using EEGLAB and in-house MATLAB functions. It included the following steps: (1) filtering with a 1 Hz high-pass and 40 Hz low-pass cutoff, followed by a notch filter at 50 Hz to reduce electrical line noise, (2) identification and removal of bad channels using a kurtosis measure, (3) application of the RUNICA algorithm (EEGLAB toolbox, 2023 version) with automated rejection of ICA-derived artifact components, including eye and muscle noise, high-amplitude artifacts (e.g., blinks), and signal discontinuities (e.g., electrodes losing contact with the scalp), (4) interpolation of bad channels using spherical splines for each segment, (5) re-referencing of all channels using electrodes M1 and M2 as the reference, and (6) down-sampling of all data to 250 Hz.
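
The released preprocessing was done in EEGLAB/MATLAB. Purely as an illustration of similar steps in Python, a rough MNE-Python sketch might look like the following; the participant ID is hypothetical, the bad-channel and ICA handling are simplified stand-ins for the EEGLAB routines actually used:

```python
import mne

# Load a raw EEGLAB .set file (the companion .fdt must sit next to it)
raw = mne.io.read_raw_eeglab("sub-HK001/eeg/sub-HK001_task-lppHK_eeg.set", preload=True)

# (1) 1-40 Hz band-pass plus a 50 Hz notch for line noise
raw.filter(l_freq=1.0, h_freq=40.0)
raw.notch_filter(freqs=50.0)

# (2)-(4) bad-channel handling: mark bads (kurtosis-based detection not
# shown; "T7" is a placeholder), then interpolate with spherical splines
raw.info["bads"] = ["T7"]
raw.interpolate_bads(method=dict(eeg="spline"))

# ICA-based artifact rejection (step 3 above) is omitted from this sketch.

# (5) re-reference to mastoid electrodes M1 and M2
raw.set_eeg_reference(ref_channels=["M1", "M2"])

# (6) down-sample to 250 Hz
raw.resample(250)
```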

### Dataset Structure
#### Participant responses
1. Location: participants.json, participants.tsv
2. File format: tab-separated values (tsv), with a JSON data dictionary (participants.json)
3. Participants' sex, age, quiz accuracy for the fMRI and EEG experiments, scan number, and LEQ scores, with one line per participant (see the loading sketch below)
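
A minimal sketch of loading the participant table with pandas (column names beyond those listed above are documented in participants.json, not assumed here):

```python
import pandas as pd

# participants.tsv is tab-separated, one row per participant;
# participants.json documents each column.
participants = pd.read_csv("participants.tsv", sep="\t")
print(participants.shape)             # (number of participants, number of columns)
print(participants.columns.tolist())  # available variables
```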

#### Audio files 
1. Location: stimuli/task-lppHK_run-1[2-4].wav
2. File format: wav
3. The 4-section audiobook from The Little Prince in Cantonese

#### Anatomical data files 
1. Location: sub-HK<ID>/anat/sub-HK<ID>_T1w.nii.gz
2. File format: NIfTI, gzip-compressed
3. The raw high-resolution anatomical image after defacing

#### Functional data files
1. Location: sub-HK<ID>/func/sub-HK<ID>_task-lppHK_run-1[2–4]_bold.nii.gz
2. File format: NIfTI, gzip-compressed.
3. Sequence protocol: sub-HK<ID>/func/sub-HK<ID>_task-lppHK_run-1[2–4]_bold.json.
4. The preprocessed data are also available as: derivatives/sub-HK<ID>/func/sub-HK<ID>_task-lppHK_run-1[2–4]_desc-preprocessed_bold.nii.gz

#### Resting-state MRI data files
1. Location: sub-HK<ID>/func/sub-HK<ID>_task-rest_bold.nii.gz
2. File format: NIfTI, gzip-compressed
3. Sequence protocol: sub-HK<ID>/func/sub-HK<ID>_task-rest_bold.json.
4. The preprocessed data are also available as: derivatives/sub-HK<ID>/func/sub-HK<ID>_rest_bold.nii.gz

#### EEG data files
1. Location: sub-HK<ID>/eeg/sub-HK<ID>_task-lppHK_eeg.set
2. File format: EEGLAB .set (a MATLAB-format file), accompanied by a .fdt file containing the raw data
3. The preprocessed data are also available as: derivatives/sub-HK<ID>/eeg/sub-HK<ID>_task-lppHK_eeg.set (together with the accompanying .fdt file)

#### Annotations
1. Location: annotation/snts.txt, annotation/lppHK_word_information.txt, annotation/wav_acoustic.csv
2. File format: plain text (txt) and comma-separated values (csv)
3. Annotation of speech and linguistic features for the audio and text of the stimuli

#### Quiz questions
1. Location: quiz/lppHK_quiz_questions.csv
2. File format: comma-separated value
3. The 20 yes/no quiz questions employed in both the fMRI and EEG experiments

## Usage Note
If you want to know more about the dataset, please refer to our paper "Le Petit Prince Hong Kong (LPPHK): Naturalistic fMRI and EEG Data from Older Cantonese Speakers", https://doi.org/10.1101/2024.04.24.590842

This dataset is still under maintenance.

## Contact
For any questions regarding this dataset, please contact:
1. Dr. Mohammad Momenian, mohammad.momenian@polyu.edu.hk
2. Ms. Zhengwu Ma, zhengwuma2-c@my.cityu.edu.hk
3. Ms. Shuyi Wu, shuyiwu2017@gmail.com
4. Ms. Chengcheng Wang, cwang495-c@my.cityu.edu.hk
5. Dr. Jixing Li, jixingli@cityu.edu.hk
