pni-lab / RLBF

Reinforcement Learning via Brain Feedback (RLBF)

Towards generative AI-based fMRI paradigms: reinforcement learning via real-time brain feedback

Giuseppe Gallitto1,2, Robert Englert2,3, Balint Kincses1,2, Raviteja Kotikalapudi1,2, Jialin Li1,2,4, Kevin Hoffshlag1,2, Sulin Ali1,2, Ulrike Bingel1,2, Tamas Spisak2,3

1 Department of Neurology, University Medicine Essen, Germany
2 Center for Translational Neuro- and Behavioral Sciences (C-TNBS), University Medicine Essen, Germany
3 Department of Diagnostic and Interventional Radiology and Neuroradiology, University Medicine Essen, Germany
4 Max Planck School of Cognition, Leipzig, Germany

Introduction

In traditional human neuroimaging experiments, researchers build experimental paradigms with a certain psychological/behavioral validity and infer the corresponding neural correlates. Here we introduce a novel approach, Reinforcement Learning via Brain Feedback (RLBF), which inverts the direction of inference: it searches for the stimulation or paradigm that maximizes (or minimizes) the response in predefined brain regions or networks (Fig. 1). The stimulation/paradigm is optimized by a reinforcement learning algorithm (Kaelbling et al., 1996) that is rewarded based on real-time fMRI data (Sulzer et al., 2013). Specifically, during ongoing real-time fMRI acquisition, the reinforcement learning agent manipulates the paradigm space (e.g. by means of a generative AI model) to drive the participant's neural activity in a specific direction. The agent is then rewarded based on the measured brain responses and gradually learns to adjust its choices, converging towards an optimal solution. Here, we present the results of a proof-of-concept study that aimed to confirm the viability of the proposed approach with simulated and empirical real-time fMRI data.

Fig. 1: RLBF concept illustration. Brain activity is fed to a reinforcement learning algorithm that searches for the paradigm that drives neural activity in the desired direction (up- or down-regulation). The loop continues until the agent converges on an optimal solution.

Methods

In our proof-of-concept study, we aimed for a streamlined setup. To implement the reinforcement learner (Fig. 1, "Reinforcement Learning"), we used a simple and widely used algorithm, a soft Q-learner (Haarnoja et al., 2017), with a smooth reward function. For the paradigm space (Fig. 1, "Paradigm Generator"), we employed simple, algorithmically constructed visual stimulation. Specifically, we presented various versions of a flickering checkerboard to the participants, with contrast and flickering frequency as the free parameters of the paradigm space. A contrast value of zero resulted in no visual stimulation. The reward signal for the reinforcement learner was calculated from the brain response in the primary visual cortex, estimated with a linear model fitted to a single block of data acquired in a block-design fashion (5 seconds of visual stimulation followed by 11 seconds of rest). The hypothesis function was convolved with a conventional double-gamma HRF.
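For illustration, the sketch below shows one way such a block-wise reward could be computed: a boxcar regressor for a 5 s on / 11 s off block is convolved with a double-gamma HRF and fitted to the V1 time series by ordinary least squares, with the stimulus beta serving as the reward. The function names, HRF parameters and the TR of 1 s are illustrative assumptions, not the code used in the study.

import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(tr=1.0, duration=32.0):
    """Canonical-style double-gamma HRF sampled at the TR (illustrative parameters)."""
    t = np.arange(0, duration, tr)
    peak = gamma.pdf(t, 6)          # positive lobe, peaking around 5-6 s
    undershoot = gamma.pdf(t, 16)   # late negative undershoot
    hrf = peak - undershoot / 6.0
    return hrf / hrf.max()

def block_regressor(n_scans, tr=1.0, stim_s=5, rest_s=11):
    """Boxcar for a 5 s stimulation / 11 s rest block design, convolved with the HRF."""
    boxcar = np.zeros(n_scans)
    block_len = int((stim_s + rest_s) / tr)
    for onset in range(0, n_scans, block_len):
        boxcar[onset:onset + int(stim_s / tr)] = 1.0
    return np.convolve(boxcar, double_gamma_hrf(tr))[:n_scans]

def block_reward(v1_timeseries, tr=1.0):
    """Fit y = beta * regressor + intercept; the stimulus beta is used as the reward."""
    y = np.asarray(v1_timeseries, dtype=float)
    X = np.column_stack([block_regressor(len(y), tr), np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[0])

# Example: reward for one simulated 16-scan block (5 s stimulus + 11 s rest, TR = 1 s).
rng = np.random.default_rng(0)
fake_v1 = 2.0 * block_regressor(16) + rng.normal(0, 0.5, 16)
print(block_reward(fake_v1))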
In this setting, the task of the agent was to find the contrast-frequency configuration that maximizes a participant's brain activity in the primary visual cortex. First, we tested the feasibility of our approach by running simulations with realistic effect size estimates. Specifically, we defined the ground truth as a linear function of contrast and flickering frequency, with maximal activation at maximal contrast and a frequency of 7 Hz. In each simulation run, the reinforcement learner had 100 trials. In each trial, the agent picked a contrast and a frequency value and updated its Q-table based on the reward calculated from the ground-truth equation with added Gaussian noise. We fine-tuned the model hyperparameters using realistic initial conditions (signal-to-noise ratio: 0.5-5; Q-table smoothness: 0.5-4.0; soft-Q temperature: 0.2; learning rate: 0.05-0.9).
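The sketch below illustrates this kind of simulation: a tabular soft Q-learner on a discretized contrast × frequency grid, a noisy piecewise-linear ground truth peaking at maximal contrast and about 7 Hz, softmax (soft-Q) action selection with a temperature parameter, and a Gaussian-weighted Q-table update as one plausible reading of the "Q-table smoothness" hyperparameter. Grid sizes, default values and function names are assumptions made for illustration only.

import numpy as np

rng = np.random.default_rng(42)
contrasts = np.linspace(0.0, 1.0, 10)       # discretized paradigm space
frequencies = np.linspace(1.0, 14.0, 10)    # flicker frequency in Hz

def ground_truth(contrast, freq, noise_sd=0.3):
    """Linear in contrast, peaking at 7 Hz, plus Gaussian observation noise."""
    signal = contrast * (1.0 - abs(freq - 7.0) / 7.0)
    return signal + rng.normal(0.0, noise_sd)

def softmax(x, temperature):
    z = (x - x.max()) / temperature
    p = np.exp(z)
    return p / p.sum()

def gaussian_bump(shape, center, sigma):
    """Gaussian kernel centred on the chosen cell, spreading the update to neighbours."""
    ii, jj = np.indices(shape)
    d2 = (ii - center[0]) ** 2 + (jj - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def run_simulation(n_trials=100, lr=0.3, temperature=0.2, smoothness=1.0):
    q = np.zeros((len(contrasts), len(frequencies)))
    for _ in range(n_trials):
        # Soft-Q action selection: sample a (contrast, frequency) cell via softmax.
        idx = rng.choice(q.size, p=softmax(q.ravel(), temperature))
        i, j = np.unravel_index(idx, q.shape)
        reward = ground_truth(contrasts[i], frequencies[j])
        # Smoothed tabular update: neighbouring configurations share information.
        q += lr * (reward - q[i, j]) * gaussian_bump(q.shape, (i, j), smoothness)
    return q

q_table = run_simulation()
best_c, best_f = np.unravel_index(q_table.argmax(), q_table.shape)
print(f"best contrast: {contrasts[best_c]:.2f}, best frequency: {frequencies[best_f]:.1f} Hz")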
With parameters chosen based on the simulation results, we acquired data from n = 10 participants to establish the proof of concept. In the empirical measurements, we presented the checkerboard in 35-45 blocks/trials with a TR of 1 second (1 block: 5 seconds of stimulation, 11 seconds of rest; total scanning time: 10 minutes) and allowed the reinforcement learner to optimize the visual stimulation based on the brain feedback.

Fig. 2: Outcomes of the simulation across 100 trials. Q-value plots are shown for the ideal conditions (frequency: 0.7, contrast: 1.0) and for the ideal hyperparameters (in bold, on the right). SNRs above two yield better model performance.

Results

Simulation results (Fig. 3a) show that the proposed implementation provides a robust solution within a small number of trials, over a relatively wide range of initial conditions. High smoothing power appears to work well at higher SNRs, whereas lower SNRs seem to require lower learning rates for optimal training. Nevertheless, the model displayed remarkable stability across a wide range of learning rates. Results from the empirical measurements (Fig. 3b) are in line with what is known about the contrast and frequency dependence of the checkerboard response (Victor et al., 1997) and provide initial confirmation of the feasibility of the proposed approach.

Fig. 3: Results of the simulation (a) and the mean result across all scanned participants (b). Both the simulation and the real-time sessions were run with the optimal hyperparameters. For the simulation, we selected an SNR of 3.3.

Conclusion

Here we presented a proof of concept for Reinforcement Learning via Brain Feedback (RLBF), a novel experimental approach that aims to find the optimal stimulation paradigm to activate, deactivate or modulate individual brain activity in predefined regions or networks. While this proof-of-concept study employed a simplified setup, future work aims to extend the approach with paradigm spaces constructed by generative AI (e.g. large language models or image, video and music generation). The promise of the approach is twofold. First, by inverting the direction of inference ("brain -> behavior" instead of "behavior -> brain"), the proposed approach may emerge as a novel tool for basic and translational research. Second, when paired with generative AI, RLBF has the potential to enable novel, individualized treatment approaches, e.g. AI-generated text, video or music optimized to alleviate states such as pain or anxiety.

References

1 Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285.
2 Sulzer, J., Haller, S., Scharnowski, F., Weiskopf, N., Birbaumer, N., Blefari, M. L., ... & Sitaram, R. (2013). Real-time fMRI neurofeedback: Progress and challenges. NeuroImage, 76, 386-399.
3 Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy-based policies. In International Conference on Machine Learning (pp. 1352-1361). PMLR.
4 Victor, J. D., Conte, M. M., & Purpura, K. P. (1997). Dynamic shifts of the contrast-response function. Visual Neuroscience, 14(3), 577-587.

Software

N.B. Our real-time fMRI software is still at an early stage of development and is not suitable for general use. The current version has been tested on a single scanner and works only with a very specific setup.

** Tested with Siemens Magnetom Vida 3T **

Features

The current version of the program consists of:

  • A controller that manages incoming volumes from Siemens' real-time export function. It handles preprocessing and reinforcement learning with a minimum TR of 1 sec.
  • A custom RL environment built with Raylib 5.0: a flickering checkerboard whose contrast and frequency can be changed (a minimal sketch of such an environment follows this list).
  • A custom RL Soft-Q-Learning algorithm based on the work of Haarnoja et al., 2017.
  • A Dashboard made in Streamlit, to visualize the progress of real-time processing.
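As referenced in the list above, the following is a minimal sketch of what such a contrast-reversing checkerboard could look like using pyray (the Python bindings for Raylib). It is not the repository's actual environment code; the window size, tile size and reversal scheme are assumptions made for illustration.

import pyray as rl

WIDTH, HEIGHT, TILE = 800, 800, 100   # window and tile size (illustrative)

def tile_colors(contrast, phase):
    """Two tile luminances around mid-gray; contrast 0 gives a uniform gray field."""
    mid, delta = 128, int(127 * contrast)
    light = rl.Color(mid + delta, mid + delta, mid + delta, 255)
    dark = rl.Color(mid - delta, mid - delta, mid - delta, 255)
    return (light, dark) if phase == 0 else (dark, light)

def run_checkerboard(contrast=0.8, frequency_hz=7.0):
    rl.init_window(WIDTH, HEIGHT, "flickering checkerboard")
    rl.set_target_fps(60)
    while not rl.window_should_close():
        # Contrast reversal: the two tile colors swap `frequency_hz` times per second.
        phase = int(rl.get_time() * frequency_hz) % 2
        col_a, col_b = tile_colors(contrast, phase)
        rl.begin_drawing()
        rl.clear_background(rl.BLACK)
        for row in range(HEIGHT // TILE):
            for col in range(WIDTH // TILE):
                color = col_a if (row + col) % 2 == 0 else col_b
                rl.draw_rectangle(col * TILE, row * TILE, TILE, TILE, color)
        rl.end_drawing()
    rl.close_window()

if __name__ == "__main__":
    run_checkerboard()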

Dependencies

The program has been tested on Ubuntu 20.04.

  • Developed using Python 3.11.7.
  • Preprocessing relies heavily on ANTsPy, except for motion correction, which is done with FSL MCFLIRT.
  • The environment runs on the Python bindings for Raylib.
  • The Dashboard runs on Streamlit 1.30.0.

Requirements

  1. To avoid slowdowns that hinder the correct rendering of the visual stimuli presented to participants, a dedicated graphics card is required to run the Raylib environment.

  2. Also, set up a "/mnt/fmritemp" folder to store temporary data. Ideally, the folder should be a RAM disk of at least 1 GB.

Run the program

Remember to change the paths in the "rtfmri_dashboard/controller.py" and "rtfmri_dashboard/envs/render.py" scripts before running.

Run the controller script to start the main program. The environment will spawn by itself after the reference volume has been preprocessed.

python ./rtfmri_dashboard/controller.py

Run the Dashboard. The Dashboard is only for visualization purposes and doesn't need to be run for the controller to work properly.

streamlit run ./rtfmri_dashboard/real_time/dashboard.py
