Prob Decoding

We mainly study decoding of behavior (and, to a lesser extent, encoding) from electrophysiological recordings made with Neuropixels probes in the International Brain Laboratory (IBL) dataset. Our approach is primarily probabilistic machine learning.

Getting Started

To run src/save_data.ipynb, install everything in requirements.txt (for example, with pip install -r requirements.txt). If you only need to load the data from the .npy files, numpy plus whatever libraries you use for analysis is enough.
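Loading one of the saved arrays needs only numpy. The sketch below round-trips a placeholder array through a temporary .npy file; the file name example_counts.npy and its contents are hypothetical stand-ins for the real files under src/raw_data.

```python
import os
import tempfile

import numpy as np

# Hypothetical placeholder array; the real .npy files hold the saved recordings.
example = np.arange(12, dtype=np.int64).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_counts.npy")  # hypothetical file name
    np.save(path, example)
    # np.load reads the whole array into memory, preserving shape and dtype.
    loaded = np.load(path)

print(loaded.shape, loaded.dtype)  # (3, 4) int64
```

If a file was saved as an object array (for example, a collection of differently shaped arrays), np.load additionally needs allow_pickle=True.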

Data

I had a 164 GB folder on my laptop containing all of the openalyx.internationalbrainlab.org data. I have since deleted it, but we may want to download it again if needed. I am not sure exactly how I originally downloaded all of it.

How to get the data

load_data_from_pids(pids, brain_region, behavior="choice", data_type="all_ks", n_t_bins=30, prior_path=None, t_before=0.5, t_after=1.5, normalize_input=True)

This function returns data recorded by the given probes. The important parameters are described below.

pids (str): probe insertion IDs (pids) to load data from.
brain_regions (str): brain region to get data from. Popular options are "alv", "ca1", "cing", "dg-mo", "dg-po", "dg-sg", "eth", "fp", "lp", "ml", "or", "po", "th", "visam5", "visam6a", "vplpc". See this link for a full description of all possible regions.
behavior (str, optional): behavioral variable to return alongside the neural data. Options are "choice", "prior", "contrast", "reward", "motion_energy", "wheel_velocity", "wheel_speed", "pupil_diameter", "paw_speed".
data_type (str, optional): which units to use. Options are "all_ks", "good_ks", "thresholded". all_ks uses all units sorted by Kilosort 2.5, while good_ks keeps only the units Kilosort classifies as good. See this paper or possibly this paper.
n_t_bins (int, optional): number of time bins that each trial's window (from t_before before to t_after after stimulus onset) is divided into.
t_before (float, optional): number of seconds before stimulus onset to include for each trial.
t_after (float, optional): number of seconds after stimulus onset to include for each trial.
normalize_input (bool, optional): if True, the data are returned as normalized real numbers; if False, they are returned as raw spike counts (whole numbers).
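To make the time parameters concrete, the sketch below shows one way spike times could be turned into a (trials, units, time bins) count array, which is the kind of whole-number output normalize_input=False describes. The helper bin_spike_counts and its inputs (spike_times, spike_clusters, stim_onsets) are hypothetical and are not the implementation inside load_data_from_pids; each bin is (t_before + t_after) / n_t_bins seconds wide.

```python
import numpy as np


def bin_spike_counts(spike_times, spike_clusters, stim_onsets, n_units,
                     t_before=0.5, t_after=1.5, n_t_bins=30):
    """Hypothetical helper: count spikes per (trial, unit, time bin).

    Returns an int64 array of shape (n_trials, n_units, n_t_bins).
    """
    # Bin edges relative to stimulus onset, from -t_before to +t_after.
    edges_rel = np.linspace(-t_before, t_after, n_t_bins + 1)
    counts = np.zeros((len(stim_onsets), n_units, n_t_bins), dtype=np.int64)
    for trial, onset in enumerate(stim_onsets):
        edges = onset + edges_rel
        for unit in range(n_units):
            # np.histogram ignores spikes outside [edges[0], edges[-1]].
            counts[trial, unit], _ = np.histogram(
                spike_times[spike_clusters == unit], bins=edges)
    return counts


# Toy example: 2 trials, 2 units, a 2 s window split into 4 bins of 0.5 s.
spike_times = np.array([0.9, 1.05, 1.1, 2.4, 5.0, 5.2])
spike_clusters = np.array([0, 0, 1, 1, 0, 1])
stim_onsets = np.array([1.0, 5.0])
counts = bin_spike_counts(spike_times, spike_clusters, stim_onsets, n_units=2,
                          t_before=0.5, t_after=1.5, n_t_bins=4)
print(counts.shape)  # (2, 2, 4)
```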

Our data choices:

For src/raw_data/small_data:

This is data pulled only from pid dab512bd-a02d-4c1f-8dbc-9155a163efc0. It contains 367 trials, each recording from 24 neurons that Kilosort classifies as good, over 40 time bins. This data is analyzed in Berkan's homework 3.

pids: dab512bd-a02d-4c1f-8dbc-9155a163efc0
brain_regions: po
behavior: wheel_speed
data_type: good_ks
n_t_bins: 40
t_before: .5
t_after: 1.5
normalize_input: False
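With these settings each time bin is (0.5 + 1.5) / 40 = 0.05 s wide. As an illustration of a simple probabilistic decoder on data of this size, the sketch below fits one ridge regression per time bin, predicting wheel speed from the 24 neurons' counts. Ridge weights equal the posterior mean of Bayesian linear regression with a Gaussian prior on the slopes (and a flat prior on the intercept), where alpha is the noise-to-prior variance ratio. The (trials, neurons, bins) axis order is an assumption, and the arrays are synthetic placeholders, not the real small_data.

```python
import numpy as np

# Sizes implied by the small_data settings; the axis order is an assumption.
n_trials, n_units, n_t_bins = 367, 24, 40
t_before, t_after = 0.5, 1.5
bin_width = (t_before + t_after) / n_t_bins  # 0.05 s per bin
bin_centers = -t_before + bin_width * (np.arange(n_t_bins) + 0.5)

# Synthetic placeholders standing in for the real arrays (NOT real IBL data).
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(n_trials, n_units, n_t_bins))
wheel_speed = rng.gamma(2.0, 1.0, size=(n_trials, n_t_bins))


def fit_ridge(X, y, alpha=1.0):
    """Ridge weights with an unpenalized intercept in position 0."""
    X1 = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(X1.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)


# One decoder per time bin: wheel speed from the population counts in that bin.
weights = np.stack([fit_ridge(counts[:, :, t], wheel_speed[:, t])
                    for t in range(n_t_bins)])
print(weights.shape)  # (40, 25): intercept + one weight per neuron
```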

For src/raw_data/full_data:

Get full_data.zip from here, then unzip it into src/raw_data/full_data.

This data is pulled from the pids listed in src/raw_data/full_data/all_pids.csv. Unfortunately, not all of them could be loaded, so only 112 pids are actually used. These 112 are described in src/raw_data/full_data/names_and_shapes.txt.

pids: see src/raw_data/full_data/names_and_shapes.txt
brain_regions: po
behavior: wheel_speed
data_type: all_ks (note: this differs from small_data, which uses good_ks)
n_t_bins: 40
t_before: .5
t_after: 1.5
normalize_input: False
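Unzipping and loading can also be done from Python. The sketch below assumes one .npy file per pid inside the archive; that layout, the file names, and the helpers unzip_full_data and load_npy_dir are hypothetical, so check names_and_shapes.txt and the archive contents (including whether it already has a top-level full_data folder) before relying on it. Arrays are kept in a dict rather than stacked because different pids record different numbers of neurons.

```python
import tempfile
import zipfile
from pathlib import Path

import numpy as np


def unzip_full_data(zip_path, dest="src/raw_data/full_data"):
    """Extract the archive into dest and return dest as a Path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest


def load_npy_dir(directory):
    """Hypothetical loader: map each .npy file's stem (e.g. a pid) to its array."""
    return {p.stem: np.load(p) for p in sorted(Path(directory).glob("*.npy"))}


# Demo on a tiny fake archive with two "pids" of different neuron counts.
with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)
    np.save(tmp / "pid_a.npy", np.zeros((5, 3, 40)))
    np.save(tmp / "pid_b.npy", np.zeros((5, 7, 40)))
    with zipfile.ZipFile(tmp / "full_data.zip", "w") as zf:
        for name in ("pid_a.npy", "pid_b.npy"):
            zf.write(tmp / name, arcname=name)
    data = load_npy_dir(unzip_full_data(tmp / "full_data.zip", tmp / "full_data"))
    shapes = {pid: arr.shape for pid, arr in data.items()}

print(shapes)  # {'pid_a': (5, 3, 40), 'pid_b': (5, 7, 40)}
```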
