Ellen Hoffman fish data aggregation

Pipeline 1: Best Recovery Drug

Goal:

Multiple drugs were applied to mutant fish. Identify the one drug that will make the fish behave the most like the wild type.

Steps:

Calculate z-scores relative to wild type: calculate_zscore.m / calculate_zscore_burst.m
Average across all fish for each drug and geno (HOM and WT): average_after_zscore.m
Make a clustergram: avgz_to_clustergram.m
Run PCA and calculate Euclidean distances: run_PCA_euclidean.m
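
As a rough illustration of the first step, the sketch below z-scores each fish against the wild type fish recorded on the same day. It is not the code in calculate_zscore.m; the variable names (value, day, geno) and the toy numbers are hypothetical.

```matlab
% Hypothetical toy data: one measurement per fish, with day and genotype labels.
value = [10; 12; 14; 20; 22; 30; 34; 40];
day   = [ 1;  1;  1;  1;  2;  2;  2;  2];
geno  = {'WT'; 'WT'; 'WT'; 'HOM'; 'WT'; 'WT'; 'HOM'; 'HOM'};

z = nan(size(value));
days = unique(day);
for k = 1:numel(days)
    sameDay = (day == days(k));
    isWT    = sameDay & strcmp(geno, 'WT');
    mu      = mean(value(isWT));   % wild type mean for this day
    sigma   = std(value(isWT));    % wild type standard deviation for this day
    % Every fish recorded on this day is scaled by that day's wild type statistics.
    z(sameDay) = (value(sameDay) - mu) ./ sigma;
end
disp(z')
```

In this sketch a day whose wild type fish all share the same value has sigma equal to zero, so the division yields Inf or NaN. That may be related to the infs reported for the burst columns, although the README does not say so.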

More Details: The following two process lines work:

Individualdata ---> calculate_zscore ---> average_after_zscore ---> avgz_to_clustergram

splitmean ---> calculate_zscore_burst ---> average_after_zscore ---> avgz_to_clustergram

This one needs more work:

splitmean ---> calculate_zscore ---> average_after_zscore ---> avgz_to_clustergram

file details

File name	Description
calculate_zscore.m	Input type 1: file from Jeff's pipeline (usually named 'individualData_xxxxxxxx.csv'). Output: z-scores of the experimental fish plate, based on the mean and standard deviation of the wild type fish from the same day. Input type 2: the one-off burst file generated by Jeff (named 'scn1lab_rw_split_means.csv'). Output: z-scores of the experimental fish plate, based on the mean and standard deviation of the wild type fish from the same day. TO DO: the burct and burur columns are removed because there are too many infs.
calculate_zscore_burst.m	Input: the one-off burst file generated by Jeff (named 'scn1lab_rw_split_means.csv'). Output: z-scores of the burst variables (burct and burur), using wild type fish from all days, because on any single day the fish may not have any burst activity.
average_after_zscore.m	Input: z-scores calculated by 'calculate_zscore.m' or 'calculate_zscore_burst.m' above. Trims the file so only HOM types are left, averages across fish, and also aggregates across activities. Output 1: mean_by_geno, with the parameters/activities averaged across fish for each geno. If the input is not the burst file, the aggregated z-scores are also exported. Output 2: _averaged, with the parameters/activities aggregated to rms and mean for each group (bout, activity, sleep, all).
avgz_to_clustergram.m	Uses mean_by_geno to generate a clustergram with the custom colormap my_colormap. Underscores in the labels are replaced with spaces.
run_PCA_euclidean.m	Input: the mean_by_geno file. Output: graphics for publication and a Euclidean distance table. Runs PCA, makes plots for publication, and calculates Euclidean distances for all the drugs and dosages.
script_best_recovery_drug.m	Combines the above steps to make the plots and calculate which drug is the best recovery drug.
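
The distance and PCA step can be sketched as follows. This is one plausible reading of the approach, not the code in run_PCA_euclidean.m: it assumes the wild type profile sits at zero in z-score space, and the labels and numbers are hypothetical. PCA is done with a plain SVD so that no toolbox is required.

```matlab
% Hypothetical mean z-score profiles: rows are drug conditions, columns are parameters.
labels = {'drugA', 'drugB', 'drugC'};
meanZ  = [3 4 0;
          1 0 0;
          0 2 2];

% Assumption of this sketch: z-scores are relative to wild type, so the
% wild type profile is taken to be the zero vector.
wtProfile = zeros(1, size(meanZ, 2));
dist = sqrt(sum(bsxfun(@minus, meanZ, wtProfile).^2, 2));  % Euclidean distance per row

% The condition with the smallest distance is the one closest to wild type.
[~, best] = min(dist);
fprintf('Closest to wild type: %s (distance %.3f)\n', labels{best}, dist(best));

% PCA via SVD of the column-centered matrix.
centered  = bsxfun(@minus, meanZ, mean(meanZ, 1));
[U, S, ~] = svd(centered, 'econ');
scores    = U * S;                          % principal component scores, one row per condition
explained = diag(S).^2 / sum(diag(S).^2);   % fraction of variance per component
```

The scores matrix is what would be plotted (for example, the first two columns against each other), while dist gives the ranking of drugs and dosages by closeness to wild type.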

Pipeline 2: Seizure analysis (pre-post) script_pre_post_analysis.m

Experiment details:

Fish in 8 rows (96 wells total)
PRE                  POST
Wild Type - DMSO     Wild Type - DMSO - H2O
Wild Type - DMSO     Wild Type - DMSO - PTZ
Wild Type - MC       Wild Type - MC - H2O
Wild Type - MC       Wild Type - MC - PTZ

HOM - DMSO           HOM - DMSO - H2O
HOM - DMSO           HOM - DMSO - PTZ
HOM - MC             HOM - MC - H2O
HOM - MC             HOM - MC - PTZ

Hypotheses:

PTZ induces more seizures (higher burst count)
Effect of PTZ in HOM and wild type (WT)
Effect of MC (can MC cancel PTZ's effect?)

Analysis

Take the pre/post difference score and do a 3-way ANOVA, geno x MC/DMSO x PTZ/H2O. Look for a significant 3-way interaction.
Separate HOM and WT, and do 3-way ANOVAs: pre/post x MC/DMSO x PTZ/H2O. Look for a significant 3-way interaction only in HOM but not in WT.
Post hoc t tests: compare HOM+MC+PTZ and HOM+DMSO+PTZ. They have a similar value pre, but in the post data, HOM+DMSO+PTZ has a higher value.
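
The first ANOVA and the post hoc comparison can be sketched as below. This is not the code in do_anova.m. It assumes the Statistics and Machine Learning Toolbox (for anovan and ttest2), and the data are randomly generated toy values with hypothetical variable names.

```matlab
% Hypothetical balanced design: 2 genotypes x 2 pretreatments x 2 treatments, 4 fish per cell.
rng(1);                                   % fixed seed so the toy data are repeatable
[g, d1, d2, ~] = ndgrid(1:2, 1:2, 1:2, 1:4);
genoNames  = {'WT'; 'HOM'};
drug1Names = {'DMSO'; 'MC'};
drug2Names = {'H2O'; 'PTZ'};
geno  = genoNames(g(:));
drug1 = drug1Names(d1(:));
drug2 = drug2Names(d2(:));

% Toy burst counts: PTZ raises the post value only in HOM fish pretreated with DMSO.
pre  = 10 + randn(numel(geno), 1);
post = pre + randn(numel(geno), 1) + 5 * (g(:) == 2 & d1(:) == 1 & d2(:) == 2);
diffScore = post - pre;

% Full-factorial 3-way ANOVA on the difference score.
[p, tbl] = anovan(diffScore, {geno, drug1, drug2}, ...
    'model', 'full', ...
    'varnames', {'geno', 'drug1', 'drug2'}, ...
    'display', 'off');
pThreeWay = p(end);   % last term of the full model is geno*drug1*drug2

% Post hoc two-sample t test on the post data: HOM+MC+PTZ versus HOM+DMSO+PTZ.
isHomPtz = strcmp(geno, 'HOM') & strcmp(drug2, 'PTZ');
[~, pPostHoc] = ttest2(post(isHomPtz & strcmp(drug1, 'MC')), ...
                       post(isHomPtz & strcmp(drug1, 'DMSO')));
```

With 'model' set to 'full', p holds seven values: three main effects, three 2-way interactions, and the 3-way interaction last. The pre/post ANOVAs within HOM and WT involve repeated measurements on the same fish, which this sketch does not attempt to model.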

Procedure Summary:

From raw scores for the pre and post experiments to various plots, ANOVAs, and an intermediate csv file.

How to run

Step 1: Make a folder that contains the following three files: pre, post, geno.
Step 2: Save the pre and post files in Excel format, with the xlsx extension. (MATLAB can easily convert Excel files into tables, but not csv files.)
Step 3: Run 'script_pre_post_analysis;' on the command line. You will select the three files in order: pre (the Excel version), post (the Excel version), geno. On a Windows computer a prompt is shown for each type of file, but the prompts don't work on macOS, so just follow the order pre, post, geno.
Step 4: Check the outputs, which are saved in the folder that you created in Step 1.

file details

File name	Description
pre_post.m	preprocess the pre and post data files and geno text file, generate an output that can feed into the following analyses
plot_geno_by_time.m	use the output from the pre_post and make geno_by_time plots, saved in a 'plots' folder
plot_prepost.m	plot the pre post data
plot_bars.m	plot the data by all the combinations of factors
make_boxplots.m	make boxplots for all the combinations of factors including geno, drug1, drug2, pre and post
do_anova.m	run 3-way ANOVAs on the difference score (geno, drug1, drug2) and on the HOM and WT data (pre/post, drug1, drug2)
script_pre_post_analysis.m	a script that chains the other functions. It first uses pre_post to create the output variable, then uses that variable to make the various plots and run the ANOVAs.
