Examine the distribution of the predicted track and validation track.
Setting up the threshold for calculating the MSE specifically for peaks.
Creating models for each histone mark.
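A peak-restricted error can be sketched as follows. This is only an illustration: the notebook does not show how peaks are defined, so the rule used here (a position counts as a peak when the observed signal is at or above `threshold`) and the function name `peak_mse` are assumptions, and the numbers are toy values.

```python
def peak_mse(pred, truth, threshold):
    """MSE over positions whose observed signal is at or above threshold.

    The peak rule (truth >= threshold) is an assumption made for this sketch.
    """
    pairs = [(p, t) for p, t in zip(pred, truth) if t >= threshold]
    if not pairs:
        return None  # no position qualifies as a peak
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


# Toy signal, invented for illustration only.
truth = [0.1, 0.2, 3.0, 5.0, 0.3]
pred = [0.0, 0.4, 2.0, 4.0, 0.3]
print(peak_mse(pred, truth, threshold=1.0))
```

Only the two positions with observed values 3.0 and 5.0 enter the average here; the low-signal background is ignored.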
Data Preparation
# Reading data.
ml_df = pd.read_csv("sources/ML_model/output/ml_data.csv", header=0)
# Create the shuffled dataframe for randomly selecting the folds.
ml_df_shuf = ml_df.sample(frac=1)
# Define number of folds.
k = 4
# Create index for the folds.
folds_index = list(range(0, ml_df_shuf.shape[0], math.ceil(ml_df_shuf.shape[0] / k))) + [ml_df_shuf.shape[0]]
folds_index = [[folds_index[i] + 1, folds_index[i + 1]] for i in range(len(folds_index) - 1)]
folds_index[0][0] = 0
folds_index
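The cell above yields pairs such as [0, 303615] and [303616, 607230]. How these pairs are consumed is not shown here. If they are used as `iloc[a:b]` slices, the row at index 303615 falls in no fold; if they are treated as inclusive, the last bound 1214460 is one past the final row. A minimal sketch of an alternative that uses half-open ranges and covers every row exactly once (the helper name `fold_bounds` is hypothetical):

```python
import math


def fold_bounds(n_rows, k):
    """Return half-open [start, stop) row ranges (at most k) covering 0..n_rows-1 once."""
    step = math.ceil(n_rows / k)
    starts = range(0, n_rows, step)
    return [(start, min(start + step, n_rows)) for start in starts]


# 1214460 is the row count implied by the last fold boundary in the training log.
bounds = fold_bounds(1214460, 4)
print(bounds)
```

Each pair can then be passed directly to a slice such as `ml_df_shuf.iloc[start:stop]` without adjusting the endpoints.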
---- Creating Validation Set for data in range [0, 303615]
-------- Start Training on 1574182987.977485
/Users/Michavillson/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
-------- Finished Training, elapsed time: 58.52015209197998
---- Creating Validation Set for data in range [303616, 607230]
-------- Start Training on 1574183048.340859
-------- Finished Training, elapsed time: 58.19038510322571
---- Creating Validation Set for data in range [607231, 910845]
-------- Start Training on 1574183108.437985
-------- Finished Training, elapsed time: 57.9245810508728
---- Creating Validation Set for data in range [910846, 1214460]
-------- Start Training on 1574183168.262582
-------- Finished Training, elapsed time: 57.9299750328064
# Calculating the error for each fold.
# Note: np.mean(abs(a - b)) is the mean absolute error (MAE), despite the MSE_dict name.
MSE_dict = {}
for i in range(4):
    MSE_dict[i] = {
        "predicted": round(np.mean(abs(model_dic[i]["predicted"] - model_dic[i]["test_label"])), 4),
        "avocado": round(np.mean(abs(model_dic[i]["avocado"] - model_dic[i]["test_label"])), 4),
        "curr_impute": round(np.mean(abs(model_dic[i]["curr_impute"] - model_dic[i]["test_label"])), 4)
    }
pretty(MSE_dict)
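The cell above stores its result in `MSE_dict`, but `np.mean(abs(a - b))` is the mean absolute error rather than the mean squared error. A minimal sketch of the difference, using the standard library only and toy numbers invented for illustration (the helper names are not from the notebook):

```python
from statistics import fmean


def mean_absolute_error(pred, truth):
    """Average of |pred - truth|, the quantity np.mean(abs(a - b)) computes."""
    return fmean([abs(p - t) for p, t in zip(pred, truth)])


def mean_squared_error(pred, truth):
    """Average of (pred - truth) ** 2."""
    return fmean([(p - t) ** 2 for p, t in zip(pred, truth)])


# Toy values, invented for illustration only.
truth = [0.0, 1.0, 2.0, 4.0]
pred = [0.5, 1.0, 1.0, 2.0]

mae = round(mean_absolute_error(pred, truth), 4)
mse = round(mean_squared_error(pred, truth), 4)
print(mae, mse)  # 0.875 1.3125
```

The two metrics rank methods differently when a few positions have large errors, since squaring weights those positions more heavily.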
---- Creating Validation Set for data in range [0, 303615]
-------- Start Training on 1574183999.332468
-------- Finished Training, elapsed time: 56.25135779380798
---- Creating Validation Set for data in range [303616, 607230]
-------- Start Training on 1574184057.5759418
-------- Finished Training, elapsed time: 55.562814235687256
---- Creating Validation Set for data in range [607231, 910845]
-------- Start Training on 1574184115.145239
-------- Finished Training, elapsed time: 63.64612078666687
---- Creating Validation Set for data in range [910846, 1214460]
-------- Start Training on 1574184181.19211
-------- Finished Training, elapsed time: 67.51104378700256
# Calculating the error for each fold.
# Note: np.mean(abs(a - b)) is the mean absolute error (MAE), despite the MSE_dict name.
MSE_dict = {}
for i in range(4):
    MSE_dict[i] = {
        "predicted": round(np.mean(abs(model_dic_id_rmd[i]["predicted"] - model_dic_id_rmd[i]["test_label"])), 4),
        "avocado": round(np.mean(abs(model_dic_id_rmd[i]["avocado"] - model_dic_id_rmd[i]["test_label"])), 4),
        "curr_impute": round(np.mean(abs(model_dic_id_rmd[i]["curr_impute"] - model_dic_id_rmd[i]["test_label"])), 4)
    }
pretty(MSE_dict)
---- Creating Validation Set for data in range [0, 116775]
-------- Start Training on 1574188344.497136
-------- Finished Training, elapsed time: 17.838374853134155
---- Creating Validation Set for data in range [116776, 233550]
-------- Start Training on 1574188362.94751
-------- Finished Training, elapsed time: 17.883373975753784
---- Creating Validation Set for data in range [233551, 350325]
-------- Start Training on 1574188381.4168372
-------- Finished Training, elapsed time: 17.50961685180664
---- Creating Validation Set for data in range [350326, 467100]
-------- Start Training on 1574188399.50402
-------- Finished Training, elapsed time: 17.69806694984436
fold  predicted  avocado  curr_impute
0     0.1315     0.1523   0.3132
1     0.1315     0.1493   0.3041
2     0.1337     0.1541   0.3129
3     0.1332     0.1519   0.3134
---- Creating Validation Set for data in range [0, 116775]
-------- Start Training on 1574188997.884465
-------- Finished Training, elapsed time: 16.571380853652954
---- Creating Validation Set for data in range [116776, 233550]
-------- Start Training on 1574189015.0444632
-------- Finished Training, elapsed time: 16.519545793533325
---- Creating Validation Set for data in range [233551, 350325]
-------- Start Training on 1574189032.130722
-------- Finished Training, elapsed time: 16.46050000190735
---- Creating Validation Set for data in range [350326, 467100]
-------- Start Training on 1574189049.203674
-------- Finished Training, elapsed time: 17.15988779067993
fold  predicted  avocado  curr_impute
0     0.1319     0.1523   0.3132
1     0.1318     0.1493   0.3041
2     0.1339     0.1541   0.3129
3     0.1332     0.1519   0.3134
Model for M03 - H2AFZ
m03_result = hist_models("M03", ml_df_shuf)
---- Creating Validation Set for data in range [0, 93420]
-------- Start Training on 1574188493.600275
-------- Finished Training, elapsed time: 13.731431722640991
---- Creating Validation Set for data in range [93421, 186840]
-------- Start Training on 1574188507.847035
-------- Finished Training, elapsed time: 12.977031230926514
---- Creating Validation Set for data in range [186841, 280260]
-------- Start Training on 1574188521.262924
-------- Finished Training, elapsed time: 12.248181104660034
---- Creating Validation Set for data in range [280261, 373680]
-------- Start Training on 1574188533.940921
-------- Finished Training, elapsed time: 12.40692687034607
fold  predicted  avocado  curr_impute
0     0.1064     0.1181   0.2546
1     0.1058     0.1173   0.255
2     0.1081     0.1195   0.2601
3     0.1075     0.1199   0.2613
---- Creating Validation Set for data in range [0, 93420]
-------- Start Training on 1574188889.4063501
-------- Finished Training, elapsed time: 12.069170951843262
---- Creating Validation Set for data in range [93421, 186840]
-------- Start Training on 1574188901.90952
-------- Finished Training, elapsed time: 11.77048110961914
---- Creating Validation Set for data in range [186841, 280260]
-------- Start Training on 1574188914.099287
-------- Finished Training, elapsed time: 11.769668817520142
---- Creating Validation Set for data in range [280261, 373680]
-------- Start Training on 1574188926.2890701
-------- Finished Training, elapsed time: 11.894800901412964
fold  predicted  avocado  curr_impute
0     0.1067     0.1181   0.2546
1     0.1053     0.1173   0.255
2     0.1079     0.1195   0.2601
3     0.1077     0.1199   0.2613
Model for M25 - H3K79me2
m25_result = hist_models("M25", ml_df_shuf)
---- Creating Validation Set for data in range [0, 93420]
-------- Start Training on 1574188633.599432
-------- Finished Training, elapsed time: 13.147719860076904
---- Creating Validation Set for data in range [93421, 186840]
-------- Start Training on 1574188647.193089
-------- Finished Training, elapsed time: 12.824733972549438
---- Creating Validation Set for data in range [186841, 280260]
-------- Start Training on 1574188660.440157
-------- Finished Training, elapsed time: 12.775397062301636
---- Creating Validation Set for data in range [280261, 373680]
-------- Start Training on 1574188673.624012
-------- Finished Training, elapsed time: 12.997114896774292
fold  predicted  avocado  curr_impute
0     0.1729     0.1953   0.2664
1     0.1769     0.2005   0.2736
2     0.1698     0.1891   0.2577
3     0.1713     0.192    0.2624
---- Creating Validation Set for data in range [0, 93420]
-------- Start Training on 1574189102.0591948
-------- Finished Training, elapsed time: 11.923866271972656
---- Creating Validation Set for data in range [93421, 186840]
-------- Start Training on 1574189114.4053
-------- Finished Training, elapsed time: 11.734225988388062
---- Creating Validation Set for data in range [186841, 280260]
-------- Start Training on 1574189126.537584
-------- Finished Training, elapsed time: 11.834797859191895
---- Creating Validation Set for data in range [280261, 373680]
-------- Start Training on 1574189138.7640052
-------- Finished Training, elapsed time: 13.448953866958618
fold  predicted  avocado  curr_impute
0     0.1733     0.1953   0.2664
1     0.1786     0.2005   0.2736
2     0.1724     0.1891   0.2577
3     0.1714     0.192    0.2624
Convert the Above Statistics into Tidy Data Format
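One way to reach a tidy (long) layout is to flatten each nested `{fold: {method: value}}` dictionary into one row per model, fold, and method. The sketch below is an assumption about the intended shape, not the notebook's own conversion: the helper name `to_tidy`, the column names, and the attribution of the sample values to "M03" are all illustrative. The values are the first two folds of the result block printed after the `hist_models("M03", ...)` call.

```python
def to_tidy(stats, model_id):
    """Flatten {fold: {method: value}} into one row per (model, fold, method)."""
    return [
        {"model": model_id, "fold": fold, "method": method, "error": value}
        for fold, by_method in stats.items()
        for method, value in by_method.items()
    ]


# First two folds of one printed result block; the "M03" label is an assumption.
stats = {
    0: {"predicted": 0.1064, "avocado": 0.1181, "curr_impute": 0.2546},
    1: {"predicted": 0.1058, "avocado": 0.1173, "curr_impute": 0.255},
}

rows = to_tidy(stats, "M03")
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed to `pd.DataFrame(rows)` to obtain a long-format frame with one observation per row.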